Supercharge Your Business with Speed | Rob Bearden - Joe Ansaldi | Cloudera 2021
>> Okay. We want to pick up on a couple of themes that Mick discussed, you know, supercharging your business with AI, for example, and this notion of getting hybrid right. So right now we're going to turn the program over to Rob Bearden, the CEO of Cloudera, and Manuvir Das, who's the head of enterprise computing at NVIDIA. And before I hand it off to Rob, I just want to say, for those of you who follow me at theCUBE, we've extensively covered the transformation of the semiconductor industry. We are entering an entirely new era of computing in the enterprise, and it's being driven by the emergence of data-intensive applications and workloads. No longer will conventional methods of processing data suffice to handle this work. Rather, we need new thinking around architectures and ecosystems. And one of the keys to success in this new era is collaboration between software companies like Cloudera and semiconductor designers like NVIDIA. So let's learn more about this collaboration and what it means to your data business. Rob, take it away.
>> Thanks, Mick and Dave. That was a great conversation on how speed and agility are everything in a hyper-competitive hybrid world. You touched on AI as essential to a data-first strategy and to accelerating the path to value in hybrid environments. And I want to drill down on this aspect. Today, every business is facing accelerating change. Everything from face-to-face meetings to buying groceries has gone digital. As a result, businesses are generating more data than ever. There are more digital transactions to track and monitor now. Every engagement with coworkers, customers and partners is virtual. From website metrics to customer service records and even onsite sensors, enterprises are accumulating tremendous amounts of data, and unlocking insights from it is key to our enterprises' success. And with data flooding every enterprise, what should businesses do?
At Cloudera, we believe this onslaught of data offers an opportunity to make better business decisions faster, and we want to make that easier for everyone, whether it's fraud detection, demand forecasting, preventative maintenance, or customer churn. Whether the goal is to save money or produce income, every day that companies don't gain deep insight from their data is money they've lost. And the reason we're talking about speed, and why speed is everything in a hybrid world and in a hyper-competitive climate, is that the faster we get insights from all of our data, the faster we grow and the more competitive we are. So those faster insights are also combined with the scalability and cost benefit that cloud provides, and with security and edge-to-AI data intimacy. That's why the partnership between Cloudera and NVIDIA means so much. And it starts with a shared vision: making data-driven decision-making a reality for every business. Our customers will now be able to leverage virtually unlimited quantities and varieties of data to power an order of magnitude faster decision-making. And together we turbocharge the enterprise data cloud to enable our customers to work faster and better, and to make integration of AI approaches a reality for companies of all sizes in the cloud. We're joined today by NVIDIA's Manuvir Das to talk more about how our technologies will deliver the speed companies need for innovation in our hyper-competitive environment. Okay, Manuvir, thank you for joining us. Over to you now.
>> Thank you, Rob, for having me. It's a pleasure to be here on behalf of NVIDIA. We're so excited about this partnership with Cloudera. You know, when NVIDIA started many years ago, we started as a chip company focused on graphics.
But as you know, over the last decade, we've really become a full-stack accelerated computing company, where we've been using the power of GPU hardware and software to accelerate a variety of workloads, AI being a prime example. And when we think about Cloudera, your great company, there are three things we see, Rob. The first one is that for the companies that were already transforming themselves by the use of data, Cloudera has been a trusted partner for them. The second thing we've seen is that when it comes to using your data, you want to use it in a variety of ways with a powerful platform, which of course you have built over time. And finally, as we've heard already, you believe in the power of hybrid, that data exists in different places and the compute needs to follow the data. Now, if you think about NVIDIA's mission going forward to democratize accelerated computing for all companies, our mission actually aligns very well with exactly those three things. Firstly, you know, we've really worked with a variety of companies to date who have been the early adopters, using the power of acceleration by changing their technology and their stacks. But more and more we see the opportunity of meeting customers where they are, with tools that they're familiar with, with partners that they trust. And of course, Cloudera is a great example of that. The second part of NVIDIA's mission is that we focused a lot in the beginning on deep learning, where the power of GPUs really shone through. But as we've gone forward, we've found that GPUs can accelerate a variety of different workloads, from machine learning to inference. And so again, the power of your platform is very appealing. And finally, we know that AI is all about data, more and more data. We believe very strongly in the idea that customers put their data where they need to put it. And the compute, the AI compute, the machine learning compute, needs to meet the customer where their data is.
And so that matches really well with your philosophy, right? And Rob, that's why we were so excited to do this partnership with you. It's come to fruition. We have a great combined stack now for the customer and we already see people using it. I think the IRS is a fantastic example where, literally, they took the workflow they had, they took the servers they had, they added GPUs into those servers. They did not change anything. And they got an eight times performance improvement for their fraud detection workflows, right? And that's the kind of success we're looking forward to with all customers. So the team has actually put together a great video to show us what the IRS is doing with this technology. Let's take a look.
>> How you doing? My name's Joe Ansaldi. I'm the branch chief of the technical branch in RAS. It's actually the research division, research and statistical division of the IRS. Basically, the mission that RAS has is we do statistical analysis and research on all things related to taxes: compliance issues, fraud issues, you know, anything that you can think of, basically, we do research on that. We're running into issues now where we have a lot of ideas to actually do data mining on our big troves of data, but we don't necessarily have the infrastructure or horsepower to do it. So our biggest challenge is definitely the infrastructure to support all the ideas that the subject matter experts are coming up with in terms of all the algorithms they would like to create. And then diving deeper within the algorithm space, the actual training of those algorithms, the number of parameters each of those algorithms has. So that's really been our challenge now. With NVIDIA and Cloudera's help, and with the cluster we actually built out to test this on the actual fraud detection algorithm, our expectation was we were definitely going to see some speedup in computational processing times.
And just to give you context, the size of the dataset that the SME was actually working her algorithm against was around four terabytes. If I recall correctly, we had a 22 to 48 times speedup after we started tweaking the original algorithm. My expectations, quite honestly, in that sphere, in terms of the timeframe to get results, was that you guys actually exceeded them. It was really, really quick. The definite now term, short term, what's next is going to be the subject matter expert is actually going to take our algorithm and run with that. So that's definitely the now-term thing we want to do. Going down, looking forward, maybe out a couple of months, we're also looking at procuring some A100 cards to actually test those out. As you guys can guess, our datasets are just getting bigger and bigger and bigger, and the demand to actually do something where we get more value added out of those datasets is just putting more and more demands on our infrastructure. So, you know, with the pilot, now we have an idea of the infrastructure we need going forward, and then also just in terms of thinking of the algorithms and how we can approach these problems to actually code out solutions to them. Now it's kind of like the shackles are off and we can just run, you know, run to our heart's desire. Wherever our imagination takes our SMEs to actually develop solutions, we now have the platforms to run them on. Just kind of to close out: I've worked with a lot of companies through the years and most of them have been spectacular. And you guys are definitely in that category. The whole partnership, as I said a little bit earlier, went really, really well, very responsive. I would be remiss if I didn't thank you guys. So thank you for the opportunity. Doing fantastic. And I also want to thank my guys,
my staff, Raul, David worked on this, Richie worked on this, Lex and Tony, they just did a fantastic job and I want to publicly thank them for all the work they did with you guys. And Chev, obviously, also is fantastic. So thank you everyone.
>> Okay. That's a real great example of speed in action. Now let's get into some follow-up questions, guys, if I may. Rob, can you talk about the specific nature of the relationship between Cloudera and NVIDIA? Is it primarily go-to-market or are you doing engineering work? What's the story there?
>> It's really both. It's both go-to-market and engineering. The engineering focus is to optimize and take advantage of NVIDIA's platform to drive better price performance, lower cost, faster speeds, and better support for today's emerging data-intensive applications. So it's really both.
>> Great. Thank you. Manuvir, maybe you could talk a little bit more about why we can't just use existing general-purpose platforms that are running all this ERP and CRM and HCM and, you know, all the Microsoft apps that are out there. What do NVIDIA and Cloudera bring to the table that goes beyond the conventional systems that we've known for many years?
>> Yeah. I think, Dave, as we've talked about, the asset that the customer has is really the data, right? And the same data can be utilized in many different ways: some machine learning, some AI, some traditional data analytics. So the first step here was really to take a general platform for data processing, Cloudera Data Platform, and integrate with that. Now, NVIDIA has a software stack called RAPIDS, which has all of the primitives that make different kinds of data processing go fast on GPUs. And so the integration here has really been taking RAPIDS and integrating it into Cloudera Data Platform so that regardless of the technique the customer is using to get insight from the data, the acceleration will apply in all cases.
And that's why it was important to start with a platform like Cloudera rather than a specific application.
>> So, I think this is really important because if you think about, you know, the software-defined data center, it brought in some great efficiencies, but at the same time, a lot of the compute power is now going towards doing things like networking and storage and security offloads. So the good news, the reason this is important, is that when you think about these data-intensive workloads, we can now put more processing power to work for those, you know, AI-intensive things. And so that's what I want to talk about a little bit, maybe a question for both of you. Maybe Rob, you could start. You think about AI that's done today in the enterprise. A lot of it is modeling in the cloud, but when we look at a lot of the exciting use cases, bringing real-time systems together, transaction systems and analytics systems, and real-time AI inference, even at the edge, there's huge potential for business value. In consumer, you're seeing a lot of applications with AI: biometrics and voice recognition and autonomous vehicles and the like. So you're putting AI into these data-intensive apps within the enterprise. The potential there is enormous. So what can we learn from sort of where we've come from, maybe these consumer examples? And Rob, how are you thinking about enterprise AI in the coming years?
>> Yeah, you're right. The opportunity is huge here, but you know, 90% of the cost of AI applications is the inference. And it's been a blocker in terms of adoption because it's just been too expensive and difficult from a performance standpoint. And new platforms like these being developed by Cloudera and NVIDIA will dramatically lower the cost of enabling this type of workload to be done.
And where we're going to see the most improvements will be in the speed and accuracy of existing enterprise AI apps like fraud detection, recommendation engines, supply chain management, drug provenance. And increasingly the consumer-led technologies will be bleeding into the enterprise in the form of autonomous factory operations. An example of that would be robots, or AR and VR in manufacturing, so driving better quality. Power grid management, automated retail, IoT, you know, the intelligent call centers, all of these will be powered by AI, but really the list of potential use cases now is going to be virtually endless.
>> I mean, Manuvir, this is like your wheelhouse. Maybe you could add something to that.
>> Yeah. I mean, I agree with Rob. I mean, he listed some really good use cases. You know, the way we see this at NVIDIA, this journey is in three phases or three steps, right? The first phase was for the early adopters. You know, the builders who assembled use cases, particular use cases like a chatbot, from the ground up with the hardware and the software. Almost like going to your local hardware store and buying piece parts and constructing a table yourself, right? Now, I think we are in the first phase of the democratization. For example, the work we do with Cloudera, which is for a broader base of customers, still building for a particular use case, but starting from a much higher baseline. So think about, for example, going to Ikea now and buying a table in a box, right? And you still come home and assemble it, but all the parts are there, the instructions are there, there's a recipe you just follow and it's easy to do, right? So that's sort of the phase we're in now. And then going forward, the opportunity we really look forward to for the democratization, you talked about applications like CRM, et cetera. I think the next wave of democratization is when customers just adopt and deploy the next version of an application they already have.
And what's happening is that under the covers, the application is infused with AI and it's become more intelligent because of AI, and the customer just thinks they went to the store and bought a table and it showed up and somebody placed it in the right spot. Right? And they didn't really have to learn how to do AI. So these are the phases. And I think we're very excited to be going there.
>> You know, Rob, the great thing for your customers is they don't have to build out the AI. They can buy it. And just in thinking about this, it seems like there are a lot of really great and even sometimes narrow use cases. So I want to ask you, you know, staying with AI for a minute, one of the frustrations, and Mick and I talked about this, is the GIGO problem that we've all, you know, studied in college: garbage in, garbage out. But the frustration that users have had is really getting fast access to quality data that they can use to drive business results. So do you see, and how do you see, AI maybe changing the game in that regard, Rob, over the next several years?
>> So yeah, the combination of massive amounts of data that has been gathered across the enterprise in the past 10 years with open APIs is dramatically lowering the processing costs, and that processing performs at much greater speed and efficiency. And that's allowing us as an industry to democratize data access while at the same time delivering the federated governance and security models. And hybrid technologies are playing a key role in making this a reality and enabling data access to be, quote, hybridized, meaning accessed and treated in a substantially similar way, irrespective of the physical location of where that data actually resides.
>> And that's great. That is really the value layer that you guys are building out on top of all this great infrastructure that the hyperscalers have given us.
You know, a hundred billion dollars a year that you can build value on top of, for your customers. Last question, and maybe Rob, you could go first and then Manuvir, you could bring us home. Where do you guys want to see the relationship go between Cloudera and NVIDIA? In other words, how should we as outside observers be thinking about and measuring your progress specifically, and the industry's progress generally?
>> Yes. I think we're very aligned on this, and for Cloudera, it's all about helping companies move forward and leverage every bit of their data in all the places that it may be hosted. And partnering with our customers, working closely with our technology ecosystem of partners, means innovation in every industry, and that's inspiring for us. And that's what keeps us moving forward.
>> Yeah, and I agree with Rob. For us at NVIDIA, you know, this partnership started with data analytics. As you know, Spark is a very powerful technology for data analytics. People who use Spark rely on Cloudera for that. And the first thing we did together was to really accelerate Spark in a seamless manner. But we're accelerating machine learning. We're accelerating artificial intelligence together. And I think for NVIDIA it's about democratization. We've seen what machine learning and AI have done for the early adopters and how they've helped them make their businesses, their products, their customer experience better. And we'd like every company to have the same opportunity.
Rob Bearden, Hortonworks | theCUBE NYC 2018
>> Live from New York, it's theCUBE, covering theCUBE, New York City, 2018. Brought to you by SiliconANGLE Media and its ecosystem partners.
>> And welcome to theCUBE here in New York City. We're live from CUBE NYC. This is our big data show, now AI, now all things cloud. Nine years covering, from the beginning of Hadoop, now into cloud and data as the center of the value. I'm John Furrier with David Vellante. Our special guest is Rob Bearden, CEO of Hortonworks. CUBE alumni, been on many times. Great supporter of theCUBE, legend in open source. Great to see you.
>> It's great to be here, thanks. Yes, absolutely.
>> So one of the things I wanted to talk to you about is that open source certainly has been a big part of the ethos, just seeing it in all sectors, again, growing even in blockchain. The open ethos is growing. The role of data now is certainly in the center. You guys have been on this vision of open data, if you will, and making data, on the move and in flight, maybe at rest, all these things are going on. Certainly the Hadoop world has changed, it's not just Hadoop and data lakes anymore, it's data. All things data, it's happening. This is core to your business, you guys have been banging this drum for a long time. Stock's at an all-time high. Congratulations on the business performance. So it's working, things are working for you guys.
>> I think the model and the strategy are really coming together nicely. And to your point, it's about all the data. It's about the entire life-cycle of the data and bringing all data under management through its entire life-cycle. And being able to give the enterprise that accessibility to that data across each tier: on-prem, private cloud, and across all the multi-clouds. And that's really changed, really in many regards, the overall core architecture of Hadoop and how it needs to manage data. And how it needs to interact with other data sources.
And our model and strategy has been about not going above the Hadoop stack, but actually going out to the edge, and bringing data under management from the point of origination through its entire movement life-cycle until it comes to rest, and then having the ability to deploy and access that data across each tier and across a multi-cloud environment. And it's a hybrid architecture world now.
>> You guys have been on this trend for a while now, it's kind of getting lift. Obviously you're seeing the impact of cloud, the impact of AI, 'cause the faster compute you have, the faster you can process data, the faster the data can be used; with machine learning it's a nice flywheel. So again, that flywheel is being recognized. So I have to ask you, what, in your opinion, has been the impact of cloud computing, specifically the Amazons and the Azures, and now Google, where certainly AI is in the center of their proposition? Now hybrid cloud is validated with Amazon announcing RDS on premises on VMware. That's Amazon's first ever on-premises activity. So this is clearly a validation of hybrid cloud. How has the cloud impacted the data space? If you will, it used to be data warehousing, and cloud has changed that. What's your opinion?
>> Well, what it's done is given an architectural extension to the enterprise of what their data architecture needs to be, and the real key is, it's now not about hybrid or cloud or on-prem, it's about having a data strategy overall. And how do I bring all my different assets, and bring a connected community together, in real time? Because what the enterprise is trying to do is connect and have higher velocity and faster visibility between the enterprise, the product, their customer, and their supply chain.
And to do that, they need to be able to aggregate data into the best economic platform from the point of origination, maybe starting from the component on their product, a single component, and be able to bring all that data together through its life-cycle, aggregate it, and then deploy it on the most economically feasible tier. Whether that's on-prem, or a private cloud, or across multiple public clouds. And our platform, with HDF, HDP, and DataPlane, completes that hybrid data architecture. And by doing that, the real value is that the cloud, AI and machine learning capabilities now have the ability to access all data across the enterprise, whether it be their tier in the cloud, or whether that be on-prem. And our strategy is around bringing that and being that fabric, to bring all the interconnectivity irrespective of whether it sits on the edge, in the cloud, or somewhere in between. Because the more accessibility AI has to data, the faster the velocity of driving value back into that AI cycle.
>> Yeah, people don't want to move data if they don't have to. And so, and we've been on this for a while, there's this idea that you want to bring the cloud model to your data, and not the data to the cloud always. And so, how do you do that? How do you make it this kind of same, same environment? What role does Hortonworks play in it?
>> Well, the first thing we want to do is bring the data under management from and through its life-cycle, where HDF goes to the edge, brings the data through its movement cycle, aggregates the streams. HDP is the data-at-rest platform that can sit on-prem, in a public cloud, or in a private cloud. And then DataPlane is that fabric that ensures that we have connectivity to all types of data across all tiers. And then it serves as the common security and governance framework, irrespective of which tier that is. And that's very, very important.
And that then gives the AI platforms the ability to bring AI onto a broader array of data, so that they can then have a higher and better impact on it than just having an isolated AI impact on just a single tier of data in the cloud.
>> Well, that message seems to be resonating. We talked earlier about the stock price, but also I think Neil Bushery and Frank Sluben popularized the metric of number of seven-figure deals. You guys are closing some big deals, and remember in the early days Robert Vor Breath, people are like, how are these guys going to sell anything, it's all open source, and you're doing a lot of million-plus dollar deals. So it's resonating not only with the Street but also with enterprises. Your thoughts?
>> Last quarter we, I think the key is that the industry really understands, the investors understand, the enterprises really now understand the importance of hybrid and hybrid cloud. And it's not going to be all about managing data lakes on-prem. All the data's not going to go across this giant line of demarcation and now all reside in the cloud. It has to coexist across each tier, and our role is to be that aggregation point.
>> And you've seen the big cloud players now, the big three, all have on-prem strategies. Azure with Azure Stack, Google, we saw Kubernetes on-prem, and even AWS now, the last holdout, putting RDS on-prem, announced at VMworld. So they've all sort of recognized that not everything's going to go into the cloud. So that's got to be, you know, good confirmation for you guys.
>> It's great validation. What it also says, though, is we must have a cloud-first architecture and a cloud-first approach with all of our tech. And the key to that, from our standpoint, within our strategy, is to containerize everything. And we had an announcement earlier this week that was really a three-way announcement between us, Red Hat, and IBM; and the essence of that announcement is we've adopted the Kubernetes distro from Red Hat.
To where we are containerizing all of our platforms with Red Hat's Kubernetes distribution. And what that does is give us the ability to optimize our platforms for OpenShift, the Red Hat PaaS, and then optimize the deployment of that in the IBM private cloud, right. And naturally DataPlane will also then give us the ability to extend those workloads, those very granular workloads, up into the public clouds, and we can even leverage their native object stores.
>> So that's an interesting love triangle, right? You and Red Hat are kind of birds of a feather with open source. IBM has always been a big proponent of open source, you know, funded Linux in the early days. And then brings this massive channel and brand, you know, to that world.
>> Yes. And you know, this is really going to accelerate our movement into a cloud-first architecture, with pure containerization. And the reason that's so important is, it gives us that modularity to move those applications and those workloads across whichever tier's most appropriate architecturally for it to run and be deployed.
>> You know, we said this on theCUBE many, many years ago, and it continues to be the theme: enterprises really want hardened solutions, but they don't mind experimenting. And Stu Miniman and I were always talking about and comparing the OpenStack ecosystem to what's happened in the Hadoop ecosystem. There's some pockets of relevance and it's a lot of work to build your own, and OpenStack has a great solution for certain use cases, now mostly on the infrastructure side. But then cloud came in and changed the game, because you saw things like Kubernetes. I mean, we're here at the Hadoop show that started with Hadoop, now it's AI, and the word Kubernetes is being talked about. You mentioned hybrid cloud; these aren't words that were spoken at an event like this. So the IT problem in multi-cloud has always been a storage issue.
So you do some storage work, you've got to store the data somewhere, but now you're talking about Kubernetes. You're talking about orchestration around workloads, the role of data in workloads. This is what enterprise IT actually cares about right now. This is not, like, a small little thing, it's a big deal, because data is not only in the workloads, they're using instrumentation with containers, with service meshes around the corner. You're starting to see policy; these are hardcore B2B enterprise features.
>> This is where what we're seeing is a massive transformational shift in how the IT architecture's going to look for the next 20 years. Right. The IT world has been horribly constrained by these very highly configured, very procedural-based applications, and now they want to create high-velocity engagement between the enterprise, their product, their customer and supply chain. They were so constrained with these very procedural-based applications, and containerization gives the ability now to create that velocity and to move those workloads, and those interactions between those four pillars.
>> Now let's talk about the edge. 'Cause the pendulum is clearly swinging sort of back to some decentralization going on, and the edge to us is a data play. We talk about it all the time. What are your thoughts on the edge, where does Hortonworks fit? What's your vision of the data modeling and how that evolves?
>> That goes back to, the insight to that would be our strategy and what we did, and we had the great fortune, quite frankly, of having the ability to merge Onyara and Hortonworks back in 2015. And the whole goal of that, besides working with the great team Joe Witt had built, was being able to get to the edge.
And what we wanted to have the ability to do was to operate on every sensor, on every device at the edge for the customer, so that they could bring the data under management wherever that may be, through its entire life-cycle; so from point of origination through its movement until it comes to rest. So our belief is that if we can bring enough intelligence and faster insights as that data is being generated, and as events or conditions are happening, moving, or changing, before it ever comes to rest we can process and take prescriptive action. Leveraging AI and machine learning as it's in its life-cycle, we can dramatically decrease the amount of data we have to bring to rest. We can just bring the provenance, the metadata, to rest and have that insight. And we try to get to these high-velocity, real-time insights starting with the data on the edge. And that's why we think it's so important to manage the entire life-cycle. And then, what's even more important is to then put that data onto whatever tier. That may be to bring it back to rest in a data lake on-prem, right, to aggregate with other like data structures. Or it may be to take it into cold storage on a native object store in a cloud that has the lowest cost of storage structure for a particular time.
>> Or take an action on the edge and leave it there.
>> Yeah. You guys definitely think about the edge in a big way, that's pretty obvious. But what I want to get your thoughts on is an emerging area we're watching, and I'll call it, for lack of a better description, programmable data. And you mentioned data architecture is being set up, probably for a 10, 20 year run for enterprises; they set up their data architecture with the cloud architects. Making data programmable is kind of a DevOps concept, right. And this is something that you guys have thought about with the data plane. What's your reaction to this notion of making data programmable?
When you start talking about Kubernetes, you're going to have stateful applications, stateless applications, you have new dynamics, I call it API 2.0, happening. Whole new infrastructure happening, data has to be programmable, going to need policy around it, the role of data's certainly changing rather than storing it somewhere. What's your view of programmable data, making it programmable? >> Well, to truly have programmable data, you can't have slices of accessibility or windows. You have to understand the lineage of that entire data, and the context of that data through its entire life-cycle. That's step and point number one. Point number two is, you have to be able to have that containerized so that you can take the module of data that you want to take prescriptive action against, or create action against a condition. And to be able to do that in granular bites or chunks, right. And then you've got to have accessibility to all the other contextual data, whether that's as it's in motion, as it's at rest, or as its contextual cousin, if you will, that sits up in an object store on another tier in a public cloud. Right. But what's important is that you have to be able to control and understand the entire lineage of that. And therefore, that's where our second step in this is DataPlane. And having the ability to have a full security model through that entire architectural chain, as well as the entire governance and lineage, leveraging Apache Atlas through DataPlane. And that then gives you the ability to take these very prescriptive actions that are driven through AI and machine learning insights. >> And that makes you very agile, love it. I mean the ethos of open-source and DevOps is literally being applied to everything. We see it at the network layer, you see it at the data layer, you're starting to see this concept of dev and ops being applied in a big way. 
>> You know, in previous years we've talked about what we're trying to accomplish. When we started Hortonworks, it was about changing the data architecture for the next 20 years and how data was going to be managed. And that's had, to your earlier point when we opened up the show, that's had twists and turns. Hadoop's evolved, the nature and velocity of data has evolved in the last five, six, seven, eight years, you know. It's about going to the edge, it's about leveraging the cloud, and we're very excited about where we're positioned as this massive transformation's happening. And what we're seeing is the iteration of change is happening at an incredibly fast pace. Even much more so than it was two, three years ago. >> Yeah, the clock speed's definitely up, the data is working. People are putting it to work. What works... >> They're able to get more value faster because of it. >> The AI is great. >> The data economy is here and now. And the enterprise understands it. So they want to now move aggressively to change and transform their business model to take advantage of what their data is giving them the ability to do. >> That's great. They always want the value, and they want it fast, and if anything gets in the way they'll remove the blockers, as we say. >> Alright, it's theCUBE here with Rob Bearden, CEO of Hortonworks, giving his vision but also an update on the company; data at the center of the value proposition. This is about AI, it's about big data, it's about the cloud. It's theCUBE, bringing you the data here in New York City. CUBENYC, that's the hashtag; check us out on Twitter. Stay with us for live coverage all day today and tomorrow here in New York City. We'll be right back after this short break. (upbeat music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
David Vellante | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
Frank Sluben | PERSON | 0.99+ |
2015 | DATE | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
10 | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
New York City | LOCATION | 0.99+ |
Onyara | ORGANIZATION | 0.99+ |
New York | LOCATION | 0.99+ |
Joe Witt | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
tomorrow | DATE | 0.99+ |
Amazons | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
Kubernetes | TITLE | 0.99+ |
CUBE | ORGANIZATION | 0.99+ |
second step | QUANTITY | 0.99+ |
VMWorld | ORGANIZATION | 0.99+ |
today | DATE | 0.99+ |
Last quarter | DATE | 0.99+ |
HortonWorks | ORGANIZATION | 0.99+ |
Robert Vor Breath | PERSON | 0.98+ |
first | QUANTITY | 0.98+ |
Neil Bushery | PERSON | 0.98+ |
six | QUANTITY | 0.98+ |
each tier | QUANTITY | 0.98+ |
Hadoop | TITLE | 0.97+ |
seven-figure deals | QUANTITY | 0.97+ |
Point number two | QUANTITY | 0.97+ |
two | DATE | 0.97+ |
seven | QUANTITY | 0.97+ |
theCUBE | ORGANIZATION | 0.97+ |
OpenShift | TITLE | 0.97+ |
OpenSource | ORGANIZATION | 0.96+ |
each tier | QUANTITY | 0.96+ |
2018 | DATE | 0.96+ |
three years ago | DATE | 0.95+ |
earlier this week | DATE | 0.95+ |
first thing | QUANTITY | 0.93+ |
eight years | QUANTITY | 0.93+ |
single component | QUANTITY | 0.93+ |
VMWARE | TITLE | 0.93+ |
Linux | TITLE | 0.92+ |
first approach | QUANTITY | 0.92+ |
point number one | QUANTITY | 0.9+ |
first architecture | QUANTITY | 0.9+ |
Red Hat | ORGANIZATION | 0.88+ |
NYC | LOCATION | 0.88+ |
Ethos | ORGANIZATION | 0.88+ |
CEO | PERSON | 0.88+ |
Open Ethos | ORGANIZATION | 0.88+ |
one | QUANTITY | 0.87+ |
three-way announcement | QUANTITY | 0.87+ |
next 20 years | DATE | 0.86+ |
Red Hat | TITLE | 0.84+ |
single tier | QUANTITY | 0.83+ |
OpenStack | TITLE | 0.82+ |
20 year | QUANTITY | 0.82+ |
Azure Stack | TITLE | 0.79+ |
9 years | QUANTITY | 0.77+ |
many years ago | DATE | 0.77+ |
Hortonworks CUBE | ORGANIZATION | 0.76+ |
three | QUANTITY | 0.76+ |
Rob Bearden, Hortonworks | DataWorks Summit 2018
>> Live from San Jose in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Rob Bearden. He is the CEO of Hortonworks. So thanks so much for coming on theCUBE again, Rob. >> Thank you for having us. >> So you just got off of the keynote on the main stage. The big theme is really about modern data architecture. So we're going to have this modern data architecture. What is it all about? How do you think about it? What's your approach? And how do you walk customers through this process? >> Well, there's a lot of moving parts in enabling a modern data architecture. One of the first steps is what we're trying to do is unlock the siloed transactional applications, and to get that data into a central architecture so you can get real time insights around the inclusive dataset. But what we're really trying to accomplish then within that modern data architecture is to bring all types of data whether it be real time streaming data, whether it be sensor data, IoT data, whether it be data that's coming from a connected core across the network, and to be able to bring all that data together in real time, and give the enterprise the ability to be able to take best in class action so that you get a very prescriptive outcome of what you want. 
So if we bring that data under management from point of origination and out on the edge, and then have the platforms that move that through its entire lifecycle, and that's our HDF platform, it gives the customer the ability to, after they capture it at the edge, move it, and then have the ability to process it as an event happens, a condition changes, various conditions come together, have the ability to process and take the exact action that you want to see performed against that, and then bring it to rest, and that's where our HDP platform comes into play where then all that data can be aggregated so you can have a holistic insight, and have real time interactions on that data. But then it then becomes about deploying those datasets and workloads on the tier that's most economically and architecturally pragmatic. So if that's on-prem, we make sure that we are architected for that on-prem deployment or private cloud or even across multiple public clouds simultaneously, and give the enterprise the ability to support each of those native environments. And so we think hybrid cloud architecture is really where the vast majority of our customers today and in the future, are going to want to be able to run and deploy their applications and workloads. And that's where our DataPlane Service Offering gives them the ability to have that hybrid architecture and the architectural latitude to move workloads and datasets across each tier transparently to what storage file format that they did or where that application is, and we provide all the tooling to match the complexity from doing that, and then we ensured that it has one common security framework, one common governance through its entire lifecycle, and one management platform to handle that entire lifecycle data. 
And that's the modern data architecture is to be able to bring all data under management, all types of data under management, and manage that in real time through its lifecycle til it comes at rest and deploy that across whatever architecture tier is most appropriate financially and from a performance on-cloud or prem. >> Rob, this morning at the keynote here in day one at DataWorks San Jose, you presented this whole architecture that you described in the context of what you call hybrid clouds to enable connected communities and with HDP, Hortonworks Data Platform 3.0 is one of the prime announcements, you brought containerization into the story. Could you connect those dots, containerization, connected communities, and HDP 3.0? >> Well, HDP 3.0 is really the foundation for enabling that hybrid architecture natively, and what's it done is it separated the storage from the compute, and so now we have the ability to deploy those workloads via a container strategy across whichever tier makes the most sense, and to move those application and datasets around, and to be able to leverage each tier in the deployment architectures that are most pragmatic. And then what that lets us do then is be able to bring all of the different data types, whether it be customer data, supply chain data, product data. So imagine as an industrial piece of equipment is, an airplane is flying from Atlanta, Georgia to London, and you want to be able to make sure you really understand how well is that each component performing, so that that plane is going to need service when it gets there, it doesn't miss the turnaround and leave 300 passengers stranded or delayed, right? Now with our Connected platform, we have the ability to take every piece of data from every component that's generated and see that in real time, and let the airlines make that real time. >> Delineate essentially. 
>> And ensure that we know every person that touched it and looked at that data through its entire lifecycle, from the ground crew to the pilots to the operations team to the service folks on the ground to the reservation agents, and we can prove that if somehow that data has been breached, that we know exactly at what point it was breached and who did or didn't get to see it, and can prevent that because of the security models that we put in place. >> And that relates to compliance and mandates such as the General Data Protection Regulation (GDPR) in the EU. At DataWorks Berlin a few months ago, you laid out, Hortonworks laid out, announced a new product called the Data Steward Studio to enable GDPR compliance. Can you give our listeners now who may not have been following the Berlin event a bit of an update on Data Steward Studio, how it relates to the whole data lineage, or set of requirements that you're describing, and then going forward, what is Hortonworks's roadmap for supporting the full governance lifecycle for the Connected community, from data lineage through, like, model governance and so forth? Can you just connect a few dots that will be helpful? >> Absolutely. What's important certainly, driven by GDPR, is the requirement to be able to prove that you understand who's touched that data and who has not had access to it, and that you ensure that you're in compliance with the GDPR regulations, which are significant, but essentially what they say is you have to protect the personal data and attributes of that data of the individual. And so what's very important is that you've got to be able to have the systems that not just secure the data, but understand who has the accessibility at any point in time that you've ever maintained that individual's data. 
And so it's not just about when you've had a transaction with that individual, but it's the rest of the history that you've kept or the multiple datasets that you may try to correlate to try to expand the relationship with that customer, and you need to make sure that you can ensure not only that you've secured their data, but that you're protecting and governing who has access to it and when. And as importantly, that you can prove in the event of a breach that you had control of that, and who did or did not access it, because if you can't prove, in any breach, that it was secure and that no one accessed it who was not supposed to, you can be opened up to hundreds of thousands of dollars or even multiple millions of dollars of fines just because you can't prove that it was not accessed, and that's what the variety of our platforms, you mentioned Data Steward Studio, is part of. DataPlane is one of the capabilities that gives us the ability. The core engine that does that is Atlas, and that's the open source governance platform that we developed through the community that really drives all the capabilities for governance that moves through each of our products, HDP, HDF, and then of course DataPlane and Data Steward Studio take advantage of that in how they move and replicate data and manage that process for us. >> One of the things that we were talking about before the cameras were rolling was this idea of data driven business models, how they are disrupting current contenders, new rivals coming on the scene all the time. Can you talk a little bit about what you're seeing and what are some of the most exciting and maybe also some of the most threatening things that you're seeing? >> Sure, in the traditional legacy enterprise, it's very procedural driven. You think about classic Encore ERP. It's worked very hard to have a very rigid, very structural procedural order to cash cycle that has not a great deal of flexibility. 
And it takes through a design process, it builds product, that then you sell product to a customer, and then you service that customer, and then you learn from that transaction different ways to automate or improve efficiencies in their supply chain. But it's very procedural, very linear. And in the new world of connected data models, you want to bring transparency and real time understanding and connectivity between the enterprise, the customer, the product, and the supply chain, and that you can take real time best in practice action. So for example you understand how well your product is performing. Is your customer using it correctly? Are they frustrated with that? Are they using it in the patterns and the frequency that they should be if they are going to expand their use and buy more, and if they're not, how do we engage in that cycle? How do we understand if they're going through a re-review and another buying of something similar that may not be with you for a different reason. And when we have real time visibility to our customer's interaction, understand our product's performance through its entire lifecycle, then we can bring real time efficiency with linking those together with our supply chain into the various relationships we have with our customers. To do that, it requires the modern data architecture, bringing data under management from the point it originates, whether it's from the product or the customer interacting with the company, or the customer interacting potentially with our ecosystem partners, mutual partners, and then letting the best in practice supply chain techniques, make sure that we're bringing the highest level of service and support to that entire lifecycle. 
And when we bring data under management, manage it through its lifecycle and have the historical view at rest, and leverage that across every tier, that's when we get these high velocity, deep transparency, and connectivity between each of the constituents in the value chain, and that's what our platforms give them the ability to do. >> Not only your platform, you guys have been in business now for I think seven years or so, and you shifted from being in the minds of many and including your own strategy from being the premier data at rest company in terms of the a Hadoop platform to being one of the premier data in motion companies. Is that really where you're going? To be more of a completely streaming focus, solution provider in a multi-cloud environment? And I hear a lot of Kafka in your story now that it's like, oh yeah, that's right, Hortonworks is big on Kafka. Can you give us just a quick sense of how you're making that shift towards low latency real time streaming, big data, or small data for that matter, with embedded analytics and machine learning? >> So, we have evolved from certainly being the leader in global data platforms with all the work that we do collaboratively, and in through the community, to make Hadoop an enterprise viable data platform that has the ability to run mission critical workloads and apps at scale, ensuring that it has all the enterprise facilities from security and governance and management. But you're right, we have expanded our footprint aggressively. And we saw the opportunity to actually create more value for our customers by giving them the ability to not wait til they bring data under management to gain an insight, because in that case, they're happened to be reactive post event post transaction. We want to give them the ability to shift their business model to being interactive, pre-event, pre-conditioned. 
The way to do that, we learned, was to be able to bring the data under management from the point of origination, and that's what we used MiNiFi and NiFi for, and then HDF, to move it through its lifecycle, and to your point, we have the intellect, we have the insight, and then we have the ability then to process the best in class outcome based on what we know the variables are we're trying to solve for as that's happening. >> And there's the word, the phrase ACID, which of course is a transactional data paradigm, I hear that all over your story now in streaming. So, what you're saying is it's a completely enterprise-grade streaming environment from end to end for the new era of edge computing. Would that be a fair way of-- >> It's very much so. And our model and strategy has always been to bring in the other best in class engines for what they do well for their particular dataset. A couple of examples of that, one, you brought up Kafka, another is Spark. And they do what they do really well. But what we do is make sure that they fit inside an overall data architecture that then embodies their access to a much broader central dataset that goes from point of origination to point of rest on a whole central architecture, and then benefit from our security, governance, and operations model, being able to manage those engines. So what we're trying to do is eliminate the silos for our customers, and having siloed datasets that just do particular functions. We give them the ability to have an enterprise modern data architecture, we manage the things that bring that forward for the enterprise to have the modern data driven business models by bringing the governance, the security, the operations management, and ensure that those workflows go from beginning to end seamlessly. >> Do you, go ahead. >> So I was just going to ask about the customer concerns. So here you are, you've now given them this ability to make these real time changes, what's sort of next? 
What's on their mind now and what do you see as the future of what you want to deliver next? >> First and foremost we got to make sure we get this right, and we really bring this modern data architecture forward, and make sure that we truly have the governance correct, the security models correct. One pane of glass to manage this. And really enable that hybrid data architecture, and let them leverage the cloud tier where it's architecturally and financially pragmatic to do it, and give them the ability to leg into a cloud architecture without risk of either being locked in or misunderstanding where the lines of demarcation of workloads or datasets are, and not getting the economies or efficiencies they should. And we solved that with DataPlane. So we're working very hard with the community, with our ecosystem and strategic partners to make sure that we're enabling the ability to bring each type of data from any source and deploy it across any tier with a common security, governance, and management framework. So then what's next is now that we have this high velocity of data through its entire lifecycle on one common set of platforms, then we can start enabling the modern applications to function. And we can go look back into some of the legacy technologies that are very procedural based and are dependent on a transaction or an event happening before they can run their logic to get an outcome because that grinds the customer in post world activity. We want to make sure that we're bringing that kind of, for example, supply chain functionality, to the modern data architecture, so that we can put real time inventory allocation based on the patterns that our customers go in either how they're using the product, or frustrations they've had, or success they've had. 
And we know through artificial intelligence and machine learning that there's a high probability not only they will buy or use or expand their consumption of whatever that they have of our product or service, but it will probably to these other things as well if we do those things. >> Predict the logic as opposed to procedural, yes, AI. >> And very much so. And so it'll be bringing those what's next will be the modern applications on top of this that become very predictive and enabler versus very procedural post to that post transaction. We're little ways downstream. That's looking out. >> That's next year's conference. >> That's probably next year's conference. >> Well, Rob, thank you so much for coming on theCUBE, it's always a pleasure to have you. >> Thank you both for having us, and thank you for being here, and enjoy the summit. >> We're excited. >> Thank you. >> We'll do. >> I'm Rebecca Knight for Jim Kobielus. We will have more from DataWorks Summit just after this. (upbeat music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
James Kobielus | PERSON | 0.99+ |
Rebecca Knight | PERSON | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
Jim Kobielus | PERSON | 0.99+ |
London | LOCATION | 0.99+ |
300 passengers | QUANTITY | 0.99+ |
San Jose | LOCATION | 0.99+ |
Rob | PERSON | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
seven years | QUANTITY | 0.99+ |
hundreds of thousands of dollars | QUANTITY | 0.99+ |
San Jose, California | LOCATION | 0.99+ |
each component | QUANTITY | 0.99+ |
GDPR | TITLE | 0.99+ |
DataWorks Summit | EVENT | 0.99+ |
one | QUANTITY | 0.99+ |
One | QUANTITY | 0.98+ |
millions of dollars | QUANTITY | 0.98+ |
Atlas | TITLE | 0.98+ |
first steps | QUANTITY | 0.98+ |
HDP 3.0 | TITLE | 0.97+ |
One pane | QUANTITY | 0.97+ |
both | QUANTITY | 0.97+ |
DataWorks Summit 2018 | EVENT | 0.97+ |
First | QUANTITY | 0.96+ |
next year | DATE | 0.96+ |
each | QUANTITY | 0.96+ |
DataPlane | TITLE | 0.96+ |
theCUBE | ORGANIZATION | 0.96+ |
Hadoop | TITLE | 0.96+ |
DataWorks | ORGANIZATION | 0.95+ |
Spark | TITLE | 0.95+ |
today | DATE | 0.94+ |
EU | LOCATION | 0.93+ |
this morning | DATE | 0.91+ |
Atlanta, | LOCATION | 0.91+ |
Berlin | LOCATION | 0.9+ |
each type | QUANTITY | 0.88+ |
General Data Protection Regulation (GDPR) | TITLE | 0.87+ |
one common | QUANTITY | 0.86+ |
few months ago | DATE | 0.85+ |
NiFi | ORGANIZATION | 0.85+ |
Data Platform 3.0 | TITLE | 0.84+ |
each tier | QUANTITY | 0.84+ |
Data Studio | ORGANIZATION | 0.84+ |
Data Studio | TITLE | 0.83+ |
day one | QUANTITY | 0.83+ |
one management platform | QUANTITY | 0.82+ |
MiNiFi | ORGANIZATION | 0.82+ |
San | LOCATION | 0.71+ |
DataPlane | ORGANIZATION | 0.69+ |
Kafka | TITLE | 0.67+ |
Encore ERP | TITLE | 0.66+ |
one common set | QUANTITY | 0.65+ |
Data Steward Studio | ORGANIZATION | 0.65+ |
HDF | ORGANIZATION | 0.59+ |
Georgia | LOCATION | 0.55+ |
announcements | QUANTITY | 0.51+ |
Jose | ORGANIZATION | 0.47+ |
Rob Bearden, Hortonworks & Rob Thomas, IBM | BigData NYC 2017
>> Announcer: Live from Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media, and its ecosystem sponsor. >> Okay, welcome back, everyone. We're here live in New York City for BigData NYC, our annual event with SiliconANGLE Media, theCUBE, and Wikibon, in conjunction with Strata Hadoop, which is now called Strata Data as that show evolves. I'm John Furrier, cohost of theCUBE, with Peter Burris, head of research for SiliconANGLE Media, and General Manager of Wikibon. Our next two guests are two legends in the big data industry, Rob Bearden, the CEO of Hortonworks, really one of the founders of the big data movement, you know, got Cloudera and Hortonworks, really kind of built that out, and Rob Thomas, General Manager of IBM Analytics. Big-time investments have made both of them. Congratulations for your success, guys. Welcome back to theCUBE, great to see you guys! >> Great to see you. >> Great, yeah. >> And got an exciting partnership to talk about, as well. >> So, but let's do a little history, you guys, obviously, I want to get to that, and get clarified on the news in a second, but you guys have been there from the beginning, kind of looking at the market, developing it, almost from the embryonic state to now. I mean, what a changeover. Give a quick comparison of where we've come from and what's the current landscape now, because it evolved into so much more. You got IOT, you got AI, you have a lot of things in the enterprise. You've got cloud computing. A lot of tailwinds for this industry. It's gotten bigger. It's become big and now it's huge. What's your thoughts, guys? >> You know I, so you look at arcs, and really all this started with Hadoop, and Rob and I met early in the days of that. You've kind of gone from the early few years, which were about optimizing operations. 
Hadoop is a great way for a company to become more efficient, take out costs in their data infrastructure, and so that put huge momentum into this area, and now we've kind of fast-forwarded to the point where now it's about, "So how "am I actually going to extract insight?" So instead of just getting operational advantages, how am I going to get competitive advantage, and that's about bringing the world of data science and machine learning, run it natively on Hadoop, that's the next chapter, and that's what Rob and I are working closely together on. >> Rob, your thoughts, too? You know, we've been talking about data in motion. You guys were early on in that, seeing that trend. Real time is still hot. Data is still the core asset people are trying to figure out and move from wrangling to actually enabling that data. >> Right. Well, you know, in the early days of Big Data, it was, to Rob's point, it was very much about bringing operational leverage and efficiency and being able to aggregate very siloed data sets, and unlocking that data and bringing it into a central platform. In the early days in resources, and Hadoop went to making Hadoop an enterprise-viable data platform, with security, governance, operations, management capability, that mirrored any of the proprietary transactional or EDW platforms, and what the lessons learned in that were, is that by bringing all that data together in a central data set, we now can understand what's happening with our customers, and with our other assets pre-transaction, and so they can become very prescriptive in engaging in new business models, and so what we've learned now is the further upstream we can get in the world of IOT and bring that data under management from the point of origination and be able to manage that all the way through its life cycle, we can create new business models with higher velocity of engagement and a lot more rapid value that gets created. 
It, though, creates a number of new challenges in all the areas of how you secure that data, how you bring governance across that entire life cycle from a common stream set. >> Well, let's talk about the news you guys have. Obviously, the partnership. Partnerships become the new normal in an open source era that we're living in. We're seeing open source software grow really exponentially in the forecast coming in the next five years and ten years and exponential growth in new code. Just new people coming on board, new developers, dev ops is mainstream. Partnerships are key for communities. 90% of the code is going to be open source, 10%, as they say, the Code Sandwich as Jim Zemlin, the executive director of Linux Foundation, wants to, and you're seeing that work. You guys have worked together with Apache Atlas. What's the news, what's the relationship with Hortonworks and IBM? Share the news. >> So, a lot of great work's been happening there, and generally in the open source community, around Apache Atlas, and making sure that we're bringing missing critical governance capabilities across the big data sets and environments. As we then get into the complexity of now multiple data lakes, multiple tiers of data coming from multiple sources, that brings a higher level of requirement in both the security and governance aspects, and that's where the partnership with IBM is continuing to drive Apache Atlas into mission critical enterprise viability, but then when we get into the distributed models and enterprise requirements, the IBM platforms leveraging Atlas and what we're doing together then take that into the mission critical enterprise capability. >> You got the open source, and now you got the enterprise. Rob, we've talked many times about the enterprise as a hard, hard environment to crack for say, a start up, but even now, they're becoming reliant on open source, but yet, they have a lot of operational challenges. 
How does this relate to the challenge of, you know, CIO and his staff, now new personas coming in, you're seeing the data science role, you see it expanding from analytics to dev ops. A day of challenges. >> Look, enterprises are getting better at this. Clearly we've seen progress the last five years on that, but to kind of go back and link the points, there's a phrase I heard I like. It says, "There's no AI without IA," meaning information architecture. Fundamentally, what our partnership is about is delivering the right information architecture. So it's Hadoop federated with whatever you have in terms of warehouses and databases. We partner around IBM Common SQL for that. It's metadata for your core governance because without governance you don't have compliance, you can't offer self-service analytics, so we are forming what I would call the fluid data layer for an enterprise that enables them to get to this future of AI, and my view is there's a stop in between, which is data science, machine learning, applications that are ready today that clients can put into production and improve the outcomes they're getting. That's what we're focused on right now is how do we take the information architecture we've been able to establish, and then help clients on this journey? That's what enterprises want, because that's how they're going to build differentiation in their businesses. >> But the definition of an information architecture is closest to applications, and maybe this informs your perspective, it's close to the applications that the business is running on. Goes back to your observation about, "We used to be focusing, optimizing operations." As you move away from those applications, your information architecture becomes increasingly diffuse. It's not as crystal clear. How do you drive that clarity, as the data moves to derived new applications? >> Rob and I have talked about this. I think we're at the dawn of probably a new era in application development. 
Much more agile, flexible applications that are taking advantage of data wherever it resides. We are really early in that. Right now we are in the let's actually put into practice, machine learning and data science, let's extract value from the data we've got, that will then inform a new set of applications, which is related to the announcements that Hortonworks made this week around data plane, which is looking at multi-cloud environments and how would you manage applications and data across those? Rob, you can speak to that better than I can, I think. >> Well, the data plane thing, this information architecture, I think you're 100% right on. The data that we're hearing from customers in the enterprise is, they see the IOT buzz, oh, of course they're going to connect with IOT devices down the road, but when they see the security challenges, when they see the operational challenges around hiring people to actually run the dev ops, they have to then re-architect. So there's certainly a conversation we see on what is the architecture for the data, but also a little bit bigger than that, the holistic architecture of, say, cloud. So a lot of people are like, trying to clean up their house, if you will, to be ready for this new era, and I think Wikibon, your private cloud report you guys put out really amplified that by saying, "Yeah, they see these trends, but they got to kind of get their act together." They got to look at who the staff is, what the data architecture's going to be, what apps are being developed, so doing a lot more retrenching. Given that, if we agree, what does that mean for the data plane, and then your vision of having that data architecture so that this will be a solid foundational transition? 
>> I think we all hit on the same point, which is it is about enabling a next generation IT architecture, of which, sort of the X and the Y axis or network, and generally what Big Data's been able to do, and Hadoop specifically, was over the last five years, enabling the existing applications architected, and I like the term that's been coined by you, is they were known processes with known technology, and that's how applications in the last 20 years have been enabled. Big Data and Hadoop generally have unlocked that ability to now be able to move all the way out to the edge and incorporate IOT, data at rest, data in motion, on-prem and cloud hybrid architecture. What that's done is said, "Now we know how to build an application that takes advantage of an event or an occurrence and then can drive outcome in a variety of ways. We don't have to wait for a static programming model to automate a function." >> And in fact, if we wait, we're going to fail. That's one of the biggest challenges. I mean, IBM, I will tell you guys, or I'll tell you, Rob, that one of the craziest days I've ever spent is I flew from Japan to New York City for the IBM Information Architecture Announcement back in like 1994, and it was the most painful two days I've ever experienced in my entire life. That's a long time ago. It's ancient history. We can't use information architecture as a way of slowing things down. What we need to be able to do is we need to be able to introduce technology that again, allows the clarity of information architecture close to these core applications to move, and that may involve things like machine learning itself being embedded directly into how we envision data being moved, how we envision optimization, how we envision the data plane working. So, as you guys think about this data plane, everybody ends up asking themselves, "Is there a natural place for data to be?" 
What's going to be centralized, what's going to be decentralized, and I'm asking you, is increasingly the data going to be decentralized but the governance and securities and policies that we put in place going to be centralized and that's what's going to inform the operation of the data plane? What do you guys think? >> It's our view, very specifically from Hortonworks' perspective, that we want to give the ability for the data to exist and reside wherever the physics dictate, whether that be on-prem, whether that be in the cloud, and we want to give the ability to process and take action on an event or an occurrence or drive an outcome as early in the cycle as possible. >> Describe what you mean by "early in the cycle." >> So, as we see conditions emerge. A machine part breaking down. A customer taking an action. A supply chain inventory outage. >> So as close as possible to the event that's generating the data. >> As it's being generated, or as the processes are leading up to the natural outcome and we can maybe disintermediate for a better outcome, and so, that means that we have to be able to engage with the data irrespective of where it is in its cycle, and that's where we've enabled, with data plane, the ability to extract out the requirement of where that data is, and to be able to have a common plane, pun intended, for the operations and managing and provisioning of the environment, for being able to govern that and secure it, which are increasingly becoming intertwined, because you have to deal with it from point of origin through point at rest. >> The new phrase, "The single plane of glass." All joking aside, I want to just get your thoughts on this, Rob, too. "What's in it for me? I'm the customer. Right now I have a couple challenges." This is what we hear from the market. 
"I need data consistency because things are happening in real time; whatever events are going on with data, we know more data's going to be coming out from the edge and everywhere else, faster and more volume, so I need consistency of my data, and I don't want to have multiple data silos," and then they got to integrate the data, so on the application developer side, a dev ops-like ethos is emerging where, "Hey, if there's data being done, I need to integrate that into my app in real time," so those are two challenges. Does the data plane address that concern for customers? That's the question. >> Today it enables the ops world. >> So I can integrate my apps into the data plane. >> My apps and my other data assets, irrespective of where they reside, on-prem, cloud, or out to the edge, and all points in between. >> Rob, for enterprise, is this going to be the single pane of glass for data governance? Is that the vision, how you guys see this, because that's a benefit. If that could happen, that's essentially one step towards the promised land, if you will, for more data flowing through apps and app developers. >> So let me reshape a little bit. There's two main problems that collectively we have to address for enterprises: one is they want to apply machine learning and data science at scale, and they're struggling with that, and two is they want to get to the cloud, and it's not talked about nearly enough, but most clients are really struggling with that. Then you fast forward on that one, we are moving to a multi-cloud world, absolutely. I don't think any enterprise is going to standardize on a single cloud, that's pretty clear. 
So you need things like data plane that acknowledge it's a multi-cloud world, and even as you move to multi clouds, you want a single focus for your data governance, a single strategy for your data governance, and then what we're doing together with IBM Data Science Experience with Hortonworks, let's say, whatever data you have in there, you can now do your machine learning right where that data is. You don't need to move it around. You can if you want, but you don't have to move it around, 'cause it's built in, and it's integrated right into the Hadoop ecosystem. That solves the two main enterprise pain points, which is help me get to the cloud, help me apply data science and machine learning. >> Well we'll have to follow up and we'll have to do just a segment just on that. I think multi-cloud is clearly the direction, but what the hell does that mean? If I run 365 on Azure, that's one app. If I run something else on Amazon, that's multiple clouds, not necessarily moving workloads across. So the question I want to ask here is, it's clear from customers they want single code bases that run on all clouds seamlessly so I don't have to scale up on things on Amazon, Azure, and Google. Not all clouds are created equal in how they do things. Storage, throughput, inside the data factories of how they process. That's a challenge. How do you guys see that playing out? You have on-premise activities that have been bootstrapped. Now you have multiple clouds with different ways of doing things, from pipelining, ingestion and processing, and learning. How do you see that playing out? Clouds just kind of standardizing around data plane? >> There's also the complexity of even within the multi-clouds, you're going to have multiple tiers within the clouds, if you're running in one data center in Asia, versus one in Latin America, maybe a couple across the Americas. >> But as a customer, do I need to know the cloud internals of Amazon, Azure, and Google? >> You do. 
In a stand-alone world, yes you do. That's where we have to bring and abstract the complexity of that out, and that's the goal with data plane, is to be able to abstract, whether it's, which tier it's in, on-prem, or whether it's on, irrespective of which cloud platform. >> But Rob Thomas, I really like the way you put it. There may be some other issues that users have to worry about, certainly there are some that we think, but the two questions of, "Where am I going to run the machine learning," and "How am I going to get that to the cloud appropriately," I really like the way you put that. At the end of the day, what users need to focus on is less where the application code is, and more where the data is, so that they can move the application code or they can move the work to the data. That's fundamentally the perspective. We think that businesses don't take their business to the cloud, they bring the cloud to their business. So, when you think about this notion of increasingly looking at a set of work that needs to be performed, where the data exists, and what actions you're going to take on that data, it does suggest that data is going to become more of a centerpiece asset within the business. How do some of the things that you guys are doing lead customers to start to acknowledge data as an asset so they're making the appropriate investments in their data as their business evolves, and partly in response to data as an asset? What do you think? >> We have to do our job to build to common denominators, and that's what we're doing to make this easy for clients. So today we announced the IBM Integrated Analytics System. Same code base on private cloud as on a hardware system as on public cloud, all of it federates to Hortonworks through Common SQL. That's what clients need, 'cause it solves their problem. Click of a button, they can get to the cloud, and by the way, on private cloud it's based on Kubernetes, which is aligned with what we have on public cloud. 
We're working with Hortonworks to optimize YARN and Kubernetes working together. These are the meaty issues that if we don't solve it, then clients have to deal with the bag of bolts, and so that's the kind of stuff we're solving together. So think about it: one single code base for managing your data, federates to Hadoop, machine learning is built into the system, and it's based on Kubernetes, that's what clients want. >> And the containers is just great, too. Great cloud-native trend. You guys been great, active in there. Congratulations to both of you guys. Final question, get you guys the last word: How does the relationship between Hortonworks and IBM evolve? How do you guys see this playing out? More of the same? Keep integrating in code? Is there any new thing you see on the horizon that you're going to be knocking down in the future? >> I'll take the first shot. The goal is to continue to make it simple and easy for the customer to get to the cloud, bring those machine learning and data science models to the data, and make it easy for the consumption of the new next generation of applications, and continue to make our customers successful and drive value, but to do it through transparently enabling the technology platforms together, and I think we've acknowledged the things that IBM is extraordinarily good at, the things that Hortonworks is good at, and bring those two together with virtually no overlap. >> Rob, you've been very partner-centric. Your thoughts on this partnership? >> Look, it's what clients want. Since we announced this, the results and the response has been fantastic, and I think it's for one simple reason. So, Hortonworks' mission, we all know, is open source, and delivering in the community. They do a fantastic job of that. We also know that sometimes, clients need a little bit more, and so, when you bring those two things together, that's what clients want. 
That's very different than what other people in the industry do that say, "We're going to create a proprietary wrapper "around your Hadoop environment and lock your data in." That's the opposite of what we're doing. We're saying we're giving you full freedom of open source, but we're enabling you to augment that with machine learning, data science capabilities. This is what clients want. That's why the partnership's working. I think that's why we've gotten the response that we have. >> And you guys have been multiple years into the new operating model of being much more aggressive within the Big Data community, which has now morphed into much larger landscape. You pleased with some of the results you're seeing on the IBM side and more coding, more involvement in these projects on your end? >> Yeah, I mean, look, we were certainly early on Spark, created a lot of momentum there. I think it actually ended up helping both of our interests in the market. We built a huge community of developers at IBM, which is not something IBM had even a few years ago, but it's great to have a relationship like this where we can continue to augment our skills. We make each other better, and I think what you'll see in the future is more on the governance side; I think that's the piece that's still not quite been figured out by most enterprises yet. The need is understood. The implementation is slow, so you'll see more from us collectively there. >> Well, congratulations in the community work you guys have done. I think the community's model's evolving mainstream as well. Open source will continue to grow. Congratulations. Rob Bearden and Rob Thomas here inside theCUBE, more coverage here in Big Data NYC with theCUBE, after this short break.
Rob Bearden, Hortonworks & Rob Thomas, IBM Analytics - #DataWorks - #theCUBE
>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017, brought to you by Hortonworks. >> Hi, welcome to theCUBE. We are live in San Jose, in the heart of Silicon Valley at the DataWorks Summit, day one. I'm Lisa Martin, with my co-host, George Gilbert. And we're very excited to be talking to two Robs. With Rob squared on the program this morning. Rob Bearden, the CEO of Hortonworks. Welcome, Rob. >> Thank you for having us. >> And Rob Thomas, the VP, GM rather, of IBM Analytics. So, guys, we just came from this really exciting, high energy keynote. The laser show was fantastic, but one of the great things, Rob, that you kicked off with was really showing the journey that Hortonworks has been on, and in a really pretty short period of time. Tremendous inertia, and you talked about the four mega-trends that are really driving enterprises to modernize their data architecture. Cloud, IOT, streaming data, and the fourth, next leg of this is data science. Data science, you said, will be the transformational next leg in the journey. Tell our viewers a little bit more about that. What does that mean for Hortonworks and your partnership with IBM? >> Well, I think what IBM and Hortonworks now have the ability to do is to bring all the data together across a connected data platform. The data in motion, the data at rest, we now have in one common platform, irrespective of the deployment architecture, whether it's on-prem across multiple data centers or whether deployed in the cloud. And now that we have the large volume of data and we have access to it, we can now start to begin to drive the analytics in the end as that data moves through each phase of its life cycle. 
And what really happens now, is now that we have visibility and access to the inclusive life cycle of the data we can now put a data science framework over that to really now understand and learn those patterns and what's the data telling us, what's the pattern behind that. And we can bring simplification to the data science and turn data science actually into a team sport. Allow them to collaborate, allow them to have access to it. And sort of take the black magic out of doing data science with the framework of the tool and the power of DSX on top of the connected data platform. Now we can advance rapidly the insights in the end of the data and what that really does is drive value really quickly back into the customer. And then we can then begin to bring smart applications via the data science back into the enterprise. So we can now do things like connected car in real time, and have connected car learn as it's moving and through all the patterns, we can now, from a retail standpoint really get smart and accurate about inventory placement and inventory management. From an industrial standpoint, we know in real time, down to the component, what's happening with the machine, and any failures that may happen and be able to eliminate downtime. Agriculture, same kind of... Healthcare, every industry, financial services, fraud detection, money laundering advances that we have but it's all going to be attributable to how machine learning is applied and the DSX platform is the best platform in the world to do that with. >> And one of the things that I thought was really interesting, was that, as we saw enterprises start to embrace Hadoop and Big Data and say, now this needs to co-exist and inter-operate with our traditional applications, our traditional technologies. Now you're saying and seeing data science is going to be a strategic business differentiator. You mentioned a number of industries, and there were several of them on stage today. 
Give us some, maybe some, one of your favorite examples of one of your customers leveraging data science and driving a pretty significant advantage for their business. >> Sure. Yeah, well, to step back a little bit, just a little context, only ten companies have out performed the S&P 500 in each of the last five years. We start looking at what are they doing. Those are companies that have decided data science and machine learning is critical. They've made a big bet on it, and every company needs to be doing that. So a big part of our message today was, kind of, I'd say, open the eyes of everybody to say there is something happening in the market right now. And it can make a huge difference in how you're applying data analytics to improve your business. We announced our first focus on this back in February, and one of our clients that spoke at that event is a company called Argus Healthcare. And Argus has massive amounts of data, sitting on a mainframe, and they were looking for how can we unleash that to do better care of patients, better care for our hospital networks, and they did that with data they had in their mainframe. So they brought data science experience and machine learning to their mainframe, that's what they talked about. What Rob and I have announced today is there's another great trove of data in every organization which is the data inside Hadoop. HDP, leading distribution for that, is a great place to start. So the use case that I just shared, which is on the mainframe, that's going to apply anywhere where there's large amounts of data. And right now there's not a great answer for data science on Hadoop, until today, where data science experience plus HDP brings really, I'd say, an elegant approach to it. It makes it a team sport. You can collaborate, you can interact, you can get education right in the platform. So we have the opportunity to create a next generation of data scientists working with data and HDP. That's why we're excited. 
>> Let me follow up with this question in your intro that, in terms of sort of the data science experience as this next major building block, to extract, or to build on the value from the data lake, the two companies, your two companies have different sort of go-to-markets, especially at IBM, with the industry solutions and global business services, you guys can actually build semi-custom solutions around this platform, both the data and the data science experience. With Hortonworks, what are those, what's your go to market motion going to look like and what are the offerings going to look like to the customer? >> There'll be several. You just described a great example, with IBM professional services, they have the ability to take those industry templates and take these data science models and instantly be able to bring those to the data, and so as part of our joint go to market motion, we'll be able to now partner, bring those templates, bring those models to not only our customer base, but also part of the new sales go to market motion in the white space, in new customer opportunities and the whole point is, now we can use the enterprise data platforms to bring the data under management in a mission critical way that then bring value to it through these kinds of use cases and templates that drive the smart applications into quick time to value. And just increase that time to value for the customers. >> So, how would you look at the mix changing over time in terms of data scientists working with the data to experiment on the model development and the two hard parts that you talked about, data prep and operationalization. So in other words, custom models, the issue of deploying it 11 months later because there's no real process for that that's packaged, and then packaged enterprise apps that are going to bake these models in as part of their functionality that, you know, the way Salesforce is starting to do and Workday is starting to do. How does that change over time? 
>> It'll be a layering effect. So today, we now have the ability to bring through the connected data platforms all the data under management in a mission critical manner from point of origination through the entire stream till it comes at rest. Now with the data science, through DSX, we can now, then, have that data science framework to where, you know, the analogy I would say, is instead of it being a black science of how you do data access and go through and build the models and determine what the algorithms are and how that yields a result, the analogy is you don't have to be a mechanic to drive a car anymore. The common person can drive a car. So, now we really open up the community business analyst that can now participate and enable data science through collaboration and then we can take those models and build the smart apps and evolve the smart apps that go to that very rapidly and we can accelerate that process also now through the partnership with IBM and bringing their core domain and value that, drivers that they've already built and drop that into the DSX environments and so I think we can accelerate the time to value now much faster and efficient than we've ever been able to do before. >> You mentioned teamwork a number of times, and I'm curious about, you also talked about the business analyst, what's the governance like to facilitate business analysts and different lines of business that have particular access? And what is that team composed of? >> Yeah, well, so let's look at what's happening in the big enterprises in the world right now. There's two major things going on. One is everybody's recognizing this is a multi-cloud world. There's multiple public cloud options, most clients are building a private cloud. They need a way to manage data as a strategic asset across all those multiple cloud environments. 
The second piece is, we are moving towards, what I would call, the next generation data fabric, which is your warehousing capabilities, your database capabilities, married with Hadoop, married with other open source data repositories and doing that in a seamless fashion. So you need a governance strategy for all of that. And the way I describe governance, simple analogy, we do for data what libraries do for books. Libraries create a catalog of books, they know they have different copies of books, some they archive, but they can access all of the intelligence in the library. That's what we do for data. So when we talk about governance and working together, we're both big supporters of the Atlas project, that will continue, but the other piece, kind of this point around enterprise data fabric is what we're doing with Big SQL. Big SQL is the only 100% ANSI-SQL compliant SQL engine for data across Hadoop and other repositories. So we'll be working closely together to help enterprises evolve in a multi-cloud world to this enterprise data fabric and Big SQL's a big capability for that. >> And an immediate example of that is in our EDW optimization suite that we have today we'll be loading Big SQL as the platform to do the complex query sector of that. That we'll go to market with almost immediately. >> Follow up question on the governance, there's, to what extent is end to end governance, meaning from the point of origin through the last mile, you know, if the last mile might be some specialized analytic engine, versus having all the data management capabilities in that fabric, you mentioned operational and analytic, so, like, are customers going to be looking for a provider who can give them sort of end to end capabilities on both the governance side and on all the data management capabilities? Is that sort of a critical decision? >> I believe so. I think there's really two use cases for governance. It's either insights or it's compliance. 
And if your focus is on compliance, something like GDPR, as an example, that's really about the life cycle of data from when it starts to when it can be disposed of. So for compliance use case, absolutely. When I say insights as a governance use case, that's really about self-service. The ideal world is you can make your data available to anybody in your organization, knowing that they have the right permissions, that they can access, that they can do it in a protected way and most companies don't have that advantage today. Part of the idea around data science on HDP is if you've got the right governance framework in place suddenly you can enable self-service which is any data scientist or any business analyst can go find and access the data they need. So it's a really key part of delivering on data science, is this governance piece. Now I just talked to clients, they understand where you're going. Is this about compliance or is this about insights? Because there's probably a different starting point, but the end game is similar. >> Curious about your target markets, Tyler talked about the go to market model a minute ago, are you targeting customers that are on mainframes? And you said, I think, in your keynote, 90% of transactional data is in a mainframe. Is that one of the targets, or is it the target, like you mention, Rob, with the EDW optimization solution, are you working with customers who have an existing enterprise data warehouse that needs to be modernized, is it both? >> The good news is it's both. It's about, really the opportunity and mission, is about enabling the next generation data architecture. And within that is again, back to the layering approach, is being able to bring the data under management from point of origination through point of rest. 
Now if we look at it, you know, probably 90% of, at least transactional data, sits in the mainframe, so you have to be able to span all data sets and all deployment architectures, on-prem, multi-data center as well as public cloud. And that then, is the opportunity, but for that to then drive value ultimately back, you've got to be able to have then the simplification of the data science framework and toolset to be able to then have the proper insights and basis on which you can bring the new smart applications. And drive the insights, drive the governance through the entire life cycle. >> On the value front, you know, we talk about, and Hortonworks talks about, the fact that this technology can really help a business unlock transformational value across their organization, across lines of business. This conversation, we just talked about a couple of the customer segments, is this a conversation that you're having at the C-suite initially? Where are the business leaders in terms of understanding? We know there's more value here, we probably can open up new business opportunities or are you talking more the data science level? >> Look, it's at different levels. So, data science, machine learning, that is a C-suite topic. A lot of times I'm not sure the audience knows what they're asking for, but they know it's important and they know they need to be doing something. When you go to things like a data architecture, the C-suite discussion there is, I just want to become more productive in how I'm deploying and using technology because my IT budget's probably not going up, if anything it may be going down, so I've got to become a lot more productive and efficient to do that. So it depends on who you're talking to, there's different levels of dialogue. But there's no question in my mind, I've seen, you know, just look at major press, Financial Times, Wall Street Journal, last year. CEOs are talking about AI, machine learning, using data as a competitive weapon. 
It is happening and it's happening right now. What we're doing together, saying how do we make data simple and accessible? How do we make getting there really easy? Because right now it's pretty hard. But we think with the combination of what we're bringing, we make it pretty darn easy. >> So one quick question following up on that, and then I think we're getting close to the end. Which is when the data lakes started out, it was sort of, it seemed like, for many customers a mandate from on high, we need a big data strategy, and that translated into standing up a Hadoop cluster, and that resulted in people realizing that there's a lot to manage there. It sounds like, right now people know machine learning is hot so they need to get data science tools in place, but is there a business capability sort of like the ETL offload was for the initial Hadoop use cases, where you would go to a customer and recommend do this, bite this off as something concrete? >> I'll start and then Rob can comment. Look, the issue's not Hadoop, a lot of clients have started with it. The reason there hasn't been, in some cases, the outcomes they wanted is because just putting data into Hadoop doesn't drive an outcome. What drives an outcome is what do you do with it. How do you change your business process, how do you change what the company's doing with the data, and that's what this is about, it's kind of that next step in the evolution of Hadoop. And that's starting to happen now. It's not happening everywhere, but we think this will start to propel that discussion. Any thoughts you had, Rob? >> Spot on. Data lake was about releasing the constraints of all the silos and being able to bring those together and aggregate that data. 
And it was the first basis for being able to have a 360 degree or holistic centralized insight about something and/or pattern, but what then data science does is it actually accelerates those patterns and those lessons learned and the ability to have a much more detailed and higher velocity insight that you can react to much faster, and actually accelerate the business models around this aggregate. So it's a foundational approach with Hadoop. And it's then, as I mentioned in the keynote, the data science platforms, machine learning, and AI actually is what is the thing that transformationally opens up and accelerates those insights, so then new models and patterns and applications get built to accelerate value. >> Well, speaking of transformation, thank you both so much for taking time to share your transformation and the big news and the announcements with Hortonworks and IBM this morning. Thank you Rob Bearden, CEO of Hortonworks, Rob Thomas, General Manager of IBM Analytics. I'm Lisa Martin with my co-host, George Gilbert. Stick around. We are live from day one at DataWorks Summit in the heart of Silicon Valley. We'll be right back. (tech music)
SUMMARY :
brought to you by Hortonworks. We are live in San Jose, in the heart of Silicon Valley and the fourth, next leg of this is data science. now have the ability to do And one of the things and every company needs to be doing that. and the data science experience. that drive the smart applications into quick time to value. and the two hard parts that you talked about, and drop that into the DSX environments and doing that in a seamless fashion. in our EDW optimization suite that we have today and most companies don't have that advantage today. Tyler talked about the go to market model a minute ago, but for that to then drive value ultimately back, On the value front, you know, we talk about, and they know they need to be doing something. that there's a lot to manage there. it's kind of that next step in the evolution of Hadoop. and the ability to have a much more detailed and the announcements with Hortonworks and IBM this morning.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Lisa Martin | PERSON | 0.99+ |
George Gilbert | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
San Jose | LOCATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Rob | PERSON | 0.99+ |
Argus | ORGANIZATION | 0.99+ |
90% | QUANTITY | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
IBM Analytics | ORGANIZATION | 0.99+ |
Tyler | PERSON | 0.99+ |
February | DATE | 0.99+ |
two companies | QUANTITY | 0.99+ |
second piece | QUANTITY | 0.99+ |
Argus Healthcare | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
360 degree | QUANTITY | 0.99+ |
GDPR | TITLE | 0.99+ |
one | QUANTITY | 0.99+ |
Hadoop | TITLE | 0.99+ |
One | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
DataWorks Summit | EVENT | 0.99+ |
ten companies | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
fourth | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
two hard parts | QUANTITY | 0.98+ |
DataWorks Summit 2017 | EVENT | 0.98+ |
11 months later | DATE | 0.98+ |
each | QUANTITY | 0.98+ |
two use cases | QUANTITY | 0.97+ |
100% | QUANTITY | 0.97+ |
one quick question | QUANTITY | 0.97+ |
Segano | ORGANIZATION | 0.97+ |
SQL | TITLE | 0.96+ |
four mega-trends | QUANTITY | 0.96+ |
Big SQL | TITLE | 0.96+ |
first basis | QUANTITY | 0.94+ |
one common platform | QUANTITY | 0.94+ |
two major things | QUANTITY | 0.92+ |
Robs | PERSON | 0.92+ |
Wallstreet Journal | ORGANIZATION | 0.92+ |
Financial Times | ORGANIZATION | 0.92+ |
Rob Bearden, Hortonworks - Executive On-the-Ground #theCUBE
>> Voiceover: On the Ground, presented by The Cube. Here's your host John Furrier. (techno music) >> Hello, everyone. Welcome to a special On the Ground executive interview with Rob Bearden, the CEO of Hortonworks. I'm John Furrier with The Cube. Rob, welcome to this On the Ground. >> Thank you. >> So I got to ask you, you're five years old this year, your company Hortonworks in June, have Hadoop Summit coming up, what a magical run. You guys went public. Give us a quick update on Hortonworks and what's going on. The five-year birthday, any special plans? >> Well, we're going to actually host the 10-year birthday party of Hadoop, which is you know, started at Yahoo! and open-source community. So everyone's invited. Hopefully you'll be able to make it as well. We've accomplished a lot in the last five years. We've grown to over 1000 employees, over 900 customers. This year is our first full year of being a public company, and the street has us at $265 million in billings. So tremendous progress has happened and we've seen the entire data architecture begin to re-platform around Hadoop now. >> CEOs across the globe are facing profound challenges, data, cloud, mobile, obviously this digital transformation. What are you seeing out there as you talk to your customers? >> Well they view that the digital transformation is a massive opportunity for value creation, for that enterprise. And they realize that they can really shift their business models from being very reactive post-transaction to actually being able to consolidate all of the new paradigm data with the existing transaction data and actually get to a very pro-active model pre-transaction. And so they understand their customer's patterns. They understand the kinds of things that their customers want to buy before they ever engage in the procurement process. 
And they can make better and more compelling offers at better price points and be able to serve their customers better, and that's really the transformation that's happening and they realize the value of that creation between them and their customer. >> And one of the exciting things about The Cube is we go to all these different industry events and you were speaking last week at an event where data is at the center of the value proposition around digital transformation, and that's really been the key trend that we've been seeing consistently, that buzz word digital transformation. What does that mean to you? Because this is coming up over and over again around this digital platform, digital weathers, digital media or digital engagement. It's all around data. What's your thoughts and what is from your perspective digital transformation? >> Well, it's about being able to derive value from your data and be able to take that value back to your customers and your supply chain, and to be able to create a completely new engagement with how you're managing your interaction with your customers and your supply chain from the data that they're generating and the data that you have about them. >> When you talk to CEOs and people in the business out in the field, how much of this digital transformation do you see as real in terms of progress, real progress? In terms of total transitions, or is it just being talked about now? What's your progress bar meter? How would you peg this trend? >> I would say we're at four and I believe we'll be at six by the end of 2016. 
And it's one of the biggest movements I've seen since the '90s and ERP, because it's so transformational into the business model by being able to transform the data that we have about our collective entity and our collective customer and collective supply chain, and be able to apply predictive and real-time interactions against that data as events and occurrences are happening, and to be able to quickly offer products and services, and the velocity that that creates to modernization and the value creation back is at a pace that's never been able to happen. And they've really understood the importance of doing that or being disintermediated in their existing spaces. >> You mention ERP, it kind of shows our age, but I'll ask the question. Back in the '90s ERP, CRM, these were processes that were well known, that people automated with technology which was at that time unknown. You got the rise of client-server technology, local area networking, TCP/IP was emerging, so you got some unknown technology stuff happening, but known processes that were being automated and hence saw that boom. Now you mention today, it's interesting because Peter Burris at Wikibon's thesis says today the processes are unknown and the technology's known, so there's now a new dynamic. It's almost flipped upside-down where this digital transformation is exact opposite. IoT is a great use case where all these unknown things are coming into the enterprise that are value opportunities. The technology's known, so now the challenge is how to use technology, to deploy it, and be agile to capture and automate these future and/or real-time unknown processes. Your thoughts on that premise. >> The answers are buried in the data, is the great news, and so the technology as you said is there, and you have these new, unknown processes through Internet of Things, the new paradigm data sets with sensors and clickstream and mobile data. 
And the good news is they generate the data and we can apply technology to the data through AI and machine learning to really make sure that we understand how to transform the value out of that, out of those data sets. >> So how does IT deal with this? 'Cause going back 30 years IT was a clear line of sight, again, automating those known processes. Now you have unknown opportunities, but you have to be in a position for that. Call that cloud, call that DevOps, call that data driven, whatever the metaphor is. People are being agile, be ready for it. How is that different now and what is the future of data in that paradigm? And how does a customer come to grips and rationalize this notion of I need a clear line of sight of the value, not knowing what the processes is about data. What should they be doing? >> Well, we don't know the processes necessarily, per se, but we do know what the data is telling us because we can bring all that data under management. We can apply the right kind of algorithms, the right kind of tools on it, to give us the outcomes that we want and have the ability to monetize and unlock that value very quickly. >> Hortonworks architecture is kind of designed now at the last Hadoop Summit in Dublin. We heard about the platform. Your architecture's going beyond Hadoop, and it says Hadoop Summit and Hadoop was the key to big data. Going beyond Hadoop means other things. What does that mean for the customer? Because now they're seeing these challenges. How does Hortonworks describe that and what value do you bring to those customers? >> Big data was about data at rest and being able to drive the transformation that it has, being able to consolidate all the transactional platforms into central data architecture. 
Being able to bring all the new paradigm data sets to the mobile, the clickstream, the IoT data, and bring that together and be able to really transition from being reactive post-transaction to be able to be predictive and interactive pre-transaction. And that's a very, very powerful value proposition and you create a lot of value doing that, but what's really learned through that process is in the digital transformation journey, that actually the further upstream that we can get to engaging with the data, even if we can get to it at the point of origination at the furthest edge, at the point of sensor, at the actual time of clickstream and we can engage with that data as those events and occurrences are happening and we can process against those events as they're happening, it creates higher levels of value. So from the Hortonworks platform we have the ability to manage data at rest with Hadoop, as well as data in motion with the Hortonworks data flow platform. And our view is that we must be able to engage with all the data all the time. And so we bring the platforms to bring data under management from the point of origination all the way through as it's in motion, and to the point it comes at rest and be able to aggregate those interactions through the entire process. >> It's interesting, you mention real-time, and one of the ideas of Hadoop was it was always going to be a data warehouse killer, 'cause it makes a lot of sense. You can store the data. It's unstructured data and you can blend in structured on top of that and build on top of that. Has that happened? And does real-time kind of change that equation? Because there's still a role for a data warehouse. If someone has an investment are they being modernized? Clear that up for me because I just can't kind of rationalize that yet. Data warehouses are old, the older ones, but they're not going away any time soon from what we're hearing. Your thoughts as Hadoop as the data warehouse killer. 
>> Yeah, well, our strategy from day one has never been to go in and disintermediate any of the existing platforms or any of the existing applications or services. In fact, to the contrary. What we wanted to do and have done from day one is be able to leverage Hadoop as an extension of those data platforms. The DW architecture has limitations to it in terms of how much data pragmatically and economically is really viable to go into the data warehouse. And so our model says let's bring more data under management as an extension to the existing data warehouses and give the existing data warehouses the ability to have a more holistic view of data. Now I think the next generation of evolution is happening right now and the enterprise is saying that's great. We're able to get more value longer from our existing data warehouse and tools investment by bringing more data under management, leveraging a combined architecture of Hadoop and data warehouse. But now they're trying to redefine really what does the data warehouse of the future look like, and it's really about how we make decisions, right? And at what point do we make decisions because in the world of DW today it assumes that data's aggregated post-transaction, right? In the new world of data architecture that's across the IT landscape, it says we want to engage with data from the point it's originated, and we want to be able to process and make decisions as events and as occurrences and as opportunities arise before that transaction potentially ever happens. And so the data warehouse of the future is much different in terms of how and when a decision's made and when that data's processed. And in many cases it's pre-transaction versus post-transaction. >> Well also I would just add, and I want to get your thoughts on this, real-time, 'cause now in the moment at the transaction we now have cloud resources and potentially other resources that could become available. Why even go to the data warehouses? 
So how has real-time changed the game? 'Cause data in motion kind of implies real-time whether it's IoT or some sort of bank transaction or something else. How has real-time changed the game? >> Well, it's at what point can we engage with the customer, but what it really has established is the data has to be able to be processed whether it be on-prem, in the cloud, or in a hybrid architecture. And we can't be constrained by where the data's processed. We need to be able to take the processing to the data versus having to wait for the data to come to the processing. And I think that's the very powerful part of cloud, the on-prem, and software-defined networking, and when you bring all of those platforms together, you get the ability to have a very powerful and elastic processing capability at any point in the life cycle of the data. And we've never been able to put all those pieces together on an economically viable model. >> So I got to ask you, you guys are five years old in June, Hadoop's only 10 years old. Still young, still kind of in the early days, but yet you guys are public company. How are you guys looking at the growth strategy for you guys? 'Cause the trend is for people to go private. You guys went public. You're out in the open. Certainly your competitor Cloudera is private, but people can get that they're kind of behind the curtain. Some say public with a $3 billion valuation, but for the most part you're public. So the question is how are you guys going to sustain the growth? What is the growth strategy? What's your innovation strategy? 
>> Well if you look at the companies that are going private, those are the companies that are the older platforms, the older technologies, in a very mature market that have not been able to innovate those core platforms and they sort of reached their maturity cycle, and I think going private gives them the ability to do that innovation, maybe change their licensing model, the subscription, and make some of the transformations they need to make. I have no doubt they'll be very successful doing that. Our situation's much different. As the modern IT landscape is re-architecting itself almost across every layer. If you look at what's happening in the networking layer going to SDN. Certainly in our space with data and it's moving away from just transactional siloed environments to central data architectures and next generation data platforms. And being able to go all the way out to the edge and bring data under management through the entire movement cycle. We're in a market that we're able to innovate rapidly. Not only in terms of the architecture of the data platform being able to bring batch, real-time applications together simultaneously on a central data set and consolidate all of the data, but also then be able to move out and do the data in motion and be able to control an entire life cycle. There's a tremendous amount of innovation that's going to happen there, and these are significant growth markets. Both the data in motion and the data at rest market. The data at rest market's a $50 billion dollar marketplace. The data in motion market is a $1 trillion dollar TAM. So when you look at the massive opportunity to create value in these high growth markets, in the ability to innovate and create the next generation data platforms, there's a lot of room for growth and a lot of room for scale. 
And that's exactly why you should be public when you're going through these large growth markets in a space that's re-platforming, because the CIO wants to understand and have transparent visibility into their platform partners. They want to know how you're doing. Are you executing the plan? Or are you hiding behind a facade of one perception or another. >> Or pivoting or some sort of re-architecture. >> Right, so I think it's very appropriate in a high growth, high innovation market where the IT platforms are going through a re-architecture that you actually are public going through that growth phase. Now it forces discipline around how you operationalize the business and how you run the business, but I think that's very healthy for both the tech and the company. >> Michael Dell told me he wanted to go private mainly because he had to do some work essentially behind the curtain. Didn't want the 90-day shot clock, the demands of Wall Street. Other companies do it because they can't stand alone. They don't have a platform and they're constantly pivoting internally to try to grope and find that groove swing, if you will. You're saying that you guys have your groove swing and as Dave Vellante always says, always get behind a growing total addressable market or TAM, you're saying that. Okay, I buy that. So the TAM's growing. What are you guys doing on the platform side that's enabling your customers to re-platform and take advantage of their current data situation as well as the upcoming IoT boom that's being forecasted? >> Well, the first thing is the genesis of which we started the company around, which is we transformed Hadoop from being a batch architecture, single data set, single application, to being able to actually manage a central data architecture where all data comes under management and be able to drive and evolve from batch to batch interactive and real-time simultaneously over that central data set. 
And then making sure that it's truly an enterprise viable, enterprise ready platform to manage mission critical workloads at scale. And those are the areas where we're continuing to innovate around security, around data governance, around life cycle management, the operations and the management consoles. But then we want to expand the markets that we operate in and be world class and best tech on planet Earth for that data at rest and our core Hadoop business. But as we then see the opportunities to go out to the edge and from the point of origination truly manage and bring that data under management through its entire life cycle, through the movement process and create value. And so we want to continue to extend the reach of when we have data under management and the value we bring to the data through its entire life cycle. And then what's next is you have that data in its life cycle. You then move into the modern data applications, and if you look at what we've done with cyber security and some of the offerings that we've engaged in the cyber security space, that was our first entry. And that's proven to be a significant game changer for us and our customers both. >> Cyber security certainly a big data problem. Also a cloud opportunity with the horsepower you can get with computing. Give us the update. What are you seeing there from a traction standpoint? What's some of the level of engagements you're having with enterprises outside of the NSA and the big government stuff, which I'm sure they're customers, don't have to disclose that, but for the most part a normal enterprise are constantly planning as if they are already attacked and they're having different schemes that they're deploying. How are they using your platform for that right now? >> Well, the nature of attacks has changed. 
And it's evolved from just trying to find the hole in the firewall or where we get into the gateway, to how we find a way through a back door and just hang out in your network and watch for patterns and watch for the ability to aggregate relationships and then pose as a known entity that you can then cascade in. And in the world of cyber security you have to be able to understand those anomalies and be able to detect those anomalies that sit there and watch for their patterns to change. And as you go through a whole life cycle of data management between a cloud, on-prem and a hybrid architecture, it opens up many, many opportunities for the bad guys to get in and have very new schemes. And our cyber security models give the ability to really track how those anomalies are attaching, where the patterns are emerging, and to be able to detect that in real-time and we're seeing the major enterprises shift to these new models, and it's become a very big part of our growth. >> So I got to change gears and ask you about open-source. You've been in open-source really from the beginning, I would call first generation commercial. But it was not a tier one citizen at that time. It was an alternative to other proprietary platforms, whether you look at the network stack or certainly from software. Now today it's tier one. Still we hear business people kind of like, well, open-source. Why should a business executive care about open-source now? And what would you say to that person who's watching about the benefits of open-source and some of the new models that could help them. >> Well, open-source in general's going to give a number of things. One, it's going to probably provide the best tech, the most innovation in a space, whether that be at the network layer or whether that be at the middleware layer, the tools layer or certainly the data layer. And you're going to see more innovation typically happen on those platforms much faster and you've got transparent visibility into it. 
And it brings an ecosystem with it and I think that's really one of the fundamental issues that someone should be concerned with is what does the ecosystem around my tech look like? And open-source really draws forward a very big ecosystem in terms of innovators of the tech, but also enablers of the tech and adopters of the tech in terms of incremental applications, incremental tool sets. And what it does and the benefit to the end customer is the best tech, the most innovation, and typically operating models that don't generate lock in for 'em, and it gives them optionality to use the tech in the most appropriate architecture in the best economic model without being locked in to a proprietary path that they end up with no optionality. >> So talk about the do-it-yourself mentality. In IT that's always been frowned upon because it's been expensive, time-consuming, yet now with organic open-source and now with cloud, you saw that first generation do-it-yourself, standing up stuff on Amazon, whatnot, is being very viable. It funded shadow IT and a variety of other great things around virtualization, visualization, and so on. Today we're seeing that same pattern swing back to do-it-yourself, is good for organic innovation but causes some complexities. So I want to get your thoughts on this because this seems to be a common thread on our Cube interviews and at Hadoop Summit and at Big Data SV as part of Big Data Week when we were in town. We heard from customers and we heard the following: It's still complex and the total cost of ownership's still too high. That seems to be the common theme for slowing down the rapid acceleration of Hadoop and its ecosystem in general. One, do you agree with that? And two, if so, or what would be the answer to make that go faster? >> Well, I think you're seeing it accelerate. 
I think you're seeing the complexities dwindle away through both innovation and the tech and the maturing of the tech, as well as just new tool sets and applications that are leveraging it, that take away any complexity that was there. But what I think has been acknowledged is, the value that it creates and that it's worth the do-it-yourself and bringing together the disparate techs because the innovation that it brings, the new architectures and the value that it creates as these platforms move into the different use cases that they're enabling. >> So I got to ask you this question. I know you're not going to like it and all the people always say, well John, why does everyone always ask that same question? You guys have a radically different approach than Cloudera. It's the number one question I get. Ask them about Cloudera. Cloudera, ask them about Hortonworks. You guys have been battling. They were first. You guys came right fast followers second. With the Yahoo! thing we've been following you guys since day one. Explain the difference between Cloudera, because now a couple things have changed over the past few years. One is, Hadoop wasn't the be all end all for big data. There's been a lot of other things certainly Spark and some other stuff happening, but yet now enterprises are adopting and coexisting with other stuff. So we've seen Cloudera make some pivots. They certainly got some good technology, but they've had some good right answers and some wrong answers. How've you guys been managing it because you're now public, so we can see all the numbers. We know what the business is doing. But relative to the industry, how are you guys compared to Cloudera? What's the differences? And what are you guys doing differently that makes Hortonworks a better vendor than Cloudera? >> I can't speak to all the Cloudera models and strategies. What I'll tell you is what the foundation of our model and strategy is based on. 
When we founded the company we were as you mentioned, three or four years post Cloudera's founding. We felt like we needed to evolve Hadoop in terms of the architecture, and we didn't want to adopt the batch-oriented architecture. Instead we took the core Hadoop platform and through YARN enabled it to bring a central data architecture together as well as be able to be generating batch, interactive and real-time applications, leveraging YARN as the data operating system for Hadoop. And then the real strategy behind that was to open up the data sets, open up the different types of use cases, be able to do it on a central data architecture. But then as other processing engines emerged, whether it be Spark as you brought up or some of the other ones that we see coming down the pipe, we can then integrate those engines through YARN onto the central data platform. And we open up the number of opportunities, and that's the core basis. I think that's different than some of the other competitors' technology architecture. >> Looking back now five years, are there moves that you were going to make that others have made, that you look back and say I'm glad we didn't do that given today's landscape? >> What I'm glad we did do is open up to the most use cases and workloads and data sets as possible through YARN, and that's proven to be a very, very fundamental differentiation of our model and strategy for anybody in the Hadoop space certainly. 
And I'm also very happy that we saw the opportunity about a year ago that it needed to be more than just about data at rest on Hadoop, and that actually to truly be the next generation data architecture, you've got to be able to provide the platforms for data at rest and data in motion. Our acquisition of Onyara, to be able to get the NiFi technology so that we're truly capturing the data from the point of origination all the way through the movement cycle until it comes to rest, has given us now the ability to do complete life cycle management for an entire data supply chain. And those decisions have proven to be very, very differentiating between us and any of our other competitors, and it's opened up some very, very big markets. More importantly, it's accelerated the time to value that our customers get in the use cases that they're enabling through us. >> How would you talk about the scenario that people are saying about Hadoop not being the end-all be-all of the industry? At the same time, 'cause big data, as Arun Murthy said on theCUBE in Dublin, it's bigger than Hadoop now, but Hadoop has become synonymous with big data generally. Where's the leadership coming from in your mind? Because we're certainly not seeing it on the data warehouse side, 'cause those guys still have the old technology, trying to co-exist and re-platform for the future. So question is, is Hortonworks viewing Hadoop as still leading generically as a big data industry or has it become a sidebar of the big data industry? >> Of Hadoop? Hadoop is the platform, and we believe ground zero for big data. But we believe it's bigger than that. It's about all data and being able to manage the entire life cycle of all data, and that starts from the point of origination, until it comes to rest, and be able to continue to drive that entire life cycle. Hadoop certainly is the underpinning of the platform for big data, but it's really got to be about all data.
Data at rest, data in motion, and what you'll see is the next leg in this is the modern data applications that then emerge from that. >> How has the ecosystem in the Hadoop industry... I would agree, by the way, that the Hadoop players are leading big data in general in terms of innovation. The ecosystem's been a big part of it. You guys have invested in it. Certainly a lot of developers and open-source. How has the ecosystem changed given the current situation from where it was? And where do you see the ecosystem going? With the re-platforming not everyone can have a platform. There's a ton of guys out there that have tools, that are looking for a home, they're trying to figure out the chessboard on what's going on with the ecosystem. What are your thoughts on the current situation and how it will evolve in your view? >> Well, I think one of the strongest statements from day one is whether it's EDW or BI or relational, none of the traditional platform players say the way you solve your big data problem is with my platform. They, to a company, have a Hadoop platform strategy of some form to bring all of that huge volume of big data under management, and it fits our model very well in that we're not trying to disintermediate, but extend those platforms by leveraging HDP as an extension of their platform. And what that's done is it's created pull markets. It's brought Hadoop into the enterprise with a very specific value proposition and use case, bringing more data under management for that tool, that application, or that platform. And then the enterprises have realized there are other opportunities beyond that: new use cases and new data sets we can also gain more leverage from. And that's what's really accelerated-- >> So you see growth in the ecosystem? >> We're actually seeing exponential acceleration of the growth around the ecosystem.
Not only in terms of the existing platforms and tools and applications adopting Hadoop, but now new start-up companies building completely from scratch applications just for the big data sets. >> Let's talk about start-ups. We were talking before we sat down about the challenges of being an entrepreneur. You mentioned the exponential acceleration of entrepreneurs coming into the ecosystem. That's a safe harbor right now. It seems to be across the board. And a lot of the big platforms have robust, growing ecosystems. What's the current landscape of start-ups? I know you're an active investor yourself and you're involved in a lot of different start-up conversations as an advisor. What's your view of the current landscape right now? Series A, B, C, growth. Stalling. What needs to be in place for these companies to be successful? What are some of the things that you're seeing? >> You have to be surgically focused right now on a very particular problem set, maybe even by industry. And understand how to solve the problem and have an absolute correlation to a value proposition and a very well defined and clear model of how you're going to go solve that problem, monetize it, and scale. Or you have to have an incredibly well-financed and deep war chest to go after a platform play that's going after a very large TAM that is enabling a re-platforming at one of the levels in the new IT landscape. >> So laser focus in a stack or vertical, and/or huge cash funding from Benchmark or other VCs, tier one VCs, to have a differentiator. They have to have some sort of enabler. >> To enable a next generation platform and something that's very transformational as a platform that really evolves the IT stack. >> What strategies would you advise entrepreneurs in terms of either white spaces to attack and/or their orientation to this new data layer? Because if this plays out as we were talking about, you're going to have a horizontal data layer where you need interoperability.
Need to have data in motion, but data aware. Smart data you integrate into disparate systems. Breaking down the siloed concept. How should an entrepreneur develop or look at that? Is there a certain model you've seen work successfully? Is there a certain open-source group they can jump into? What thoughts would you share? 'Cause this seems to be the toughest nut to crack for entrepreneurs. >> Right now you're seeing a massive shift in the IT data architecture, as one example. You're seeing another massive shift in the network architecture. For example, SDN, right? You're seeing I think a big shift in the kinds of applications, getting away from application functionality to data-enabled applications. And I think it's important for the entrepreneur to understand where in the landscape do they really want to position? Where do they bring intellectual capital that can be monetized? Some of the areas that I think you'll see emerge very quickly in the next four, six, eight quarters are the new optimization engines, and so things around AI and machine learning. And now that we have all of the data under management through its entire life cycle, how do I now optimize where that data's processed, in the cloud or on-prem, or as it's in motion? And there's a massive opportunity through software defined networking to actually come in and now optimize at the purest price point and/or efficiency where that data's managed, where that data's stored, and let it continue to reap the benefits. Just as Amazon's done in retail, if you like this, you should look at that. Just as Yahoo! did, I'll point out, with Hadoop, its advertising models and strategies of being able to put specific content in front of you. Those kinds of opportunities are now available for the processing and storage of data through the entire life cycle across any architectural strategy. >> Are you seeing data from a developer's standpoint being instrumental in their use cases?
Meaning as I'm developing on top of data platforms like Hortonworks or others, where there's disparate data, what's their interaction? What's their relationship to the data? How are they using it? What do they need to know? Where's the line in terms of their involvement in the data? >> Well, what we're seeing is a very big movement with the developer community that they now want to be able to just let the data tell them where the application service needs to be. Because in the new world of data they understand what the entity relationships are with their customers and the patterns that are happening with their customers. They now can highly optimize when their customers are about to cross over from one event to the other, and what that typically means and therefore what the inverted action should be to create the best experience with their customer, to create a higher level of service, to be able to create a better packaged price point at a better margin. They also have the ability to understand in real-time, based on how the data is flowing, how well their product's performing, and any obstacles or issues that are happening with their product. So they don't want to have to have application logic that then they run a report on three days, three weeks after some event's happened. They now are taking the data as events are happening in the data, and it's telling them what to do, and they're able to prescriptively act on whatever event or circumstance unfolds from that. >> So they want the data now. They want real-time data embedded in the apps as a front line developer. >> And they want to optimize what that data is doing as it's unfolding through its natural life cycle. >> Let's talk about your customer base and what their expectations are. What questions should a customer or potential customer ask their big data vendor as they look at the future? What are the key questions they should ask?
>> They should really be comparing what is your architectural strategy, first and foremost. For managing data. And what kinds of data can I manage? What are the limitations in your architecture? What workloads and data sets can't I manage? What are the latency issues that your architecture would create for me? What's your business model that's associated with us engaging together? How much of the life cycle can you enable of my data? How secure are you making my data? What kind of long tail of visibility and chain of custody can I have around the governance? What kind of governance standards are you applying to the data? How much of my governance standards can you help me automate? How easy is it to operate and how intuitive is it? How big is your ecosystem? What's your road map and your strategy? What's next in your application stack? >> So enterprises are looking at simplicity. They're looking for total cost of ownership. How is big data innovation going to solve that problem? Because with IoT, again, a lot of new stuff's happening really, really fast. How do they get their arms around this simplicity question in this total cost of ownership? How should they be thinking about it? >> Well, what the Hadoop platforms have to do and the data in motion platforms have to do is to be able to bring the data under management and bring all of the enterprise services that they have in their existing data platforms, in the areas of security, in the areas of management, in the areas of data governance, so they can truly run mission critical workloads at scale with all the same levels of predictability that they have in isolation, in their existing proprietary platforms. And be able to do it in a way that's very intuitive for their existing platforms to be able to access it, very intuitive for their operations teams to be able to manage it, and very clean and easy for their existing tools and platforms investments to leverage it. 
>> On the industry landscape right now, what are you seeing in terms of consolidation? Some are saying we're seeing some consolidation. Lot of companies going private. You're seeing people buckle down. It's almost a line. If you weren't born before a certain date for the company, you might have the wrong architecture. Certainly enterprises re-platform, I would agree with that, but as a supplier to customers, you're one of the young guys. You were born in the cloud. You were born in open-source, Hortonworks. Not everyone else is like that, and certainly Oracle's like one of the big guys that keep on doing well. IBM's been around. But they're all changing, as well. And certainly a lot of these growth companies pre-IPO are kind of being sold off. What's your take on the current situation with the bubble, the softening, whatever people are calling it? What are your thoughts? >> I think you see some companies who got caught up, and if we sort of unpack that to the ones who are going private now, those are the companies that have operated in a very mature market space. They were not able to innovate as much as they would probably have liked to, they're probably locked into a proprietary technology in a non-subscription model of some sort. Maybe a perpetual license model. And those are very different models than the enterprise wants to adopt today, and their inability to innovate and grow because the market shrank forced them to go into very constrained environments. And ultimately, they can be great companies. They have great value propositions, but they need to go through transformations that don't include a 90-day shot clock in the public market.
In the markets where there's maybe, I was in the B round or the C round and I was focused on providing a niche offering into one of those mature spaces that's becoming disintermediated or evolved quickly because an open-source company has come into the space or that section of the IT stack has morphed into more of a cloud-centric or SaaS-centric or an open-source-centric environment. They got cut short. Their market's gone away. Their market shrunk. They can't innovate their way out of it. And they then ultimately have to find a different approach, and they may or may not be able to get the financing to do that. We're in a much different position. >> Certainly the down round. We're seeing down rounds from the high valuations. That's the first sign of trouble. >> That's the first sign. I've gotten three calls this week from companies that are liquidating and have two weeks to find a new home. >> Great, we'll look for some furniture for our new growing SiliconANGLE office. >> I think you'll have some good values. >> You personally, looking back over five years now in this journey, what an incredible run you guys have had and fun to watch you guys. What's the biggest thing that surprised you and what's the biggest thing that's happened? If you can talk about those two things 'cause again, a lot's happened. The market's changed significantly. You guys went public. You got a big office here. What surprised you and what was the biggest thing that you think was the catalyst of the current trajectory? >> How quickly the market grew. We saw from day one when we started the company that this was a billion dollar opportunity, and that was the bar for starting whatever we did. We were looking for new opportunities. We had to see a billion dollar opportunity. How quickly we have seen the growth and the formation of the market in general.
And then how quickly some of the new opportunities have opened up, in particular around streaming, Internet of Things, the new paradigm data sets, and how quickly the enterprises have seen the ability to create a next generation data architecture and the aggressiveness with which they're moving to do that with Hadoop. And then how quickly in the last year it swung to also wanting to bring data in motion under management, as well. >> If you could talk to a customer right here, right now, and they asked you the following question, Rob, look around the corner five years out. Tell me something that someone else can't see that you see, that I should be aware of in my business. And why should I go with Hortonworks? >> It's going to be a table stake requirement to be able to understand, whether it be your customer or your supply chain, from the point they begin to engage and take the first step towards engaging with your product or your service, what they're trying to accomplish, and to be able to interact with them from that first inception point. It's also going to be table stakes to be able to monitor your product in real-time, and be able to understand how well it's performing, down to the component level so that you can make real-time corrections, improvements, and be able to do that on the fly. The other thing that you're going to see is that it's going to be a table stake requirement to be able to aggregate the data that's happened in that life cycle and give your customer the ability to monetize the data about them. But you as the enterprise will be responsible for creating anonymity, confidentiality and security of the data. But you're going to have to be able to provide the data about your customers and give them the ability, if they choose, to monetize the data about them. >> So if I get that correct, you're basically saying 100% digital. >> Oh, it's by far, within the next five years, absolutely.
If you do not have a full digital model, in most industries you'll be disintermediated. >> Final question. What's the big bet that you're making right now at Hortonworks? That you say we're pinning the company on blank, fill in the blank. >> It's not about big data. It's about all data under management. >> Rob, thanks so much for spending the time here On the Ground. Rob Bearden, CEO of Hortonworks here for an executive On the Ground. I'm John for The Cube. Thanks for watching. (techno music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Rob Bearden | PERSON | 0.99+ |
Dave Velanti | PERSON | 0.99+ |
Peter Burris | PERSON | 0.99+ |
Rob | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Michael Dell | PERSON | 0.99+ |
$3 billion | QUANTITY | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
two weeks | QUANTITY | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
100% | QUANTITY | 0.99+ |
Aroon Merkey | PERSON | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
90-day | QUANTITY | 0.99+ |
three days | QUANTITY | 0.99+ |
June | DATE | 0.99+ |
two things | QUANTITY | 0.99+ |
TAM | ORGANIZATION | 0.99+ |
first sign | QUANTITY | 0.99+ |
first entry | QUANTITY | 0.99+ |
five years | QUANTITY | 0.99+ |
last week | DATE | 0.99+ |
one | QUANTITY | 0.99+ |
Dublin | LOCATION | 0.99+ |
both | QUANTITY | 0.99+ |
$1 trillion dollar | QUANTITY | 0.99+ |
over 900 customers | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
over 1000 employees | QUANTITY | 0.99+ |
$50 billion dollar | QUANTITY | 0.99+ |
three calls | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
Hadoop | TITLE | 0.99+ |
last year | DATE | 0.99+ |
six | QUANTITY | 0.99+ |
$265 million dollars | QUANTITY | 0.98+ |
Big Data Week | EVENT | 0.98+ |
three weeks | QUANTITY | 0.98+ |
one example | QUANTITY | 0.98+ |
Series A | OTHER | 0.98+ |
Keblan Dublin | ORGANIZATION | 0.98+ |
this week | DATE | 0.98+ |
first step | QUANTITY | 0.98+ |
Hadoop Summit | EVENT | 0.98+ |
Yahoo! | ORGANIZATION | 0.98+ |
Both | QUANTITY | 0.97+ |
first generation | QUANTITY | 0.97+ |
this year | DATE | 0.97+ |
One | QUANTITY | 0.97+ |
four | QUANTITY | 0.96+ |
This year | DATE | 0.96+ |
Today | DATE | 0.96+ |
10-year birthday | QUANTITY | 0.96+ |
Hadoop | ORGANIZATION | 0.95+ |
end of 2016 | DATE | 0.95+ |
MAIN STAGE INDUSTRY EVENT 1
>>Have you ever wondered how we sequence the human genome, how your smartphone is so, well, smart, how we will ever analyze all the patient data for the new vaccines, or even how we plan to send humans to Mars? Well, at Cloudera, we believe that data can make what is impossible today possible tomorrow. We are the enterprise data cloud company. In fact, we provide analytics and machine learning technology that does everything from making your smartphone smarter to helping scientists ensure that new vaccines are both safe and effective. Big data? No problem. Cloudera, the enterprise data cloud company. >>So I think for a long time in this country, we've known that there's a great disparity between minority populations and the majority population in terms of disease burden. And depending on where you live, your zip code has more to do with your health than almost anything else. But there are a lot of smaller safety net facilities, as well as small academic medical colleges within the United States. And those smaller environments don't have the access, you know, to the technologies that the larger ones have. And, you know, I call that digital disparity. So Meharry is an academic health sciences center, and our mission is to train diverse health care providers and researchers, but also provide services to underserved populations. That's part of the reason that I think it's so important for Meharry Medical College to do data science. One of the things that, you know, both Cloudera and Clearsense are very passionate about is bringing those high-end technologies to the smaller organizations. >>It's very expensive to go to the cloud for these small organizations. So now with the partnership with Cloudera and Clearsense, Clearsense clients now enjoy those same technologies and really, honestly, have a technological advantage over some of the larger organizations. The reason being is they can move fast.
So we were able to do this on our own without having to hire data scientists. We probably cut three to five years off of our studies. I grew up in a small town in Arkansas, and it's one of those towns where the railroad tracks divided the blacks and the whites. My father died without getting much healthcare at all. And as an 11-year-old, I did not understand why my father could not get medical attention, because he was very sick. >>Students who come to Meharry are looking to serve populations that reflect themselves or reflect the population they came from. A lot of the data you find or research you find on health is usually based on white men. And obviously not everybody who needs a medical provider is going to be a white male. >>One of the things that we're concerned about in healthcare is that there's bias in treatment already. We want to make sure those same biases do not enter into the algorithms. >>The issue is how do we get ahead of them to try to prevent these disparities? >>One of the great things about our dataset is that it contains a very diverse group of patients. >>Instead of just saying everyone will have these results, you can break it down by race, class, cholesterol level, other kinds of factors that play a role. So you can make the treatments, in the long run, more specific. >>Researchers are now able to use these technologies and really take those hypotheses from bench to bedside. >>We're able to overall improve the health of not just the person in front of you, but the population that, yeah, >>Well, the future is now. I love a quote by William Gibson who said the future is already here. It's just not evenly distributed. If we think hard enough and we apply things properly, we can again take these technologies to, you know, underserved environments in healthcare. Nobody should be technologically disadvantaged.
>>When is a car not just a car? When it's a connected, data-driven ecosystem: dozens of sensors and edge devices gathering up data from just about anything, road infrastructure, other vehicles, and even pedestrians, to create safer vehicles, smarter logistics, and more actionable insights. All the data from the connected car supports an entire ecosystem, from manufacturers building safer vehicles and fleet managers tracking assets to insurers monitoring driving behaviors to make roads safer. Now you can control the data journey from edge to AI with Cloudera. In the connected car, data is captured, consolidated, and enriched with Cloudera DataFlow. Cloudera's data engineering, operational database, and data warehouse provide the foundation to develop service center applications, sales reports, and engineering dashboards. With Data Science Workbench, data scientists can continuously train AI models and use DataFlow to push the models back to the edge to enhance the car's performance. As the industry's first enterprise data cloud, Cloudera supports on-premise, public, and multi-cloud deployments, delivering multifunction analytics on data anywhere with common security, governance, and metadata management powered by Cloudera SDX, an open platform built on open source, working with open compute architectures and open data stores, all the way from edge to AI, powering the connected car. >>The future has arrived. >>The dawn of a retail renaissance is here and shopping will never be the same again. Today's connected consumers are always on and in control. It's the era of smart retail. Smart shelves, digital signage, and smart mirrors offer an immersive customer experience while delivering product information, personalized offers, and recommendations. Video analytics capture customer emotions and gestures to better understand and respond to in-store shopping experiences.
Beacons, sensors, and streaming video provide valuable data into in-store traffic patterns, hotspots, and dwell times. This helps retailers build visual heat maps to better understand customer journeys, conversion rates, and promotional effectiveness. Robots automate routine tasks like capturing inventory levels, identifying out-of-stocks, and alerting in-store personnel to replenish shelves. When it comes to checking out, automated e-commerce pickup stations and frictionless checkouts will soon be the norm, making standing in line a thing of the past. Data and analytics are truly reshaping the everyday shopping experience. Outside the store, smart trucks connect the supply chain, providing new levels of inventory visibility, not just into the precise location, but also the condition of those goods, all in real time. Convenience is key, and customers today have the power to get their goods delivered at the curbside, to their doorstep, or even to their refrigerators. Smart retail is indeed here, and Cloudera makes all of this possible. Using Cloudera, data can be captured from a variety of sources, then stored, processed, and analyzed to drive insights and action in real time. Data scientists can continuously build and train new machine learning models and put these models back to the edge for delivering those moment-of-truth customer experiences. This is the enterprise data cloud powered by Cloudera, enabling smart retail from the edge to AI. The future has arrived. >>Faurecia is a global automotive supplier. We have three business groups: automotive seating, interiors, and emission control technologies. Our biggest automotive customers are Volkswagen, Ford, and PSA. And we have more than 300 sites in 75 countries. >>Today, we are generating tons of data, more and more data, on the manufacturing intelligence. We are trying to reduce the defective parts or anticipate the detection of the defective part. And this is where we can get savings.
I would say our goal in manufacturing is zero defects. The cost of downtime in a plant could be around a hundred thousand euros. So with predictive maintenance, we are identifying correlations and patterns and trying to anticipate, and maybe to replace a component before the machine is broken. We are in the range of about 2000 machines and we can have up to 300 different variables, from pressure, from vibration, and temperatures. And the real-time data collection is key, and this is something we cannot achieve in a classical data warehouse approach. So with big data and with the Cloudera approach, what we are able to do is really to put all the data, all the sources, together. In the classical way of working with a data warehouse, we need to spend weeks or months to set up the model. With the Cloudera data lake, we can start working within days to weeks. We think that predictive or machine learning could also improve on the estimation or NTC patient forecasting of what we'll need to brilliance with all this knowledge around internet of things and data collection. We are applying into the predictive convene and the cockpit of the future. So we can work in the self-driving car and provide a better experience for the driver in the car. >>The Cloudera Data Platform makes it easy to say yes to any analytic workload from the edge to AI. Yes to enterprise-grade security and governance. Yes to the analytics your people want to use. Yes to operating on any cloud your business requires. Yes to the future with a cloud-native platform that flexes to meet your needs today and tomorrow. Say yes to CDP and say goodbye to shadow IT. Take a tour of CDP and see how it's an easier, faster, and safer enterprise analytics and data management platform with a new approach to data. Finally, a data platform that lets you say yes. >>Welcome to Transforming Ideas into Insights, presented with theCUBE and made possible by Cloudera.
My name is Dave Vellante from theCUBE, and I'll be your host for today. In the next hundred minutes, you're going to hear how to turn your best ideas into action using data. And we're going to share real-world examples and 12 industry use cases that apply modern data techniques to improve customer experience, reduce fraud, drive manufacturing efficiencies, better forecast retail demand, transform analytics, improve public sector service, and so much more. How we use data is rapidly evolving, as is the language that we use to describe data. I mean, for example, we don't really use the term big data as often as we used to. Rather, we use terms like digital transformation and digital business. But you think about it: what is a digital business? How is that different from just a business? >>Well, a digital business is a data business, and it differentiates itself by the way it uses data to compete. So whether we call it data, big data, or digital, our belief is we're entering the next decade of a world that puts data at the core of our organizations. And as such, the way we use insights is also rapidly evolving. You know, of course we get value from enabling humans to act with confidence on, let's call it, near-perfect information, or capitalize on non-intuitive findings. But increasingly insights are leading to the development of data products and services that can be monetized, or, as you'll hear in our industry examples, data is enabling machines to take cognitive actions on our behalf. Examples are everywhere in the forms of apps and products and services, all built on data. Think about real-time fraud detection, know-your-customer in finance, personal health apps that monitor our heart rates, self-service investing, filing insurance claims on our smartphones, and so many examples: IoT systems that communicate and act machine to machine, real-time pricing actions.
These are all examples of products and services that drive revenue, cut costs or create other value. And they all rely on data. Now, many business leaders sometimes express frustration that their investments in data, people, process and technologies haven't delivered the full results they desire. The truth is that the investments that they've made over the past several years should be thought of as a step on the data journey. Key learnings and expertise from these efforts are now part of the organizational DNA that can catapult us into this next era of data transformation and leadership. One thing is certain: the next 10 years of data and digital transformation won't be like the last 10. So let's get into it. Please join us in the chat. >>You can ask questions. You can share your comments. Hit us up on Twitter. Right now, it's my pleasure to welcome Mick Hollison. He's the president of Cloudera. Mick, great to see you. >>Great to see you as well, Dave. >>Hey, so I call it the new abnormal, right? The world is kind of out of whack. Offices are reopening again. We're seeing travel coming back. There's all this pent-up demand for cars and vacations, line cooks at restaurants, everything that we consumers have missed. But here's the one thing: it seems like the algorithms are off. Whether it's retail's fulfillment capabilities, airline scheduling, their pricing algorithms, you know, commodity prices. We don't know: is inflation transitory? Is it a long-term threat? Trying to forecast GDP, it just seems like we have to reset all of our assumptions, and it feels like quality data is going to be a key here. How do you see the current state of the industry and the role data plays to get us into a more predictable and stable future? >>Well, I can sure tell you this, Dave, out of whack is definitely right.
I don't know if you know or not, but I happen to be coming to you live today from Atlanta, and as a native of Atlanta, I can tell you there's a lot to be known about the airport here. It's often said that whether you're going to heaven or hell, you've got to change planes in Atlanta. And after 40 minutes waiting on the algorithm to be right for baggage claim when it was not, I finally managed to get my bag and to be able to show up dressed appropriately for you today. Here's one thing that I know for sure though, Dave: clean, consistent, and safe data will be essential to getting the world and businesses as we know it back on track again. Without well-managed data, we're certain to get very inconsistent outcomes. Quality data will be the normalizing factor, because one thing really hasn't changed about computing since the dawn of time, back when I was taking computer classes at Georgia Tech here in Atlanta, and that's what we used to refer to as garbage in, garbage out. In other words, you'll never get quality data-driven insights from a poor data set. This is especially important today for machine learning and AI. You can build the most amazing models and algorithms, but none of it will matter if the underlying data isn't rock solid. As AI is increasingly used in every business app, you must build a solid data foundation. >>Mick, let's talk about hybrid. Every CXO that I talk to, they're trying to get hybrid right. Whether it's hybrid work, hybrid events, which is our business, or hybrid cloud, how are you thinking about the hybrid everything? What's your point of view? >>With all those descriptions of hybrid everything, there's one item you might not have quite hit on, Dave, and that's hybrid data. >>Oh yeah, you're right, Mick. I did miss that. What do you mean by hybrid data? >>Well, Dave, at Cloudera we think hybrid data is all about the juxtaposition of two things: freedom and security. Now, every business wants to be more agile.
They want the freedom to work with their data wherever it happens to work best for them, whether that's on premises, in a private cloud, in a public cloud, or perhaps even in a new open data exchange. Now, this matters to businesses because not all data applications are created equal. Some apps are best suited to be run in the cloud because of their transitory nature. Others may be more economical if they're running in a private cloud. But either way, security, regulatory compliance and, increasingly, data sovereignty are playing a bigger and more important role in every industry. If you don't believe me, just watch or read a recent news story. Data breaches are at an all-time high, the ethics of AI applications are being called into question every day, and understanding the lineage of machine learning algorithms is now paramount for every business. So how in the heck do you get both the freedom and security that you're looking for? Well, the answer is actually pretty straightforward. The key is developing a hybrid data strategy. And what do you know, Dave? That's the business Cloudera is in. On a serious note, from Cloudera's perspective, adopting a hybrid data strategy is central to every business's digital transformation. It will enable rapid adoption of new technologies and optimize economic models while ensuring the security and privacy of every bit of data. >>Mick, I'm glad you brought in that notion of hybrid data, because when you think about things, especially remote work, it really changes a lot of the assumptions. You talked about security. The data flows are going to change. You've got the economics, the physics, the local laws come into play. So what about the rest of hybrid? >>Yeah, it's a great question, Dave, and certainly Cloudera itself as a business and all of our customers are feeling this in a big way. We now have the overwhelming majority of our workforce working from home.
In other words, we've got a much larger surface area from a security perspective to keep in mind. On the rate and pace of data, just generating a report that might've happened very quickly and rapidly on the office Ethernet may not be happening quite so fast in somebody's rural home in the middle of Nebraska somewhere, right? So it doesn't really matter whether you're talking about the speed of business or securing data. Any way you look at it, hybrid, I think, is going to play a more important role in how work is conducted and what percentage of people are working in the office and are not. I know our plans, Dave, involve us kind of slowly coming back to work, beginning this fall. And we're looking forward to being able to shake hands and see one another again, for the first time in many cases for more than a year and a half. But yes, hybrid work and hybrid data are playing an increasingly important role for every kind of business. >>Thanks for that. I wonder if we could talk about industry transformation for a moment, because it's a major theme, of course, of this event. Here's how I think about it, Mick. I mean, some industries have transformed. You think about retail, for example, it's pretty clear, although every physical retail brand I know has, you know, not only picked up its online presence, but they also have an Amazon war room strategy, because they're trying to take greater advantage of that physical presence. And then the reverse: we see Amazon building out physical assets, so there's more hybrid going on. But when you look at healthcare, for example, it's just starting. You know, with such a highly regulated industry, it seems that there are some hurdles there.
Financial services has always been data savvy, but you're seeing the emergence of fintech and some other challenges there in terms of control of payment systems. In manufacturing, you know, the pandemic highlighted America's reliance on China as a manufacturing partner and supply chain. So my point is, it seems that different industries are in different stages of transformation, but two things look really clear. One, you've got to put data at the core of the business model; that's compulsory. And two, it seems like embedding AI into the applications, the data, the business process is going to become increasingly important. So how do you see that? >>Wow, there's a lot packed into that question there, Dave. But yeah, you know, at Cloudera I happen to be leading our own digital transformation as a technology company, and what I would tell you there that's been interesting for us is that the shift from being largely a subscription-based model to a consumption-based model requires a completely different level of instrumentation in our products, and data collection that takes place in real time, both for billing for our customers and to be able to check on the health and wellness, if you will, of their Cloudera implementations. But it's clearly not just impacting the technology industry. You mentioned healthcare, and we've been helping a number of different organizations in the life sciences realm either speed the rate and pace of getting vaccines to market, or we've been assisting with the testing process. >>That's taken place because you can imagine the quantity of data that's been generated as we've tried to study the efficacy of these vaccines on millions of people and tried to ensure that they were going to deliver great outcomes, and healthy and safe outcomes, for everyone. And Cloudera has been underneath a great deal of that type of work, and in the financial services industry you pointed out.
We continue to be central to the large banks meeting their compliance and regulatory requirements around the globe. And in many parts of the world, those are becoming more stringent than ever. And Cloudera solutions are really helping those kinds of organizations get through those difficult challenges. You also happened to mention, you know, public sector, and in public sector we're also playing a key role in working with government entities around the world and applying AI to some of the most challenging missions that those organizations face. >>And while I've made the kind of pivot between the industry conversation and the AI conversation, what I'll share with you about AI I touched upon a little bit earlier. You can't build great AI, can't build great ML apps, unless you've got a strong data foundation underneath. It's back to that garbage in, garbage out comment that I made previously. And so in order to do that, you've got to have a great hybrid data management platform at your disposal to ensure that your data is clean and organized and up to date. That's kind of the freedom side of things. Just as importantly, on the security side of things, you've got to ensure that you can see who has touched not just the data itself, Dave, but actually the machine learning models. And organizations around the globe are now being challenged, kind of on the topic of the ethics of AI, to produce model lineage. >>In addition to data lineage. In other words, who's had access to the machine learning models, when and where and at what time, and what decisions were made, perhaps by the humans, perhaps by the machines, that may have led to a particular outcome. So every kind of business that is deploying AI applications should be thinking long and hard about whether or not they can track the full lineage of those machine learning models, just as they can track the lineage of data.
So lots going on there across industries, lots going on as those various industries think about how AI can be applied to their businesses. >>Pretty interesting concepts you bring into the discussion. The hybrid data is sort of new, I think, to a lot of people. And this idea of model lineage is a great point, because people want to talk about AI ethics, transparency of AI. When you start putting those models into machines to do real-time inferencing at the edge, it starts to get really complicated. I wonder if we could stay on that theme of industry transformation. I felt like coming into the pandemic, pre-pandemic, there was just a lot of complacency. Yeah, digital transformation and a lot of buzzwords. And then we had this forced march to digital. But people are now being more planful, and there's still a lot of sort of POC limbo going on. How do you see that? Can you help accelerate that and get people out of that state? >>There definitely is a lot of POC limbo, or what I think some of us internally have referred to as POC purgatory: just getting stuck in that phase, not being able to get from point A to point B in digital transformation. And, you know, for every industry, transformation, change in general, is difficult, and it takes time and money and thoughtfulness. But like with all things, what we've found is small wins work best, and done quickly. So trying to get to quick, easy successes where you can identify a clear goal and a clear objective and then accomplish it in rapid fashion is sort of the way to build your way towards those larger transformative efforts. Said another way, Dave, it's not wise to try to boil the ocean with your digital transformation efforts as it relates to the underlying technology here. And to bring it home a little bit more practically, I guess I would say at Cloudera we tend to recommend that companies begin to adopt cloud infrastructure, for example, containerization.
And they begin to deploy that on-prem, and then they start to look at how they may move those containerized workloads into the public cloud. That'll give them an opportunity to work with the data and the underlying applications themselves right close to home, in a place where they can kind of experiment a little bit more safely and economically, and then determine which workloads are best suited for the public cloud and which ones should remain on-prem. That's a way in which a hybrid data strategy can help get a digital transformation accomplished, by kind of starting small and then growing fast from there on the customer's journey. >>Well, Mick, we've covered a lot of ground. Last question. What do you want people to leave this event, this session, with, thinking about sort of the next era of data that we're entering? >>Well, it's a great question, but, you know, I think it could be summed up in two words. I want them to think about a hybrid data strategy. So, you know, really, hybrid data is a concept that we're bringing forward on this show really for the first time, arguably, and we really do think that it enables customers to experience what we refer to, Dave, as the power of "and": that is, freedom and security. In a world where we're all still trying to decide, each day when we walk out, with each building we walk into, whether we're free to come in and out with a mask, without a mask, that sort of thing, we all want freedom, but we also want to be safe and feel safe, for ourselves and for others. And the same is true of organizations' IT strategies. They want the freedom to choose to run workloads and applications in the best and most economical place possible. But they also want to do that with certainty that they're going to be able to deploy those applications in a safe and secure way that meets the regulatory requirements of their particular industry.
So hybrid data, we think, is key to accomplishing both freedom and security for your data and for your business as a whole. >>Mick, thanks so much. Great conversation, and I really appreciate the insights that you're bringing to this event and to the industry. Really, thank you for your time. >>You bet, Dave. Pleasure being with you. >>Okay. We want to pick up on a couple of themes that Mick discussed, you know, supercharging your business with AI, for example, and this notion of getting hybrid right. So right now we're going to turn the program over to Rob Bearden, the CEO of Cloudera, and Manuvir Das, who's the head of enterprise computing at NVIDIA. And before I hand it off to Rob, I just want to say, for those of you who follow me at the Cube, we've extensively covered the transformation of the semiconductor industry. We are entering an entirely new era of computing in the enterprise, and it's being driven by the emergence of data-intensive applications and workloads. No longer will conventional methods of processing data suffice to handle this work. Rather, we need new thinking around architectures and ecosystems. And one of the keys to success in this new era is collaboration between software companies like Cloudera and semiconductor designers like NVIDIA. So let's learn more about this collaboration and what it means to your data business. Rob, take it away. >>Thanks, Mick and Dave. That was a great conversation on how speed and agility is everything in a hyper-competitive hybrid world. You touched on AI as essential to a data-first strategy and accelerating the path to value in hybrid environments. And I want to drill down on this aspect. Today, every business is facing accelerating change. Everything from face-to-face meetings to buying groceries has gone digital. As a result, businesses are generating more data than ever. There are more digital transactions to track and monitor.
Now, every engagement with coworkers, customers and partners is virtual. From website metrics to customer service records and even onsite sensors, enterprises are accumulating tremendous amounts of data, and unlocking insights from it is key to our enterprises' success. And with data flooding every enterprise, what should businesses do? At Cloudera, we believe this onslaught of data offers an opportunity to make better business decisions faster. >>And we want to make that easier for everyone, whether it's fraud detection, demand forecasting, preventative maintenance, or customer churn. Whether the goal is to save money or produce income, every day that companies don't gain deep insight from their data is money they've lost. And the reason we're talking about speed, and why speed is everything in a hybrid world and in a hyper-competitive climate, is that the faster we get insights from all of our data, the faster we grow and the more competitive we are. So those faster insights are also combined with the scalability and cost benefit the cloud provides, and with security and edge to AI data intimacy. That's why the partnership between Cloudera and NVIDIA means so much. And it starts with the shared vision of making data-driven decision-making a reality for every business. Our customers will now be able to leverage virtually unlimited quantities and varieties of data to power an order of magnitude faster decision-making, and together we turbocharge the enterprise data cloud to enable our customers to work faster and better, and to make integration of AI approaches a reality for companies of all sizes in the cloud. >>We're joined today by NVIDIA's Manuvir Das to talk more about how our technologies will deliver the speed companies need for innovation in our hyper-competitive environment. Okay, Manuvir, thank you for joining us. Over to you. >>Thank you, Rob, for having me. It's a pleasure to be here on behalf of NVIDIA.
We are so excited about this partnership with Cloudera. You know, when NVIDIA started many years ago, we started as a chip company focused on graphics, but as you know, over the last decade we've really become a full-stack accelerated computing company, where we've been using the power of GPU hardware and software to accelerate a variety of workloads, AI being a prime example. And when we think about Cloudera and your company, a great company, there are three things we see, Rob. The first one is that for the companies that were already transforming themselves by the use of data, Cloudera has been a trusted partner for them. The second thing we see is that when it comes to using your data, you want to use it in a variety of ways with a powerful platform, which of course you have built over time. >>And finally, as we've heard already, you believe in the power of hybrid: that data exists in different places and the compute needs to follow the data. Now, if you think about NVIDIA's mission going forward, to democratize accelerated computing for all companies, our mission actually aligns very well with exactly those three things. Firstly, you know, we've really worked with a variety of companies to date who have been the early adopters, using the power of acceleration by changing the technology in their stacks. But more and more, we see the opportunity of meeting customers where they are, with tools that they're familiar with, with partners that they trust. And of course, Cloudera is a great example of that. The second part of NVIDIA's mission is, we focused a lot in the beginning on deep learning, where the power of GPUs really shone through, but as we've gone forward, we've found that GPUs can accelerate a variety of different workloads, from machine learning to inference. >>And so again, the power of your platform is very appealing. And finally, we know that AI is all about data, more and more data.
We believe very strongly in the idea that customers put their data where they need to put it. And the compute, the AI compute, the machine learning compute, needs to meet the customer where their data is. And so that matches really well with your philosophy, right? And Rob, that's why we were so excited to do this partnership with you. It's come to fruition. We have a great combined stack now for the customer, and we already see people using it. I think the IRS is a fantastic example, where literally they took the workflow they had, they took the servers they had, they added GPUs into those servers. They did not change anything. And they got an eight times performance improvement for their fraud detection workflows, right? And that's the kind of success we're looking forward to with all customers. So the team has actually put together a great video to show us what the IRS is doing with this technology. Let's take a look. >>My name's Joe Ansaldi. I'm the branch chief of the technical branch in RAS. It's actually the research division, the research and statistical division of the IRS. Basically the mission that RAS has is we do statistical and research work on all things related to taxes: compliance issues, fraud issues, you know, anything that you can think of. Basically we do research on that. We're running into issues now in that we have a lot of ideas to actually do data mining on our big troves of data, but we don't necessarily have the infrastructure or horsepower to do it. So our biggest challenge is definitely the infrastructure to support all the ideas that the subject matter experts are coming up with, in terms of all the algorithms they would like to create. And then diving deeper within the algorithm space, there's the actual training of those algorithms and the parameters each of those algorithms have. >>So that's really been our challenge. Now the expectation was that with NVIDIA and Cloudera, there is help.
And with the cluster we actually built out to test this on the actual fraud detection algorithm, our expectation was we were definitely going to see some speed-up in computational processing times. And just to give you context, the size of the data set that the SME was actually working the algorithm against is around four terabytes. If I recall correctly, we had a 22 to 48 times speed-up after we started tweaking the original algorithm. My expectations, quite honestly, in that sphere, in terms of the timeframe to get results, was that you guys actually exceeded them. It was really, really quick. The definite now-term, short-term what's next is going to be that the subject matter expert is actually going to take our algorithm and run with that. >>So that's definitely the now-term thing we want to do. Looking forward, maybe out a couple of months, we're also looking at procuring some A100 cards to actually test those out. As you guys can guess, our datasets are just getting bigger and bigger and bigger, and the demand to actually do something, to get more value added out of those data sets, is just putting more and more demands on our infrastructure. So, you know, with the pilot, now we have an idea of the infrastructure we need going forward, and then also in terms of thinking of the algorithms and how we can approach these problems to actually code out solutions to them. Now it's kind of like the shackles are off and we can just run them, you know, to our hearts' desire, wherever imagination takes our SMEs to actually develop solutions, now that we have the platforms to run them on. Just to kind of close out. >>I really would be remiss. I've worked with a lot of, you know, companies through the years and most of them have been spectacular. And you guys are definitely in that category.
The whole partnership, as I said a little bit earlier, went really, really well. Very responsive. I would be remiss if I didn't thank you guys. So thank you for the opportunity, and fantastic. And I also want to thank my guys, my staff. David worked on this, Richie worked on this, Lex and Tony. They did a fantastic job, and I want to publicly thank them for all the work they did with you guys, and Chev, obviously, also, who's fantastic. So thank you, everyone. >>Okay. That's a real great example of speed in action. Now let's get into some follow-up questions, guys, if I may. Rob, can you talk about the specific nature of the relationship between Cloudera and NVIDIA? Is it primarily go-to-market, or are you doing engineering work? What's the story there? >>It's really both. It's both go-to-market and engineering, and the engineering focus is to optimize and take advantage of NVIDIA's platform to drive better price performance, lower cost, faster speeds, and better support for today's emerging data-intensive applications. So it's really both. >>Great. Thank you. Manuvir, maybe you could talk a little bit more about why we can't just use existing general-purpose platforms that are running all this ERP and CRM and HCM and, you know, all the Microsoft apps that are out there. What do NVIDIA and Cloudera bring to the table that goes beyond the conventional systems that we've known for many years? >>Yeah. I think, Dave, as we've talked about, the asset that the customer has is really the data, right? And the same data can be utilized in many different ways: some machine learning, some AI, some traditional data analytics. So the first step here was really to take a general platform for data processing, Cloudera Data Platform, and integrate with that. Now NVIDIA has a software stack called RAPIDS, which has all of the primitives that make different kinds of data processing go fast on GPUs.
And so the integration here has really been taking RAPIDS and integrating it into Cloudera Data Platform, so that regardless of the technique the customer's using to get insight from that data, the acceleration will apply in all cases. And that's why it was important to start with a platform like Cloudera rather than a specific application. >>So I think this is really important, because if you think about, you know, the software-defined data center, it brought in, you know, some great efficiencies, but at the same time a lot of the compute power is now going toward doing things like networking and storage and security offloads. So the good news, the reason this is important, is because when you think about these data-intensive workloads, we can now put more processing power to work for those, you know, AI-intensive things. And so that's what I want to talk about a little bit, maybe a question for both of you. Maybe Rob, you could start. You think about the AI that's done today in the enterprise. A lot of it is modeling in the cloud. But when we look at a lot of the exciting use cases, bringing real-time systems together, transaction systems and analytics systems and real-time AI inference, at least even at the edge, there's huge potential for business value. And in consumer, you're seeing a lot of applications with AI: biometrics and voice recognition and autonomous vehicles and the like. And so you're putting AI into these data-intensive apps within the enterprise. >>The potential there is enormous. So what can we learn from sort of where we've come from, maybe these consumer examples? And Rob, how are you thinking about enterprise AI in the coming years? >>Yeah, you're right. The opportunity is huge here, but, you know, 90% of the cost of AI applications is the inference.
And it's been a blocker in terms of adoption, because it's just been too expensive and difficult from a performance standpoint. New platforms like these being developed by Cloudera and NVIDIA will dramatically lower the cost of enabling this type of workload to be done. And where we're going to see the most improvements will be in the speed and accuracy for existing enterprise AI apps like fraud detection, recommendation engines, supply chain management and drug discovery. And increasingly, the consumer-led technologies will be bleeding into the enterprise in the form of autonomous factory operations. An example of that would be robots, or AR/VR in manufacturing, driving better quality. Power grid management, automated retail, IoT, you know, the intelligent call centers: all of these will be powered by AI, and really the list of potential use cases now is going to be virtually endless. >>Manuvir, this is like your wheelhouse. Maybe you could add something to that. >>Yeah. I mean, I agree with Rob. He listed some really good use cases. You know, the way we see this at NVIDIA, this journey is in three phases or three steps, right? The first phase was for the early adopters. You know, the builders who assembled use cases, particular use cases like a chatbot, from the ground up with the hardware and the software, almost like going to your local hardware store and buying piece parts and constructing a table yourself, right? Now, I think we are in the first phase of the democratization, for example, the work we did with Cloudera, which is for a broader base of customers, still building for a particular use case, but starting from a much higher baseline. So think about, for example, going to Ikea now and buying a table in a box, right? >>And you still come home and assemble it, but all the parts are there. The instructions are there, there's a recipe you just follow, and it's easy to do, right?
So that's sort of the phase we're in now. And then going forward, the opportunity we really look forward to for the democratization: you talked about applications like CRM, et cetera. I think the next wave of democratization is when customers just adopt and deploy the next version of an application they already have. And what's happening is that under the covers, the application is infused by AI and it's become more intelligent because of AI, and the customer just thinks they went to the store and bought a table, and it showed up and somebody placed it in the right spot, right? And they didn't really have to learn how to do AI. So these are the phases. And I think we're very excited to be going there. >>Yeah. You know, Rob, the great thing for your customers is they don't have to build out the AI. They can buy it. And just in thinking about this, it seems like there are a lot of really great and even sometimes narrow use cases. So I want to ask you, you know, staying with AI for a minute, one of the frustrations, and Mick and I talked about this, is the GIGO problem that we all studied in college, you know, garbage in, garbage out. But the frustration that users have had is really getting fast access to quality data that they can use to drive business results. So do you see, and how do you see, AI maybe changing the game in that regard, Rob, over the next several years?
>>So yeah, the combination of massive amounts of data that have been gathered across the enterprise in the past 10 years with open APIs is dramatically lowering processing costs while performing at much greater speed and efficiency, you know, and that's allowing us as an industry to democratize the data access while at the same time delivering the federated governance and security models. And hybrid technologies are playing a key role in making this a reality and enabling data access to be hybridized, meaning accessed and treated in a substantially similar way, irrespective of the physical location of where that data actually resides. >>That's great. That is really the value layer that you guys are building out on top of all this great infrastructure that the hyperscalers have given us, I mean, a hundred billion dollars a year that you can build value on top of for your customers. Last question, and maybe Rob, you can go first and then Manuvir, you could bring us home. Where do you guys want to see the relationship go between Cloudera and NVIDIA? In other words, how should we, as outside observers, be thinking about and measuring your progress specifically and the industry's progress generally? >>Yeah, I think we're very aligned on this, and for Cloudera, it's all about helping companies move forward, leverage every bit of their data and all the places that it may be hosted. And partnering with our customers, working closely with our technology ecosystem of partners, means innovation in every industry, and that's inspiring for us. And that's what keeps us moving forward. >>Yeah. And I agree with Rob, and for us at NVIDIA, you know, this partnership started with data analytics. As you know, Spark is a very powerful technology for data analytics, and people who use Spark rely on Cloudera for that. 
And the first thing we did together was to really accelerate Spark in a seamless manner, but we're accelerating machine learning, we're accelerating artificial intelligence together. And I think for NVIDIA it's about democratization. We've seen what machine learning and AI have done for the early adopters and helped them make their businesses, their products, their customer experience better. And we'd like every company to have the same opportunity. >>Okay. Now we're going to dig into the data landscape and cloud, of course, and talk a little bit more about that with Drew Allen. He's a managing director at Accenture. Drew, welcome. Great to see you. >>Thank you. >>So let's talk a little bit about, you know, you've been in this game for a number of years. You've got particular expertise in data and finance and insurance. I mean, you know, you think about it within the data and analytics world, even our language is changing. You know, we don't talk about big data so much anymore. We talk more about digital, you know, or data driven. When you think about sort of where we've come from and where we're going, what are the puts and takes that you have with regard to what's going on in the business today? >>Well, thanks for having me. You know, I think some of the trends we're seeing in terms of challenges and puts and takes are that a lot of companies are already on this digital journey. They've focused on customer experience; that's kind of table stakes. Everyone wants to focus on that and kind of digitizing their channels. But a lot of them are seeing that, you know, they don't even own their channels necessarily. So like we're working with a big cruise line, right? And yes, they've invested in digitizing what they own, but a lot of the channels that they sell through, they don't even own, right? It's the travel agencies or third-party resellers. 
So having the data to know, you know, where those agencies are, that's something that they've discovered. And so there's a big focus on not just digitizing, but also really understanding your customers and going across products, because a lot of the data has been built up in individual channels and in digital products. And so bringing that data together is something that customers have really figured out in the last few years is a big differentiator. And what we're seeing too, as a big trend, is that the data rich are getting richer. So companies that have really invested in data are having an outsized market share and outsized earnings per share and outsized revenue growth. And it's really being a big differentiator. And I think for companies just getting started in this, the thing to think about is that one of the missteps is to try to capture all the data at once. The average company has, you know, 10,000, 20,000 data elements individually. When you want to start out, you know, 500, 300 critical data elements: about 5% of the data of a company drives 90% of the business value. So those key critical data elements are really what you need to govern first and really invest in first. And so that's something we tell companies at the beginning of their data strategy: first focus on those critical data elements, really get a handle on governing that data, organizing that data and building data products around that data. >>You can't boil the ocean, right? And I feel like pre-pandemic, there was a lot of complacency. Oh yeah, we'll get to that. You know, not on my watch, I'll be retired before that, you know, is it becomes a minute. And then of course the pandemic was, I call it sometimes, a forced march to digital. So in many respects, it wasn't planned. It just, you know, you had to do it. And so now I feel like people are stepping back and saying, okay, let's now really rethink this and do it right. 
But is there a sense of urgency, do you think? >>Absolutely. I think with COVID, you know, we were working with a retailer where they had 12,000 stores across the U.S. and they didn't have the insights where they could drill down and understand, you know, with the riots and with COVID, was the store operational? You know, with the supply chain, having multiple distributors, what did they have in stock? So there are millions of data points that you need to drill down on, at the sale level, at the store level, to really understand how's my business performing. And we like to think about it, for a CEO and his leadership team, as a digital cockpit, right? You think about a pilot: they have a cockpit with all these dials and dashboards, essentially understanding the performance of their business. And they should be able to drill down and understand, for each individual, you know, unit of their work, how are they performing? That's really what we want to see for businesses. Can they get down to that individual performance to really understand how their business is performing? >>Good, the ability to connect those dots and traverse those data points and not have to go in and come back out and go into a new system and come back out. And that's really been a lot of the frustration. Where does machine intelligence and AI fit in? Is that sort of a dot connector, if you will, and an enabler? I mean, we saw, you know, decades of the AI winter, and then, you know, there's been a lot of talk about it, but it feels like with the amount of data that we've collected over the last decade and the low cost of processing that data now, it feels like it's real. Where do you see AI fitting? 
>>Yeah, I mean, I think there's been a lot of innovation in the last 10 years with the low cost of storage and computing and these algorithms in non-linear, you know, knowledge graphs, and a whole bunch of opportunities in cloud, where what I think the big opportunity is, you know, you can apply AI in areas where a human just couldn't have the scale to do that alone. So back to the example of a cruise line, you know, you may have a ship being built that has 4,000 cabins on the single cruise line, and it's going to multiple destinations over its 30-year life cycle. Each one of those cabins is being priced individually for each individual destination. It's physically impossible for a human to calculate the dynamic pricing across all those destinations. You need a machine to actually do that pricing. And so really what a machine is leveraging is all that data to really calculate and assist the human, essentially with all these opportunities where you wouldn't have a human being able to scale up to that amount of data alone. >>You know, it's interesting. One of the things we talked to Mick Hollison about earlier was just that everybody's algorithms are out of whack. You know, you look at the airline pricing, you look at hotels. As a consumer, you would be able to kind of game the system and predict, but they can't even predict these days. And I feel as though the data and AI are actually going to bring us back into some kind of normalcy and predictability. What do you see in that regard? >>Yeah, I think it's, I mean, we're definitely not at a point where, when I talk to, you know, the top AI engineers and data scientists, we're not at a point where we have what they call broad AI, right? Where you can get machines to solve general knowledge problems, where they can solve one problem and then a distinctly different problem, right? 
That's still many years away, but with narrow AI, there's still tons of use cases out there that can really address tons of business performance challenges, tons of accuracy challenges. So for example, in the insurance industry, commercial lines, where I work a lot of the time, the biggest leakage of loss experience in pricing for commercial insurers is, people will go in as an agent and they'll select an industry to say, you know what, I'm a restaurant business. I'll select this industry code to quote out a policy, but there's, let's say, you know, 12 dozen permutations: you could be an outdoor restaurant, you could be a bar, you could be a caterer, and all of that leads to different loss experience. So what they did is they built a machine learning algorithm, and we've helped them do this, that actually, at the time that they're putting in their name and address, is crawling across the web and predicting in real time, you know, is this address actually, you know, a business that's a restaurant with indoor dining? Does it have a bar? Is it outdoor dining? And that's able to more accurately price the policy and reduce the loss experience. So there's a lot that you can do even with narrow AI that can really drive top-line business results. >>Yeah. I like that term, narrow AI, because getting things done is important. Let's talk about cloud a little bit, because people talk about cloud first, public cloud first; that doesn't necessarily mean public cloud only, of course. So where do you see things like, what's the right operating model, the right regime, hybrid cloud? We talked earlier about hybrid data. Help us squint through the cloud landscape. >>Yeah. I mean, I think for most, right, most Fortune 500 companies, they can't just snap their fingers and say, let's move all of our data centers to the cloud. They've got to move, you know, gradually. And it's usually a journey that's taking more than two to three plus years, even more than that in some cases. 
So they have to move their data incrementally to the cloud. And what that means is that they have to move to a hybrid perspective where some of their data is on premise and some of it is publicly on the cloud. And so that's the term hybrid cloud, essentially. And so what they've had to think about is, from an intelligence perspective, the privacy of that data: where is it being moved? Can they reduce the replication of that data? Because ultimately, replicating the data from on-premise to the cloud introduces, you know, errors and data quality issues. So thinking about how do you manage, you know, on-premise and public as a transition is something that Accenture thinks about and helps our clients do quite a bit. And how do you move them in a manner that's well-organized and well thought out? >>Yeah. So I've been a big proponent of sort of lines of business becoming much more involved in the data pipeline, if you will, the data process. If you think about our major operational systems, they all have sort of line of business context in them. And then the salespeople, they know the CRM data, and, you know, logistics folks, they're very much in tune with ERP. I almost feel like for the past decade, the lines of business have been somewhat removed from the data team, if you will. And that seems to be changing. What are you seeing in terms of the line of business being much more involved in sort of end-to-end ownership, if I can use that term, of the data, and sort of determining things like, helping determine anyway, the data quality and things of that nature? Yeah. 
>>I mean, I think this is where thinking about your data operating model and thinking about ideas of a chief data officer and having data on the CEO agenda, that's really important to get the lines of business to really think about data sharing and reuse, and really getting them to, you know, kind of unlock the data, because they do think about their data as a fiefdom. Data has value, but you've got to really get organizations in their silos to open it up and bring that data together, because that's where the value is. You know, when you think about a customer, they don't operate in their journey across the business in siloed channels. They don't think about, you know, I use only the web and then I use the call center, right? They think about that as just one experience, and that data is a single journey. So we like to think about data as a product. You know, you should think about data in the same way you think about your products. With data as a product, you should have the idea of, like, every two weeks you have releases to it. You have an operational resiliency to it. So thinking about that, where you can have a very product mindset to delivering your data, I think is very important for success. And that's where, kind of, there's not just the things about critical data elements and having the right platform architecture, but there's the soft stuff as well: a product mindset to data, having the right data culture and business adoption, and having the right value mindset for data, I think, is really important. >>I think data as a product is a very powerful concept and I think it maybe is uncomfortable to some people sometimes. 
And I think in the early days of big data, if you will, people thought, okay, data as a product, I'm going to sell my data, and that's not necessarily what you mean. You're thinking about products or data that can fuel products that you can then monetize, maybe as a product or as a service. And I like to think about a new metric in the industry, which is how long does it take me to get from idea, I'm a business person, I have an idea for a data product, how long does it take me to get from idea to monetization? And that's going to be something that ultimately, as a business person, I'm going to use to determine the success of my data team and my data architecture. Is that kind of thinking starting to really hit the marketplace? >>Absolutely. I mean, insurers now are partnering with, you know, auto manufacturers to monetize driver usage data, you know, on telematics, to see, you know, driver behavior and how auto manufacturers are using that data. That's very important to insurers, you know, so how an auto manufacturer can monetize that data is very important. And also in insurance, you know, cyber insurance, are there new ways we can look at how companies are being attacked with viruses and malware? And is there a way we can somehow monetize that information? So being able to, with agility, you know, think about how can we collect this data, bring it together, think about it as a product, and then potentially, you know, sell it as a service is something that successful companies are doing. >>Great examples of data products, and it might be revenue generating, or it might be, in the case of, you know, cyber, maybe it reduces my expected loss. >>Exactly. >>Then it drops right to my bottom line. What's the relationship between Accenture and Cloudera? I presume you guys meet at the customer, but maybe you could give us some insight. >>Yeah. 
So I'm the executive sponsor for the Accenture Cloudera partnership on the Accenture side. We do quite a lot of business together and, you know, Cloudera has been a great partner for us. And they've got a great product in terms of the Cloudera Data Platform where, you know, what we do is, as a big systems integrator for them, we help, you know, configure, and we have a number of engineers across the world that come in and help engineer, architect, and install Cloudera's data platform, and think about what are some of those, you know, value cases where you can really think about organizing data and bringing it together for all these different types of use cases. And really, just like the examples we thought about, so the telematics, you know, in order to realize something like that, you're bringing in petabytes and huge scales of data that, you know, you just couldn't bring onto a normal platform. You need to think about cloud. You need to think about speed of data and real-time insights, and Cloudera is the right data platform for that. >>Cloudera ushered in the modern big data era, we kind of all know that, and of course early on, it was very services intensive. You guys were right there helping people think through it. There weren't enough data scientists. We've sort of all been through that. And of course in your wheelhouse industries, you know, financial services and insurance, they were some of the early adopters, weren't they? >>Yeah, absolutely. So, you know, in insurance, you've got huge amounts of data with loss history and a lot with IoT. So in insurance, there's a whole thing of, like, sensorizing, you know, taking the physical world and digitizing it. So there's a big thing in insurance where it's not just about pricing out the risk of a loss experience, but actually reducing the loss before it even happens. 
So it's called risk control or loss control: you know, can we actually put sensors on oil pipelines or on elevators and, you know, reduce accidents before they happen? So we're, you know, working with an insurer to actually listen to elevators as they move up and down, and are there signals in just listening to the audio of an elevator over time that say, you know what, this elevator is going to need maintenance, you know, before a critical accident could happen? So there's huge applications, not just in structured data, but in unstructured data like voice and audio and video, where a partner like Cloudera has a huge role to play. >>Great example of it. So again, a narrow sort of use case for machine intelligence, but real value. Drew, we'll leave it like that. Thanks so much for taking some time. >>Yes. Thank you so much. >>Okay. We continue now with the theme of turning ideas into insights, so ultimately you can take action. We heard earlier that public cloud first doesn't mean public cloud only, and a winning strategy comprises data irrespective of physical location: on prem, across multiple clouds, at the edge where real-time inference is going to drive a lot of incremental value. Data is going to help the world come back to normal, we heard, or at least semi-normal, as we begin to better understand and forecast demand and supply imbalances and economic forces. AI is becoming embedded into every aspect of our business, our people, our processes, and applications. And now we're going to get into some of the foundational principles that support the data and insights centric processes, which are fundamental to digital transformation initiatives. And it's my pleasure to welcome two great guests: Michelle Goetz, who's a Cube alum, VP and principal analyst at Forrester, and doing some groundbreaking work in this area, and Cindy Maike, who is the vice president of industry solutions and value management at Cloudera. Welcome to both of you. >>
Welcome. Thank you. Thanks Dave. >>All right, Michelle, let's get into it. Maybe you could talk about your foundational core principles. You start with data. What are the important aspects of this first principle that are achievable today? >>It's really about democratization. If you can't make your data accessible, it's not usable. Nobody's able to understand what's happening in the business and they don't understand what insights can be gained or what are the signals that are occurring that are going to help them with decisions, create stronger value or create deeper relationships with their customers due to their experiences. So it really begins with how do you make data available and bring it to where the consumer of the data is, rather than trying to hunt and peck around within your ecosystem to find what it is that's important. >>Great. Thank you for that. So, Cindy, I wonder, in hearing what Michelle just said, what are your thoughts on this? And when you work with customers at Cloudera, are there any that stand out that perhaps embody the fundamentals that Michelle just shared? >>Yeah, there's quite a few. And especially as we look across all the industries that we're actually working with customers in, you know, a few that stand out top of mind for me: one is IQVIA and what they're doing with real-world evidence and bringing together data across the entire healthcare and life sciences ecosystem, bringing it together in different shapes and formats, making it accessible both internally as well as for their entire extended ecosystem. And then Faurecia, who's working to solve some predictive maintenance issues; they're a European car manufacturer, and how do they make sure that they have, you know, efficient and effective processes when it comes to fixing equipment and so forth. 
And then also there's an Indonesian-based telecommunications company, Telkomsel, who's bringing together, over the last five years, all their data about their customers, and how do they enhance their customer experience? How do they make information accessible, especially in these pandemic and post-pandemic times, you know, just getting better insights into what customers need and when do they need it? >>Cindy, platform is another core principle. How should we be thinking about data platforms in this day and age? I mean, where do things like hybrid fit in? What's Cloudera's point of view? >>Platforms are truly an enabler, and data needs to be accessible in many different fashions, and also in what's right for the business. You know, I want it in a cost-efficient and effective manner. So, you know, data resides everywhere. Data is developed and it's brought together. So you need to be able to balance both real time, you know, and batch, historical information. It all depends upon what your analytical workloads are and what types of analytical methods you're going to use to drive those business insights. So placing data, landing it, making it accessible, analyzing it needs to be done in any accessible platform, whether it be, you know, a public cloud, doing it on-prem, or a hybrid of the two, which is typically what we're seeing being the most successful. >>Great. Thank you. Michelle, let's move on a little bit and talk about practices and processes as the next core principles. Maybe you could provide some insight as to how you think about balancing practices and processes while at the same time managing agility. >>Yeah, it's a really great question because it's pretty complex. When you have to start to connect your data to your business, the first thing to really gravitate towards is what are you trying to do? 
And what Cindy was describing with those customer examples is that they're all based off of business goals, off of very specific use cases. That helps kind of set the agenda about what is the data and what are the data domains that are important to really understanding and recognizing what's happening within that business activity, and the way that you can affect that either in, you know, near time or real time, or later on, as you're doing your strategic planning. What that's balancing against is also being able to not only see how that business is evolving, but also be able to go back and say, well, can I also measure the outcomes from those processes, using data and using insight? Can I also get intelligence about the data to know that it's actually satisfying my objectives to influence my customers in my market? Or is there some sort of data drift or detraction in my analytic capabilities that are allowing me to be effective in those environments? But everything else revolves around that and really thinking succinctly about a strategy that isn't just data aware, what data do I have and how do I use it, but coming in more from that business perspective to then start to be data-driven, recognizing that every activity you do from a business perspective leads to thinking about information that supports that and supports your decisions, and ultimately getting to the point of being insight driven, where you're able to both describe what you want your business to be with your data, using analytics, and then execute on that fluidly and in real time. And then ultimately bringing that back with linking to business outcomes and doing that in a continuous cycle where you can test and you can learn, you can improve, you can optimize, and you can innovate because you can see your business as it's happening. And you have the right signals and intelligence that allow you to make great decisions. >>I like how you said near time or real time, because it is a spectrum. 
And you know, at one end of the spectrum, autonomous vehicles, you've got to make a decision in real time. But near real-time, or real-time, it's in the eye of the beholder, if you will. It might be before you lose the customer, before the market changes. So it's really defined on a case by case basis. I wonder, Michelle, if you could talk about, in working with a number of organizations, I see folks, they sometimes get twisted up in understanding the dependencies that technology generally, and the technologies around data specifically, can have on critical business processes. Can you maybe give some guidance as to where customers should start, where, you know, where can we find some of the quick wins and high return? >>It comes first down to how does your business operate? So you're going to take a look at the business processes and value stream itself. And if you can understand how people and customers, partners, and automation are driving that step-by-step approach to your business activities to realize those business outcomes, it's way easier to start thinking about what is the information necessary to see that particular step in the process, and then take the next step of saying, what information is necessary to make a decision at that current point in the process? Or are you collecting information, asking for information, that is going to help satisfy a downstream process step or a downstream decision? So constantly making sure that you are mapping out your business processes and activities, and aligning your data process to that, helps you now rationalize: do you need that real time, near real time, or do you want to start creating greater consistency by bringing all of those signals together in a centralized area to eventually oversee the entire operations and outcomes as they happen? 
It's the process and the decision points and acting on those decision points for the best outcome that really determines, are you going to move in more of a real-time streaming capacity, or are you going to push back into more of a batch-oriented approach? Because it depends on the amount of information and the aggregate of which provides the best insight from that. >>Got it. Let's bring Cindy back into the conversation. Cindy, we often talk about people, process and technology and the roles they play in creating a data strategy that's logical and sound. Can you speak to the broader ecosystem and the importance of creating both internal and external partners within an organization? >>Yeah. And that's, you know, kind of building upon what Michelle was talking about. If you think about data as, and I hate to use the phrase almost, but, you know, the fuel behind the process, and how do you actually become insight-driven? And, you know, you look at the capabilities that you're needing to enable from that business process, that insight process, your extended ecosystem, and how do I make that happen? You know, partners, and picking the right partner, is important, because a partner is one that actually helps you implement what your decisions are. So looking for a partner that has the capability, that believes in being insight-driven, and making sure that when you're leveraging data, you know, within a process, if you need to do it in a timely fashion, that they can actually meet those needs of the business and enable those process activities. So with the ecosystem, looking at, you know, who your vendors are, fundamentally they need to be that trusted partner. Do they bring those same principles of value, of being insight driven? So they have to have those core values themselves in order to help you as a line of business person enable those capabilities. So, so yeah, I'm 
So, so yeah, I'm >>Cool with fuel, but it's like super fuel when you talk about data, cause it's not scarce, right? You're never going to run out. So Michelle, let's talk about leadership. W w who leads, what does so-called leadership look like in an organization that's insight driven? >>So I think the really interesting thing that is starting to evolve as late is that organizations enterprises are really recognizing that not just that data is an asset and data has value, but exactly what we're talking about here, data really does drive what your business outcomes are going to be data driving into the insight or the raw data itself has the ability to set in motion. What's going to happen in your business processes and your customer experiences. And so, as you kind of think about that, you're now starting to see your CEO, your CMO, um, your CRO coming back and saying, I need better data. I need information. That's representative of what's happening in my business. I need to be better adaptive to what's going on with my customers. And ultimately that means I need to be smarter and have clearer forecasting into what's about ready to come, not just, you know, one month, two months, three months or a year from now, but in a week or tomorrow. >>And so that's, how is having a trickle down effect to then looking at two other types of roles that are elevating from technical capacity to more business capacity, you have your chief data officer that is shaping the exp the experiences, uh, with data and with insight and reconciling, what type of information is necessary with it within the context of answering these questions and creating a future fit organization that is adaptive and resilient to things that are happening. 
And you also have a chief digital officer who is participating, because they're providing the experience and shaping the information and the way that you're going to interact and execute on those business activities, either running that autonomously or as part of an assistance for your employees and for your customers. So really, to go from not just data-aware to data-driven, but ultimately to be insight-driven, you're seeing way more participation and leadership at that C-suite level and just underneath, because that's where the subject matter expertise is coming in to know how to create a data strategy that is tightly connected to your business strategy. >>Right. Thank you. Let's wrap. I've got a question for both of you; maybe Cindy, you could start, and then Michelle, bring us home. A lot of customers want to understand what's achievable, so it's helpful to paint a picture of a maturity model. You know, I'd love to go there, but I'm not going to get there anytime soon, so I want to take some baby steps. So when you're performing an analysis on an insight-driven organization, Cindy, what do you see as the major characteristics that define the differences between the early beginners, the sort of fat middle, if you will, and then the more advanced constituents? >>Yeah, I'm going to build upon what Michelle was talking about with data as an asset. And I think, also, when you're data-aware and trying to actually become insight-driven, companies can also have data as a liability. When you're data-aware, sometimes data can still be a liability to your organization. If you're not making business decisions on the most recent and relevant data, you're not going to be insight-driven.
So you've got to move beyond that data awareness, where you're looking at data just from an operational reporting perspective, to where data is fundamentally driving the decisions that you make as a business. You're using data in real time. You're leveraging data to actually help you make and drive those decisions. So when we use the term data-driven, you can't just use it tongue in cheek. It actually means that I'm using the recency, the relevance, and the accuracy of data to actually make the decisions for me. We're all advancing; we're talking about artificial intelligence and so forth. If you're just data-aware, I would not be embracing or leveraging artificial intelligence, because that means I probably haven't embedded data into my processes. Data could very well still be a liability in your organization. So how do you actually make it an asset? >>Yeah, I think data-aware is like cable-ready. So Michelle, maybe you could add to what Cindy just said and add as well any advice that you have around creating and defining a data strategy. >>So every data strategy has a component of being data-aware. This is like building the data museum. How do you capture everything that's available to you? How do you maintain that memory of your business? Bringing in data from your applications, your partners, third parties, wherever that information is available, you want to ensure that you're capturing it, managing it, and maintaining it. And this is really where you're starting to think about the fact that it is an asset. It has value, but you may not necessarily know what that value is yet. If you move into the category of data-driven, what starts to shift and change is that you're starting to classify, label, and organize the information in the context of how you're making decisions and how you do business.
It could start from being more proficient from an analytics perspective. You also might start to introduce some early stages of data science in there. >>So you can do some predictions and some data mining to start to weed out some of those signals. And you might have some simple types of algorithms that you're deploying to do a next best action, for example. And that's what data-driven is really about. You're starting to get value out of it. The data itself is starting to make sense in the context of your business. But what you haven't done quite yet, which is what insight-driven businesses do, is really start to take away the gap between when you see it, know it, and then get the most value and really exploit what that insight is at the time when it's right, in the moment. We talk about this in terms of perishable insights: data and insights are ephemeral. And we want to ensure that the way we're managing and delivering on that data and those insights is in time with our decisions and with the highest-value outcome that insight can provide us. >>So are we just introducing it as data-driven organizations, where we could see spreadsheets and PowerPoint presentations and lots of mapping to help make sort of longer-term strategic decisions, or are those insights coming up and being activated in an automated fashion within our business processes, either assisting human decisions at the point when they're needed or automating decisions for the types of digital experiences and capabilities that we're driving in our organization? So it's going from "I'm a data hoarder" if I'm data-aware, to "I'm interested in what's happening" as a data-driven organization that understands its data. And then lastly, being insight-driven is really where there is no daylight between business, data, and insight. It's all coming together for the best outcomes. >>Right.
So people are acting on perfect or near-perfect information, or machines are doing so, with a high degree of confidence. Great advice and insights, and thank you both for sharing your thoughts with our audience today. It's great to have you. >>Thank you. >>Thank you. >>Okay. Now we're going to go into our industry deep dives. There are six industry breakouts: financial services, insurance, manufacturing, retail, communications, and public sector. Each breakout is going to cover two distinct use cases, for a total of essentially 12 really detailed segments. Each of these is going to be available on demand, but you can scan the calendar on the homepage and navigate to your breakout session of choice. Or for more information, click on the agenda page and take a look to see which session is the best fit for you. Then dive in, join the chat, and feel free to ask questions or contribute your knowledge, opinions, and data. Thanks so much for being part of the community, and enjoy the rest of the day.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Mick Hollison | PERSON | 0.99+ |
David | PERSON | 0.99+ |
Cindy | PERSON | 0.99+ |
William Gibson | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Accenture | ORGANIZATION | 0.99+ |
Michelle | PERSON | 0.99+ |
Arkansas | LOCATION | 0.99+ |
Michelle Goetz | PERSON | 0.99+ |
Nvidia | ORGANIZATION | 0.99+ |
Atlanta | LOCATION | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Rob | PERSON | 0.99+ |
NVIDIA | ORGANIZATION | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
Mars | LOCATION | 0.99+ |
Volkswagen | ORGANIZATION | 0.99+ |
Nebraska | LOCATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
22 | QUANTITY | 0.99+ |
Mick | PERSON | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
90% | QUANTITY | 0.99+ |
Robin | PERSON | 0.99+ |
three | QUANTITY | 0.99+ |
12 | QUANTITY | 0.99+ |
4,000 cabins | QUANTITY | 0.99+ |
10,000 | QUANTITY | 0.99+ |
two words | QUANTITY | 0.99+ |
millions | QUANTITY | 0.99+ |
Ikea | ORGANIZATION | 0.99+ |
Eric | PERSON | 0.99+ |
five years | QUANTITY | 0.99+ |
one month | QUANTITY | 0.99+ |
Nick | PERSON | 0.99+ |
100 cards | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
Cloudera Transform Innovative Ideas Promo
>>Speed is everything in a hyper-competitive climate. The faster we get insights from data and get data products to market, the faster we grow and the more competitive we become. This is Dave Vellante from theCUBE, inviting you to join us on Thursday, August 5th, for Cloudera's Industry Insights. We'll look at the biggest challenges facing businesses today, especially the need to access and leverage data at an accelerated velocity. You'll hear from industry leaders like Mick Hollison, who's Cloudera's president; Rob Bearden, the CEO of Cloudera; and Michelle Goetz from Forrester. You'll hear from Nvidia and industry experts in insurance, manufacturing, retail, and public sector who can address your biggest concerns, like: how do I remove constraints and put data at the core of my business? Streaming begins at 9:00 AM Pacific on theCUBE 365, your leader in global enterprise tech coverage.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Michelle Goetz | PERSON | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
Mick Hollison | PERSON | 0.99+ |
Nvidia | ORGANIZATION | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Thursday, August 5th | DATE | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
9:00 AM Pacific | DATE | 0.97+ |
today | DATE | 0.89+ |
Forrester | ORGANIZATION | 0.75+ |
Q3 | LOCATION | 0.44+ |
65 | EVENT | 0.42+ |
Brent Compton, Red Hat | theCUBE NYC 2018
>> Live from New York, it's theCUBE, covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hello, everyone, welcome back. This is theCUBE live in New York City for theCUBE NYC, #CUBENYC. This is our ninth year covering the big data ecosystem, which has now merged into cloud. All things coming together. It's really about AI, it's about developers, it's about operations, it's about data scientists. I'm John Furrier, my co-host Dave Vellante. Our next guest is Brent Compton, Technical Marketing Director for Storage Business at Red Hat. As you know, we cover Red Hat Summit and great to have the conversation. Open source, DevOps is the theme here. Brent, thanks for joining us, thanks for coming on. >> My pleasure, thank you. >> We've been talking about the role of AI and AI needs data and data needs storage, which is what you do, but if you look at what's going on in the marketplace, kind of an architectural shift. It's harder to find a cloud architect than it is to find diamonds these days. You can't find a good cloud architect. Cloud is driving a lot of the action. Data is a big part of that. What's Red Hat doing in this area and what's emerging for you guys in this data landscape? >> Really, the days of specialists are over. You mentioned it's more difficult to find a cloud architect than find diamonds. What we see is the infrastructure, it's become less about compute as storage and networking. It's the architect that can bring the confluence of those specialties together. One of the things that we see is people bringing their analytics workloads onto the common platforms where they've been running the rest of their enterprise applications. For instance, if they're running a lot of their enterprise applications on AWS, of course, they want to run their analytics workloads in AWS, and that's EMR, long since in the history books.
Likewise, if they're running a lot of their enterprise applications on OpenStack, it's natural that they want to run a lot of their analytics workloads on the same type of dynamically provisioned infrastructure. Emerging, of course, we just announced on Monday this week with Hortonworks and IBM, if they're running a lot of their enterprise applications on a Kubernetes substrate like OpenShift, they want to run their analytics workloads on that same kind of agile infrastructure. >> Talk about the private cloud impact and hybrid cloud because obviously we just talked to the CEO of Hortonworks. Normally it's about early days, about Hadoop, data lakes and then data planes. They had a good vision. They're years into it, but I like what Hortonworks is doing. But he said Kubernetes, on a data show Kubernetes. Kubernetes is a multi-cloud, hybrid cloud concept, containers. This is really enabling a lot of value and you guys have OpenShift which became very successful over the past few years, the growth has been phenomenal. So congratulations, but it's pointing to a bigger trend and that is that the infrastructure software, the platform as a service is becoming the middleware, the glue, if you will, and Kubernetes and containers are facilitating a new architecture for developers and operators. How important is that with you guys, and what's the impact of the customer when they think, okay I'm going to have an agile DevOps environment, workload portability, but do I have to build that out? You mentioned people don't have to necessarily do that anymore. The trend has become on-premise. What's the impact of the customer as they hear Kubernetes and containers and the data conversation? >> You mentioned agile DevOps environment, workload portability so one of the things that customers come to us for is having that same thing, but infrastructure agnostic. They say, I don't want to be locked in. Love AWS, love Azure, but I don't want to be locked into those platforms.
I want to have an abstraction layer for my Kubernetes layer that sits on top of those infrastructure platforms. As I bring my workloads, one-by-one, custom DevOps from a lift and shift of legacy apps onto that substrate, I want to have it be independent, private cloud or public cloud and, time permitting, we'll go into more details about what we've seen happening in the private cloud with analytics as well, which is effectively what brought us here today. The pattern that we've discovered with a lot of our large customers who are saying, hey, we're running OpenStack, they're large institutions that for lots of reasons they store a lot of their data on-premises saying, we want to use the utility compute model that OpenStack gives us as well as the shared data context that Ceph gives us. We want to use that same thing for our analytics workload. So effectively some of our large customers taught us this program. >> So they're building infrastructure for analytics essentially. >> That's what it is. >> One of the challenges with that is the data is everywhere. It's all in silos, it's locked in some server somewhere. First of all, am I overstating that problem and how are you seeing customers deal with that? What are some of the challenges that they're having and how are you guys helping? >> Perfect lead in, in fact, one of our large government customers, they recently sent us an unsolicited email after they deployed the first 10 petabytes in a deca-petabyte solution. It's OpenStack based as well as Ceph based. Three taglines in their email. The first was releasing the lock on data. The second was releasing the lock on compute. And the third was releasing the lock on innovation. Now, that sounds a bit buzzword-y, but when it comes from a customer to you. >> That came from a customer? Sounds like a marketing department wrote that.
>> In the details, as you know, traditional HDFS clusters, traditional Hadoop clusters, Spark clusters or whatever, HDFS is not shared between clusters. One of our large customers has 50 plus analytics clusters. Their data platforms team employ a maze of scripts to copy data from one cluster to the other. And if you are a scientist or an engineer, you'd say, I'm trying to obtain these types of answers, but I need access to data sets A, B, C, and D, but data sets A and B are only on this cluster. I've got to go contact the data platforms team and have them copy it over and ensure that it's up-to-date and in sync so it's messy. >> It's a nightmare. >> Messy. So that's why the one customer said releasing the lock on data because now it's in a shared. Similar paradigm as AWS with EMR. The data's in a shared context, in S3. You spin up your analytics workloads on EC2. Same paradigm discussion as with OpenStack. You're spinning up your analytics workloads via OpenStack virtualization and they're sourcing a shared data context inside of Ceph, S3-compatible Ceph, so same architecture. I love his last bit, the one that sounds the most buzzword-y which was releasing lock on innovation. And this individual, English was not this person's first language so love the word. He said, our developers no longer fear experimentation because it's so easy. In minutes they can spin up an analytics cluster with a shared data context, they get the wrong mix of things they shut it down and spin it up again. >> In the previous example you used HDFS clusters. There's so many trip wires, right. You can break something. >> It's fragile. >> It's like scripts. You don't want to tinker with that. Developers don't want to get their hand slapped. >> The other thing is also the recognition that innovation comes from data. That's what my takeaway is.
The customer saying, okay, now we can innovate because we have access to the data, we can apply intelligence to that data whether it's machine intelligence or analytics, et cetera. >> This is the trend in infrastructure. You mentioned the shared context. What other observations and learnings have you guys come to as Red Hat starts to get more customer interactions around analytical infrastructure. Is it an IT problem? You mentioned abstracting away different infrastructures, and that means multi-cloud's probably set up for you guys in a big way. But what does that mean for a customer? If you had to explain infrastructure analytics, what needs to get done, what does the customer need to do? How do you describe that? >> I love the term that industry uses of multi-tenant workload isolation with shared data context. That's such a concise term to describe what we talk to our customers about. And most of them, that's what they're looking for. They've got their data scientist teams that don't want their workloads mixed in with the long running batch workloads. They say, listen, I'm on deadline here. I've got an hour to get these answers. They're working with Impala. They're working with Presto. They iterate, they don't know exactly the pattern they're looking for. So having to take a long time because their jobs are mixed in with these long MapReduce jobs. They need to be able to spin up infrastructure, workload isolation meaning they have their own space, shared context, they don't want to be placing calls over to the platform team saying, I need data sets C, D, and E. Could you please send them over? I'm on deadline here. That phrase, I think, captures so nicely what customers are really looking to do with their analytics infrastructure. Analytics tools, they'll still do their thing, but the infrastructure underneath analytics delivering this new type of agility is giving that multi-tenant workload isolation with shared data context.
>> You know what's funny is we were talking at the kickoff. We were looking back nine years. We've been at this event for nine years now. We made the prediction there will be no Red Hat of big data. John, years ago said, unless it's Red Hat. You guys got dragged into this by your customers really is how it came about. >> Customers and partners, of course with your recent guest from Hortonworks, the announcement that Red Hat, Hortonworks, and IBM had on Monday of this week. Dialing up even further taking the agility, okay, OpenStack is great for agility, private cloud, utility based computing and storage with OpenStack and Ceph, great. OpenShift dials up that agility another notch. Of course, we heard from the CEO of Hortonworks how much they love the agility that a Kubernetes based substrate provides their analytics customers. >> That's essentially how you're creating that sort of same-same experience between on-prem and multi-cloud, is that right? >> Yeah, OpenShift is deployed pervasively on AWS, on-premises, on Azure, on GCE. >> It's a multi-cloud world, we see that for sure. Again, the validation was at VMworld. AWS CEO, Andy Jassy announced RDS which is their product on VMware on-premises which they've never done. Amazon's never done any product on-premises. We were speculating it would be a hardware device. We missed that one, but it's a software. But this is the validation, seamless cloud operations on-premise in the cloud really is what people want. They want one standard operating model and they want to abstract away the infrastructure, as you were saying, as the big trend. The question that we have is, okay, go to the next level. From a developer standpoint, what is this modern developer using for tools in the infrastructure? How can they get that agility and spinning up isolated, multi-tenant infrastructure concept all the time? This is the demand we're seeing, that's an evolution.
Question for Red Hat is, how does that change your partnership strategy because you mentioned Rob Bearden. They've been hardcore enterprise and you guys are hardcore enterprise. You kind of know the little things that customers want that might not be obvious to people: compliance, certification, a decade of support. How is Red Hat's partnership model changing with this changing landscape, if you will? You mentioned IBM and Hortonworks release this week, but what in general, how does the partnership strategy look for you? >> The more it changes, the more it looks the same. When you go back 20 years ago, what Red Hat has always stood for is any application on any infrastructure. But back in the day it was we had n-thousand applications that were certified on Red Hat Linux and we ran on anybody's server. >> Box. >> Running on a box, exactly. It's a similar play, just in 2018 in the world of hybrid, multi-cloud architectures. >> Well, you guys have done some serious heavy lifting. Don't hate me for saying this, but you're kind of like the mules of the industry. You do a lot of stuff that nobody either wants to do or knows how to do and it's really paid off. You just look at the ascendancy of the company, it's been amazing. >> Well, multi-cloud is hard. Look at what it takes to do multi-cloud in DevOps. It's not easy and a lot of pretenders will fall out of the way, you guys have done well. What's next for you guys? What's on the horizon? What's happening for you guys this next couple months for Red Hat and technology? Any new announcements coming? What's the vision, what's happening? >> One of the announcements that you saw last week, was Red Hat, Cloudera, and Eurotech. Analytics in the data center is great. Increasingly, the world's businesses run on data-driven decisions. That's great, but analytics at the edge for more realtime industrial automation, et cetera.
Per the announcements we did with Cloudera and Eurotech about the use of, we haven't even talked about Red Hat's middleware platforms, such as AMQ Streams now based on Kafka, a Kafka distribution, Fuse, an integration master, effectively bringing Red Hat technology to the edge of analytics so that you have the ability to do some processing in realtime before backhauling all the way back to the data center. That's an area that you'll also see is pushing some analytics to the edge through our partnerships such as announced with Cloudera and Eurotech. >> You guys got the Red Hat Summit coming up next year. theCUBE will be there, as usual. It's great to cover Red Hat. Thanks for coming on theCUBE, Brent. Appreciate it, thanks for spending the time. We're here in New York City live. I'm John Furrier, Dave Vellante, stay with us. All day coverage today and tomorrow in New York City. We'll be right back. (upbeat music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vallante | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
Brent Compton | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Eurotech | ORGANIZATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Brent | PERSON | 0.99+ |
New York City | LOCATION | 0.99+ |
2018 | DATE | 0.99+ |
Red Hat | ORGANIZATION | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
nine years | QUANTITY | 0.99+ |
Andy Jassy | PERSON | 0.99+ |
last week | DATE | 0.99+ |
first language | QUANTITY | 0.99+ |
Three taglines | QUANTITY | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
tomorrow | DATE | 0.99+ |
second | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
next year | DATE | 0.99+ |
third | QUANTITY | 0.99+ |
New York | LOCATION | 0.99+ |
Impala | ORGANIZATION | 0.99+ |
Monday this week | DATE | 0.99+ |
VMworld | ORGANIZATION | 0.98+ |
one cluster | QUANTITY | 0.98+ |
Red Hat Summit | EVENT | 0.98+ |
ninth year | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
OpenStack | TITLE | 0.98+ |
today | DATE | 0.98+ |
NYC | LOCATION | 0.97+ |
20 years ago | DATE | 0.97+ |
Kubernetes | TITLE | 0.97+ |
Kafka | TITLE | 0.97+ |
First | QUANTITY | 0.96+ |
this week | DATE | 0.96+ |
Red Hat | TITLE | 0.95+ |
English | OTHER | 0.95+ |
Monday of this week | DATE | 0.94+ |
OpenShift | TITLE | 0.94+ |
one standard | QUANTITY | 0.94+ |
50 plus analytics clusters | QUANTITY | 0.93+ |
Ceph | TITLE | 0.92+ |
Azure | TITLE | 0.92+ |
GCE | TITLE | 0.9+ |
Presto | ORGANIZATION | 0.9+ |
agile DevOps | TITLE | 0.89+ |
theCUBE | ORGANIZATION | 0.88+ |
DevOps | TITLE | 0.87+ |
John Kreisa, Hortonworks | Dataworks Summit EU 2018
>> Narrator: From Berlin, Germany, it's theCUBE. Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Hello, welcome to theCUBE. We're here at Dataworks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm the lead analyst for Big Data Analytics, within the Wikibon team of SiliconAngle Media. Our guest is John Kreisa. He's the VP for Marketing at Hortonworks, of course, the host company of Dataworks Summit. John, it's great to have you. >> Thank you Jim, it's great to be here. >> We go long back, so you know it's always great to reconnect with you guys at Hortonworks. You guys are on a roll, it's been seven years I think since you guys were founded. I remember the founding of Hortonworks. I remember when it splashed in the Wall Street Journal. It was like oh wow, this big data thing, this Hadoop thing is actually, it's a market, it's a segment and you guys have built it. You know, you and your competitors, your partners, your ecosystem continues to grow. You guys went IPO a few years ago. Your latest numbers are pretty good. You're continuing to grow in revenues, in customer acquisitions, your deal sizes are growing. So Hortonworks remains on a roll. So, I'd like you to talk right now, John, and give us a sense of where Hortonworks is at in terms of engaging with the marketplace, in terms of trends that you're seeing, in terms of how you're addressing them. But talk about first of all the Dataworks Summit. How many attendees do you have from how many countries? Just give us sort of the layout of this show. >> I don't have all of the final counts yet. >> This is year six of the show? >> This is year six in Europe, absolutely, thank you. So it's great, we've moved it around different locations. Great venue, great host city here in Berlin. Super excited about it, I know we have representatives from more than 51 countries. 
If you think about that, drawing from a really broad set of countries, well beyond, as you know, because you've interviewed some of the folks beyond just Europe. We've had them from South America, U.S., Africa, and Asia as well, so really a broad swath of the open-source and big data community, which is great. The final attendance is going to be 1,250 to 1,300 range. The final numbers, but a great sized conference. The energy level's been really great, the sessions have been, you know, oversubscribed, standing room only in many of the popular sessions. So the community's strong, I think that's the thing that we really see here and that we're really continuing to invest in. It's something that Hortonworks was founded around. You referenced the founding, and driving the community forward and investing is something that has been part of our mantra since we started and it remains that way today. >> Right. So first of all what is Hortonworks? Now how does Hortonworks position itself? Clearly Hadoop is your foundation, but you, just like Cloudera, MapR, you guys have all continued to evolve to address a broader range of use-cases with a deeper stack of technology with fairly extensive partner ecosystems. So what kind of a beast is Hortonworks? It's an elephant, but what kind of an elephant is it? >> We're an elephant or riding on the elephant I'd say, so we're a global data management company. That's what we're helping organizations do. Really the end-to-end lifecycle of their data, helping them manage it regardless of where it is, whether it's on-premise or in the cloud, really through hybrid data architectures. That's really how we've seen the market evolve is, we started off in terms of our strategy with the platform based on Hadoop, as you said, to store, process, and analyze data at scale. The kind of fundamental use-case for Hadoop. 
Then as the company emerged, as the market kind of continued to evolve, we moved to and saw the opportunity really, capturing data from the edge. As IOT and kind of edge-use cases emerged it made sense for us to add to the platform and create the Hortonworks DataFlow. >> James: Apache NiFi >> Apache NiFi, exactly, HDF underneath, with associated additional open-source projects in there. Kafka and some streaming and things like that. So that was now move data, capture data in motion, move it back and put it into the platform for those large data applications that organizations are building on the core platform. It's also the next evolution, seeing great attach rates with that, the really strong interest in the Apache NiFi, you know, the meetup here for NiFi was oversubscribed, so really really strong interest in that. And then, the markets continued to evolve with cloud and cloud architectures, customers wanting to deploy in the cloud. You know, you saw we had that poll yesterday in the general session about cloud with really interesting results, but we saw that there was really companies wanting to deploy in a hybrid way. Some of them wanted to move specific workloads to the cloud. >> Multi-cloud, public, private. >> Exactly right, and multi-data center. >> The majority of your customer deployments are on prem. >> They are. >> Rob Bearden, your CEO, I think he said in a recent article on SiliconAngle that two-thirds of your deployments are on prem. Is that percentage going down over time? Are more of your customers shifting toward a public cloud orientation? Does Hortonworks worry about that? You've got partnerships, clearly, with the likes of IBM, AWS, and Microsoft Azure and so forth, so do you guys see that as an opportunity, as a worrisome trend? >> No, we see it very much as an opportunity. 
And that's because we do have customers who are wanting to put more workloads and run things in the cloud, however, there's still almost always a component that's going to be on premise. And that creates a challenge for organizations. How do they manage the security and governance and really the overall operations of those deployments as they're in the cloud and on premise. And, to your point, multi-cloud. And so you get some complexity in there around that deployment and particularly with the regulations, we talked about GDPR earlier today. >> Oh, by the way, the Data Steward Studio demo today was really, really good. It showed that, first of all, you cover the entire range of core requirements for compliance. So that was actually the primary announcement at this show; Scott Gnau announced that. You demoed it today, I think you guys are off on a good start, yeah. >> We've gotten really, and thank you for that, we've gotten really good feedback on our DataPlane Services strategy, right, it provides that single pane of glass. >> I should say to our viewers that Data Steward Studio is the second of the services under the DataPlane, the Hortonworks DataPlane Services Portfolio. >> That's right, that's exactly right. >> Go ahead, keep going. >> So, you know, we see that as an opportunity. We think we're very strongly positioned in the market, being the first to bring that kind of solution to the customers and our large customers that we've been talking about and who have been starting to use DataPlane have been very, very positive. I mean they see it as something that is going to help them really kind of maintain control over these deployments as they start to spread around, as they grow their uses of the thing. >> And it's built to operate across the multi-cloud, I know this as well in terms of executing the consent or withdrawal of consent that the data subject makes through what is essentially a consent portal. >> That's right, that's right. 
>> That was actually a very compelling demonstration in that regard. >> It was good, and they worked very hard on it. And I was speaking to an analyst yesterday, and they were saying that they're seeing an increasing number of the customers, enterprises, wanting to have a multi-cloud strategy. They don't want to get locked into any one public cloud vendor, so, what they want is somebody who can help them maintain that common security and governance across their different deployments, and they see DataPlane Services is the way that's going to help them do that. >> So John, how is Hortonworks, what's your road map, how do you see the company in your go to market evolving over the coming years in terms of geographies, in terms of your focuses? Focus, in terms of the use-cases and workloads that the Hortonworks portfolio addresses. How is that shifting? You mentioned the Edge. AI, machine learning, deep learning. You are a reseller of IBM Data Science Experience. >> DSX, that's right. >> So, let's just focus on that. Do you see more customers turning to Hortonworks and IBM for a complete end-to-end pipeline for the ingest, for the preparation, modeling, training and so forth? And deployment of operationalized AI? Is that something you see going forward as an evolution path for your capabilities? >> I'd say yes, long-term, or even in the short-term. So, they have to get their data house in order, if you will, before they get to some of those other things, so we're still, Hortonworks strategy has always been focused on the platform aspect, right? The data-at-rest platform, data-in-motion platform, and now a platform for managing common security and governance across those different deployments. Building on that is the data science, machine learning, and AI opportunity, but our strategy there, as opposed to trying to trying to do it ourselves, is to partner, so we've got the strong partnership with IBM, resell their DSX product. 
And also other partnerships around to deliver those other capabilities, like machine learning and AI, from our partner ecosystem, which you referenced. We have over 2,300 partners, so a very, very strong ecosystem. And so, we're going to stick to our strategy of the platforms enabling that, which will subsequently enable data science, machine learning, and AI on top. And then, if you want me to talk about our strategy in terms of growth, so we already operate globally. We've got offices in I think 19 different countries. So we're really covering the globe in terms of the demand for Hortonworks products and beginning implements. >> Where's the fastest growing market in terms of regions for Hortonworks? >> Yeah, I mean, international generally is our fastest growing region, faster than the U.S. But we're seeing very strong growth in APAC, actually, so India, Asian countries, Singapore, and then up and through to Japan. There's a lot of growth out in the Asian region. And, you know, they're sort of moving directly to digital transformation projects at really large scale. Big banks, telcos, from a workload standpoint I'd say the patterns are very similar to what we've seen. I've been at Hortonworks for six and a half years, as it turns out, and the patterns we saw initially in terms of adoption in the U.S. became the patterns we saw in terms of adoption in Europe and now those patterns of adoption are the same in Asia. So, once a company realizes they need to either drive out operational costs or build new data applications, the patterns tend to be the same whether it's retail, financial services, telco, manufacturing. You can sort of replicate those as they move forward. 
>> So going forward, how is Hortonworks evolving as a company in terms of, for example with GDPR, Data Steward, data governance as a strong focus going forward, are you shifting your model in terms of your target customer away from the data engineers, the Hadoop cluster managers who are still very much the center of it, towards more data governance, towards more business analyst level of focus. Do you see Hortonworks shifting in that direction in terms of your focus, go to market, your message and everything? >> I would say it's not a shifting as much as an expansion, so we definitely are continuing to invest in the core platform, in Hadoop, and you would have heard of some of the changes that are coming in the core Hadoop 3.0 and 3.1 platform here. Alan and others can talk about those details, and in Apache NiFi. But, to your point, as we bring and have brought Data Steward Studio and DataPlane Services online, that allows us to address a different user within the organization, so it's really an expansion. We're not de-investing in any other things. It's really here's another way in a natural evolution of the way that we're helping organizations solve data problems. >> That's great, well thank you. This has been John Kreisa, he's the VP for marketing at Hortonworks. I'm James Kobielus of Wikibon SiliconAngle Media here at Dataworks Summit 2018 in Berlin. And it's been great, John, and thank you very much for coming on theCUBE. >> Great, thanks for your time. (techno music)
SUMMARY :
Brought to you by Hortonworks. of course, the host company of Dataworks Summit. to reconnect with you guys at Hortonworks. the sessions have been, you know, oversubscribed, you guys have all continued to evolve to address the platform based on Hadoop, as you said, in the Apache NiFi, you know, the meetup here so do you guys see that as an opportunity, and really the overall operations of those Oh, by the way, the Data Steward Studio demo today is the second of the services under the DataPlane, being the first to bring that kind of solution that the data subject makes through in that regard. an increasing number of the customers, Focus, in terms of the use-cases and workloads for the preparation, modeling, training and so forth? Building on that is the data science, machine learning, in terms of adoption in the U.S. the data engineers, the Hadoop cluster managers in the core platform, in Hadoop, and you would have This has been John Kreisa, he's the Great, thanks for your time.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Alan | PERSON | 0.99+ |
James Kobielus | PERSON | 0.99+ |
Jim | PERSON | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
John Kreisa | PERSON | 0.99+ |
Europe | LOCATION | 0.99+ |
John | PERSON | 0.99+ |
Asia | LOCATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Berlin | LOCATION | 0.99+ |
yesterday | DATE | 0.99+ |
Africa | LOCATION | 0.99+ |
South America | LOCATION | 0.99+ |
SiliconAngle Media | ORGANIZATION | 0.99+ |
U.S. | LOCATION | 0.99+ |
1,250 | QUANTITY | 0.99+ |
Scott Gnau | PERSON | 0.99+ |
1,300 | QUANTITY | 0.99+ |
Berlin, Germany | LOCATION | 0.99+ |
seven years | QUANTITY | 0.99+ |
six and a half years | QUANTITY | 0.99+ |
Japan | LOCATION | 0.99+ |
Hadoop | TITLE | 0.99+ |
Asian | LOCATION | 0.99+ |
second | QUANTITY | 0.98+ |
over 2,300 partners | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
two-thirds | QUANTITY | 0.98+ |
19 different countries | QUANTITY | 0.98+ |
Dataworks Summit | EVENT | 0.98+ |
more than 51 countries | QUANTITY | 0.98+ |
Hadoop 3.0 | TITLE | 0.98+ |
first | QUANTITY | 0.98+ |
James | PERSON | 0.98+ |
Data Steward Studio | ORGANIZATION | 0.98+ |
Dataworks Summit EU 2018 | EVENT | 0.98+ |
Dataworks Summit 2018 | EVENT | 0.97+ |
Cloudera | ORGANIZATION | 0.97+ |
MapR | ORGANIZATION | 0.96+ |
GDPR | TITLE | 0.96+ |
DataPlane Services | ORGANIZATION | 0.96+ |
Singapore | LOCATION | 0.96+ |
year six | QUANTITY | 0.95+ |
2018 | EVENT | 0.95+ |
Wikibon SiliconAngle Media | ORGANIZATION | 0.94+ |
India | LOCATION | 0.94+ |
Hadoop | ORGANIZATION | 0.94+ |
APAC | ORGANIZATION | 0.93+ |
Big Data Analytics | ORGANIZATION | 0.93+ |
3.1 | TITLE | 0.93+ |
Wall Street Journal | TITLE | 0.93+ |
one | QUANTITY | 0.93+ |
Apache | ORGANIZATION | 0.92+ |
Wikibon | ORGANIZATION | 0.92+ |
NiFi | TITLE | 0.92+ |
Scott Gnau, Hortonworks | Dataworks Summit EU 2018
(upbeat music) >> Announcer: From Berlin, Germany, it's The Cube, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Hi, welcome to The Cube, we're separating the signal from the noise and tuning into the trends in data and analytics. Here at DataWorks Summit 2018 in Berlin, Germany. This is the sixth year, I believe, that DataWorks has been held in Europe. Last year I believe it was at Munich, now it's in Berlin. It's a great show. The host is Hortonworks and our first interviewee today is Scott Gnau, who is the chief technology officer of Hortonworks. Of course Hortonworks got established themselves about seven years ago as one of the up and coming start ups commercializing a then brand new technology called Hadoop and MapReduce. They've moved well beyond that in terms of their go to market strategy, their product portfolio, their partnerships. So Scott, this morning, it's great to have ya'. How are you doing? >> Glad to be back and good to see you. It's been awhile. >> You know, yes, I mean, you're an industry veteran. We've both been around the block a few times but I remember you years ago. You were at Teradata and I was at another analyst firm. And now you're with Hortonworks. And Hortonworks is really on a roll. I know you're not Rob Bearden, so I'm not going to go into the financials, but your financials look pretty good, your latest. You're growing, your deal sizes are growing. Your customer base is continuing to deepen. So you guys are on a roll. So we're here in Europe, we're here in Berlin in particular. It's five weeks--you did the keynote this morning, It's five weeks until GDPR. The sword of Damocles, the GDPR sword of Damocles. It's not just affecting European based companies, but it's affecting North American companies and others who do business in Europe. 
So your keynote this morning, your core theme was that, if you're in enterprise, your business strategy is equated with your cloud strategy now, is really equated with your data strategy. And you got to a lot of that. It was a really good discussion. And where GDPR comes into the picture is the fact that protecting data, personal data of your customers is absolutely important, in fact it's imperative and mandatory, and will be in five weeks or you'll face a significant penalty if you're not managing that data and providing customers with the right to have it erased, or the right to withdraw consent to have it profiled, and so forth. So enterprises all over the world, especially in Europe, are racing as fast as they can to get compliant with GDPR by the May 25th deadline time. So, one of the things you discussed this morning, you had an announcement overnight that Hortonworks has released a new solution in technical preview called The Data Steward Studio. And I'm wondering if you can tie that announcement to GDPR? It seems like data stewardship would have a strong value for your customers. >> Yeah, there's definitely a big tie-in. GDPR is certainly creating a milestone, kind of a trigger, for people to really think about their data assets. But it's certainly even larger than that, because when you even think about driving digitization of a business, driving new business models and connecting data and finding new use cases, it's all about finding the data you have, understanding what it is, where it came from, what's the lineage of it, who had access to it, what did they do to it? These are all governance kinds of things, which are also now mandated by laws like GDPR. And so it's all really coming together in the context of the new modern data architecture era that we live in, where a lot of data that we have access to, we didn't create. 
And so it was created outside the firewall by a device, by some application running with some customer, and so capturing and interpreting and governing that data is very different than taking derivative transactions from an ERP system, which are already adjudicated and understood, and governing that kind of a data structure. And so this is a need that's driven from many different perspectives, it's driven from the new architecture, the way IoT devices are connecting and just creating a data bomb, that's one thing. It's driven by business use cases, just saying what are the assets that I have access to, and how can I try to determine patterns between those assets where I didn't even create some of them, so how do I adjudicate that? >> Discovering and cataloging your data-- >> Discovering it, cataloging it, actually even... When I even think about data, just think the files on my laptop, that I created, and I don't remember what half of them are. So creating the metadata, creating that trail of bread crumbs that lets you piece together what's there, what's the relevance of it, and how, then, you might use it for some correlation. And then you get in, obviously, to the regulatory piece that says sure, if I'm a new customer and I ask to be forgotten, the only way that you can guarantee to forget me is to know where all of my data is. >> If you remember that they are your customer in the first place and you know where all that data is, if you're even aware that it exists, that's the first and foremost thing for an enterprise to be able to assess their degree of exposure to GDPR. >> So, right. It's like a whole new use case. It's a microcosm of all of these really big things that are going on. And so what we've been trying to do is really leverage our expertise in metadata management using the Apache Atlas project. >> Interviewer: You and IBM have done some major work-- >> We work with IBM and the community on Apache Atlas. 
You know, metadata tagging is not the most interesting topic for some people, but in the context that I just described, it's kind of important. And so I think one of the areas where we can really add value for the industry is leveraging our lowest common denominator, open source, open community kind of development to really create a standard infrastructure, a standard open infrastructure for metadata tagging, into which all of these use cases can now plug. Whether it's I want to discover data and create metadata about the data based on patterns that I see in the data, or I've inherited data and I want to ensure that the metadata stay with that data through its life cycle, so that I can guarantee the lineage of the data, and be compliant with GDPR-- >> And in fact, tomorrow we will have Mandy Chessell from IBM, a key Hortonworks partner, discussing the open metadata framework you're describing and what you're doing. >> And that was part of this morning's keynote close also. It all really flowed nicely together. Anyway, it is really a perfect storm. So what we've done is we've said, let's leverage this lowest common denominator, standard metadata tagging, Apache Atlas, and uplevel it, and not have it be part of a cluster, but actually have it be a cloud service that can be in force across multiple data stores, whether they're in the cloud or whether they're on prem. >> Interviewer: That's the Data Steward Studio? >> Well, Data Plane and Data Steward Studio really enable those things to come together. >> So the Data Steward Studio is the second service >> Like an app. >> under the Hortonworks DataPlane service. >> Yeah, so the whole idea is to be able to tie those things together, and when you think about it in today's hybrid world, and this is where I really started, where your data strategy is your cloud strategy, they can't be separate, because if they're separate, just think about what would happen. So I've copied a bunch of data out to the cloud. 
All memory of any lineage is gone. Or I've got to go set up manually another set of lineage that may not be the same as the lineage it came with. And so being able to provide that common service across footprint, whether it's multiple data centers, whether it's multiple clouds, or both, is a really huge value, because now you can sit back and through that single pane, see all of your data assets and understand how they interact. That obviously has the ability then to provide value like with Data Steward Studio, to discover assets, maybe to discover assets and discover duplicate assets, where, hey, I can save some money if I get rid of this cloud instance, 'cause it's over here already. Or to be compliant and say yeah, I've got these assets here, here, and here, I am now compelled to do whatever: delete, protect, encrypt. I can now go do that and keep a record through the metadata that I did it. >> Yes, in fact that is very much at the heart of compliance, you got to know what assets there are out there. And so it seems to me that Hortonworks is increasingly... the H-word rarely comes up these days. >> Scott: Not Hortonworks, you're talking about Hadoop. >> Hadoop rarely comes up these days. When the industry talks about you guys, it's known that's your core, that's your base, that's where HDP and so forth, great product, great distro. In fact, in your partnership with IBM, a year or more ago, I think it was IBM standardized on HDP in lieu of their distro, 'cause it's so well-established, so mature. But going forward, you guys in many ways, Hortonworks, you have positioned yourselves now. Wikibon sees you as being the premier solution provider of big data governance solutions specifically focused on multi-cloud, on structured data, and so forth. So the announcement today of the Data Steward Studio very much builds on that capability you already have there. So going forward, can you give us a sense to your roadmap in terms of building out DataPlane's service? 
'Cause this is the second of these services under the DataPlane umbrella. Give us a sense for how you'll continue to deepen your governance portfolio in DataPlane. >> Really the way to think about it, there are a couple of things that you touched on that I think are really critical, certainly for me, and for us at Hortonworks to continue to repeat, just to make sure the message got there. Number one, Hadoop is definitely at the core of what we've done, and was kind of the secret sauce. Some very different stuff in the technology, also the fact that it's open source and community, all those kinds of things. But that really created a foundation that allowed us to build the whole beginning of big data data management. And we added and expanded to the traditional Hadoop stack by adding Data in Motion. And so what we've done is-- >> Interviewer: NiFi, I believe, you made a major investment. >> Yeah, so we made a large investment in Apache NiFi, as well as Storm and Kafka as kind of a group of technologies. And the whole idea behind doing that was to expand our footprint so that we would enable our customers to manage their data through its entire lifecycle, from being created at the edge, all the way through streaming technologies, to landing, to analytics, and then even analytics being pushed back out to the edge. So it's really about having that common management infrastructure for the lifecycle of all the data, including Hadoop and many other things. And then in that, obviously as we discuss whether it be regulation, whether it be, frankly, future functionality, there's an opportunity to uplevel those services from an overall security and governance perspective. And just like Hadoop kind of upended traditional thinking... and what I mean by that was not the economics of it, specifically, but just the fact that you could land data without describing it. That seemed so unimportant at one time, and now it's like the key thing that drives the difference. 
Think about sensors that are sending in data that reconfigure firmware, and those streams change. Being able to acquire data and then assess the data is a big deal. So the same thing applies, then, to how we apply governance. I said this morning, traditional governance was hey, I started this employee, I have access to this file, this file, this file, and nothing else. I don't know what else is out there. I only have access to what my job title describes. And that's traditional data governance. In the new world, that doesn't work. Data scientists need access to all of the data. Now, that doesn't mean we need to give away PII. We can encrypt it, we can tokenize it, but we keep referential integrity. We keep the integrity of the original structures, and those who have a need to actually see the PII can get the token and see the PII. But it's governance thought inversely as it's been thought about for 30 years. >> It's so great you've worked governance into an increasingly streaming, real-time in motion data environment. Scott, this has been great. It's been great to have you on The Cube. You're an alum of The Cube. I think we've had you at least two or three times over the last few years. >> It feels like 35. Nah, it's pretty fun. >> Yeah, you've been great. So we are here at Dataworks Summit in Berlin. (upbeat music)
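The tokenization idea Scott describes above, masking PII while keeping referential integrity, can be sketched roughly as follows. This is a minimal illustration and not Hortonworks' implementation: the key, the in-memory vault, the `tokenize`, `detokenize`, and `mask` helpers, and the sample records are all hypothetical. It assumes a keyed hash (HMAC-SHA256) so that the same input always yields the same token, which is what lets two masked data sets still be joined.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would fetch this from a key manager.
SECRET_KEY = b"demo-key-not-for-production"

# Hypothetical in-memory vault mapping tokens back to the original values.
vault = {}


def tokenize(value, key=SECRET_KEY):
    """Return a deterministic token for a PII value and record it in the vault."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    token = "tok_" + digest[:16]
    vault[token] = value
    return token


def detokenize(token, authorized):
    """Return the original value only for callers cleared to see PII."""
    if not authorized:
        raise PermissionError("caller is not cleared to see PII")
    return vault[token]


def mask(rows, field):
    """Copy each row, replacing the PII field with its token."""
    return [{**row, field: tokenize(row[field])} for row in rows]


# Two made-up data sets that share an email column.
customers = mask([{"email": "ada@example.com", "segment": "retail"}], "email")
orders = mask(
    [
        {"email": "ada@example.com", "total": 42},
        {"email": "bob@example.com", "total": 7},
    ],
    "email",
)

# The join key survives masking: equal inputs produce equal tokens.
ada_token = customers[0]["email"]
ada_orders = [order for order in orders if order["email"] == ada_token]
```

A data scientist working on `customers` and `orders` never sees an email address, yet can still correlate the two sets; only a caller passing `authorized=True` to `detokenize` gets the original value back.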
SUMMARY :
Brought to you by Hortonworks. So Scott, this morning, it's great to have ya'. Glad to be back and good to see you. So, one of the things you discussed this morning, of the new modern data architecture era that we live in, forgotten, the only way that you can guarantee and foremost thing for an enterprise to be able And so what we've been trying to do is really leverage so that I can guarantee the lineage of the data, discussing the open metadata framework you're describing And that was part of this morning's keynote close also. those things to come together. of lineage that may not be the same as the lineage And so it seems to me that Hortonworks is increasingly... When the industry talks about you guys, it's known And so what we've done is-- Interviewer: NiFi, I believe, you made So the same thing applies, then, to how we apply governance. It's been great to have you on The Cube. Nah, it's pretty fun.. So we are here at Dataworks Summit in Berlin.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Europe | LOCATION | 0.99+ |
Scott | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Berlin | LOCATION | 0.99+ |
Scott Gnau | PERSON | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Teradata | ORGANIZATION | 0.99+ |
Last year | DATE | 0.99+ |
May 25th | DATE | 0.99+ |
five weeks | QUANTITY | 0.99+ |
Mandy Chessell | PERSON | 0.99+ |
GDPR | TITLE | 0.99+ |
Munich | LOCATION | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
second service | QUANTITY | 0.99+ |
30 years | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
tomorrow | DATE | 0.99+ |
first | QUANTITY | 0.99+ |
Berlin, Germany | LOCATION | 0.99+ |
second | QUANTITY | 0.99+ |
DataPlane | ORGANIZATION | 0.99+ |
sixth year | QUANTITY | 0.98+ |
three times | QUANTITY | 0.98+ |
first interviewee | QUANTITY | 0.98+ |
Dataworks Summit | EVENT | 0.98+ |
one | QUANTITY | 0.97+ |
this morning | DATE | 0.97+ |
DataWorks Summit 2018 | EVENT | 0.97+ |
MapReduce | ORGANIZATION | 0.96+ |
Hadoop | TITLE | 0.96+ |
Hadoop | ORGANIZATION | 0.96+ |
one time | QUANTITY | 0.96+ |
35 | QUANTITY | 0.96+ |
single pane | QUANTITY | 0.96+ |
NiFi | ORGANIZATION | 0.96+ |
today | DATE | 0.94+ |
DataWorks Summit Europe 2018 | EVENT | 0.93+ |
Data Steward Studio | ORGANIZATION | 0.93+ |
Dataworks Summit EU 2018 | EVENT | 0.92+ |
about seven years ago | DATE | 0.91+ |
a year or | DATE | 0.88+ |
years | DATE | 0.87+ |
Storm | ORGANIZATION | 0.87+ |
Wikibon | ORGANIZATION | 0.86+ |
Apache NiFi | ORGANIZATION | 0.85+ |
The Cube | PERSON | 0.84+ |
North American | OTHER | 0.84+ |
DataWorks | ORGANIZATION | 0.84+ |
Data Plane | ORGANIZATION | 0.76+ |
Data Steward Studio | TITLE | 0.75+ |
Kafka | ORGANIZATION | 0.75+ |
Rob Thomas, IBM | Big Data NYC 2017
>> Voiceover: Live from midtown Manhattan, it's theCUBE! Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, live in New York City this is theCUBE's coverage of, eighth year doing Hadoop World now, evolved into Strata Hadoop, now called Strata Data, it's had many incarnations but O'Reilly Media running their event in conjunction with Cloudera, mainly an O'Reilly media show. We do our own show called Big Data NYC here with our community with theCUBE bringing you the best interviews, the best people, entrepreneurs, thought leaders, experts, to get the data and try to project the future and help users find the value in data. My next guest is Rob Thomas, who is the General Manager of IBM Analytics, theCUBE Alumni, been on multiple times successfully executing in the San Francisco Bay area. Great to see you again. >> Yeah John, great to see you, thanks for having me. >> You know IBM has really been interesting through its own transformation and a lot of people will throw IBM in that category but you guys have been transforming okay and the scoreboard has yet to show in my mind what's truly happening because if you still look at this industry, we're only eight years into what Hadoop evolved into now as a large data set but the analytics game just seems to be getting started with the cloud now coming over the top, you're starting to see a lot of cloud conversations in the air. Certainly there's a lot of AI washing, you know, AI this, but it's machine learning and deep learning at the heart of it as innovation but a lot more work on the analytics side is coming. You guys are at the center of that. What's the update? What's your view of this analytics market? >> Most enterprises struggle with complexity. That's the number one problem when it comes to analytics. It's not imagination, it's not willpower, in many cases, it's not even investment, it's just complexity.
We are trying to make data really simple to use and the way I would describe it is we're moving from a world of products to platforms. Today, if you want to go solve a data governance problem you're typically integrating 10, 15 different products. And the burden then is on the client. So, we're trying to make analytics a platform game. And my view is an enterprise has to have three platforms if they're serious about analytics. They need a data manager platform for managing all types of data, public, private cloud. They need unified governance so governance of all types of data and they need a data science platform machine learning. If a client has those three platforms, they will be successful with data. And what I see now is really mixed. We've got 10 products that do that, five products that do this, but it has to be integrated in a platform. >> You as an IBM or the customer has these tools? >> Yeah, when I go see clients that's what I see is data... >> John: Disparate data log. >> Yeah, they have disparate tools and so we are unifying what we deliver from a product perspective to this platform concept. >> You guys announce an integrated analytic system, got to see my notes here, I want to get into that in a second but interesting you bring up the word platform because you know, platforms have always been kind of reserved for the big supplier but you're talking about customers having a platform, not a supplier delivering a platform per se 'cause this is where the integration thing becomes interesting. We were joking yesterday on theCUBE here, kind of just kind of ad hoc conceptually like the world has turned into a tool shed. I mean everyone has a tool shed or knows someone that has a tool shed where you have the tools in the back and they're rusty. And so, this brings up the tool conversation, there's too many tools out there that try to be platforms. >> Rob: Yes. >> And if you have too many tools, you're not really doing the platform game right. 
And complexity also turns into when you bought a hammer it turned into a lawn mower. Right so, a lot of these companies have been groping and trying to iterate what their tool was into something else it wasn't built for. So, as the industry evolves, that's natural Darwinism if you will, they will fall to the wayside. So talk about that dynamic because you still need tooling >> Rob: Yes. >> But tools will be a function of the work, as Peter Burris would say, so talk about how does a customer really get that platform out there without sacrificing the tooling that they may have bought or want to get rid of. >> Well, so think about the enterprise today, what the data architecture looks like is, I've got this box that has this software on it, use your terms, has these types of tools on it, and it's isolated and if you want a different set of tooling, okay, move that data to this other box where we have the other tooling. So, it's very isolated in terms of how platforms have evolved or technology platforms today. When I talk about an integrated platform, we are big contributors to Kubernetes. We're making that foundational in terms of what we're doing on Private Cloud and Public Cloud. If you move to that model, suddenly what was a bunch of disparate tools are now microservices against a common architecture. And so it totally changes the nature of the data platform in an enterprise. It's a much more fluid data layer. The term I use sometimes is you have data as a service now, available to all your employees. That's totally different than I want to do this project, so step one, make room in the data center, step two, bring in a server. It's a much more flexible approach so that's what I mean when I say platform. >> So operationalizing it is a lot easier than just going down the linear path of provisioning.
All right, so let's bring up the complexity issue because integrated and unified are two different concepts that kind of mean the same thing depending on how you look at it. When you look at the data integration problem, you've got all this complexity around governance, it's a lot of moving parts of data. How does a customer actually execute without compromising the integrity of their policies that they need to have in place? So in other words, what are the baby steps that someone can take, the customers take through with what you guys are dealing with them, how do they get into the game, how do they take steps towards the outcome? They might not have the big money to push it all at once, they might want to take a risk of risk management approach. >> I think there's a clear recipe for doing this right and we have experience of doing it well and doing it not so well, so over time we've gotten some, I'd say a pretty good perspective on that. My view is very simple, data governance has to start with a catalog. And the analogy I use is, you have to do for data what libraries do for books. And think about a library, the first thing you do with books, card catalog. You know where, you basically itemize everything, you know exactly where it sits. If you've got multiple copies of the same book, you can distinguish between which one is which. As books get older they go to archives, to microfilm or something like that. That's what you have to do with your data. >> On the front end. >> On the front end. And it starts with a catalog. And that reason I say that is, I see some organizations that start with, hey, let's go start ETL, I'll create a new warehouse, create a new Hadoop environment. That might be the right thing to do but without having a basis of what you have, which is the catalog, that's where I think clients need to start. 
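Rob's library analogy (itemize everything, know where it sits, tell copies apart, move old material to the archive) can be made concrete with a very small sketch. This is only an illustration under stated assumptions: `DataCatalog`, `CatalogEntry`, and every dataset name and location below are hypothetical and do not represent IBM's governance catalog or any real product API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One 'card' in the catalog: what the data set is and where it sits."""
    name: str
    location: str
    copy_id: int
    archived: bool = False
    tags: set = field(default_factory=set)


class DataCatalog:
    """Hypothetical in-memory catalog; a real one would be a shared service."""

    def __init__(self):
        self._entries = []

    def register(self, name, location, tags=()):
        # Number copies of the same data set so they can be told apart.
        copy_id = sum(1 for e in self._entries if e.name == name) + 1
        entry = CatalogEntry(name, location, copy_id, tags=set(tags))
        self._entries.append(entry)
        return entry

    def find(self, name, include_archived=False):
        return [
            e for e in self._entries
            if e.name == name and (include_archived or not e.archived)
        ]

    def archive(self, name, copy_id):
        for e in self._entries:
            if e.name == name and e.copy_id == copy_id:
                e.archived = True
                return e
        raise KeyError(f"no entry {name!r} with copy_id {copy_id}")


catalog = DataCatalog()
first = catalog.register("customers", "warehouse://sales/customers", tags={"pii"})
second = catalog.register("customers", "hdfs://lake/raw/customers", tags={"pii"})

# Older copies move to the archive but remain itemized.
catalog.archive("customers", first.copy_id)
active = catalog.find("customers")
everything = catalog.find("customers", include_archived=True)
```

With an inventory like this in place, a decision such as "build a new warehouse" or "start ETL" can begin from `catalog.find(...)` rather than from guesswork about what data exists and where.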
>> Well, I would just add one more level of complexity just to kind of reinforce, first of all I agree with you but here's another example that would reinforce this step. Let's just say you write some machine learning and some algorithms and a new policy from the government comes down. Hey, you know, we're dealing with Bitcoin differently or whatever, some GDPR kind of thing happens where someone gets hacked and a new law comes out. How do you inject that policy? You got to rewrite the code, so I'm thinking that if you do this right, you don't have to do a lot of rewriting of applications; the library or the catalog will handle it. Is that right, am I getting that right? >> That's right 'cause then you have a baseline is what I would describe it as. It's codified in the form of a data model or in the form of an ontology for how you're looking at unstructured data. You have a baseline so then as changes come, you can easily adjust to those changes. Where I see clients struggle is if you don't have that baseline then you're constantly trying to change things on the fly and that makes it really hard to get to this... >> Well, really hard, expensive, they have to rewrite apps. >> Exactly. >> Rewrite algorithms and machine learning things that were built probably by people that maybe left the company, who knows, right? So the consequences are pretty grave, I mean, pretty big. >> Yes. >> Okay, so let's go back to something that you said yesterday. You were on theCUBE yesterday with Hortonworks CEO, Rob Bearden and you were commenting about AI or AI washing. You said quote, "You can't have AI without IA." A play on letters there, sequence of letters which was really an interesting comment, we kind of referenced it pretty much all day yesterday. Information architecture is the IA and AI is the artificial intelligence basically saying if you don't have some sort of architecture AI really can't work.
Which really means models have to be understood, with the learning machine kind of approach. Expand more on that 'cause that was I think a fundamental thing that we're seeing at the show this week, this in New York is a model for the models. Who trains the machine learning? Machines got to learn somewhere too so there's learning for the learning machines. This is a real complex data problem and a half. If you don't set up the architecture it may not work, explain. >> So, there's two big problems enterprises have today. One is trying to operationalize data science and machine learning at scale, the other one is getting to the cloud but let's focus on the first one for a minute. The reason clients struggle to operationalize this at scale is because they start a data science project and they build a model for one discrete data set. Problem is that only applies to that data set, it doesn't, you can't pick it up and move it somewhere else so this idea of data architecture just to kind of follow through, whether it's the catalog or how you're managing your data across multiple clouds becomes fundamental because ultimately you want to be able to provide machine learning across all your data because machine learning is about predictions and it's hard to do really good predictions on a subset. But that pre-req is the need for an information architecture that accounts for the fact that you're going to build models and you want to train those models. As new data comes in, you want to keep the training process going. And that's the biggest challenge I see clients struggling with. So they'll have success with their first ML project but then the next one becomes progressively harder because now they're trying to use more data and they haven't prepared their architecture for that. >> Great point. Now, switching to data science. You spoke many times with us on theCUBE about data science, we know you're passionate about it, you guys are doing a lot of work on that.
We've observed and Jim Kobielus and I were talking yesterday, there's too much work still on the data science guys' plate. They're still doing a lot of what I call, sys admin like work, not the right word, but like administrative building and wrangling. They're not doing enough data science and there's enough proof points now to show that data science actually impacts business in whether it's military having data intelligence to execute something, to selling something at the right time, or even for work or play or consume, or we use, all proof is out there. So why aren't we going faster, why aren't the data scientists more effective, what is it going to take for the data science to have a seamless environment that works for them? They're still doing a lot of wrangling and they're still getting down in the weeds. Is that just the role they have or how does it get easier for them, that's the big catch? >> That's not the role. So they're a victim of their architecture to some extent and that's why they end up spending 80% of their time on data prep, data cleansing, that type of thing. Look, I think we solved that. That's why when we introduced the integrated analytic system this week, that whole idea was get rid of all the data prep that you need because land the data in one place, machine learning and data science is built into that. So everything that the data scientist struggles with today goes away. We can federate to data on cloud, on any cloud, we can federate to data that's sitting inside Hortonworks so it looks like one system but machine learning is built into it from the start. So we've eliminated the need for all of that data movement, for all that data wrangling 'cause we organized the data, we built the catalog, and we've made it really simple. And so if you go back to the point I made, so one issue is clients can't apply machine learning at scale, the other one is they're struggling to get to the cloud.
I think we've nailed those problems 'cause now with a click of a button, you can scale this to part of the cloud. >> All right, so how does the customer get their hands on this? Sounds like it's a great tool, you're saying it's leading edge. We'll take a look at it, certainly I'll do a review on it with the team but how do I get it, how do I get a hold of this? What do I do, download it, you guys supply it to me, is it some open source, how do your customers and potential customers engage with this product? >> However they want to but I'll give you some examples. So, we have an analytic system built on Spark, you can bring the whole box into your data center and right away you're ready for data science. That's one way. Somebody like you, you're going to want to go get the containerized version, you go download it on the web and you'll be up and running instantly with a highly performing warehouse integrated with machine learning and data science built on Spark using Apache Jupyter. Any developer can go use that and get value out of it. You can also say I want to run it on my desktop. >> And that's free? >> Yes. >> Okay. >> There's a trial version out there. >> That's the open source, yeah, that's the free version. >> There's also a version on public cloud so if you don't want to download it, you want to run it outside your firewall, you can go run it on IBM cloud on the public cloud so... >> Just your cloud, Amazon? >> No, not today. >> John: Just IBM cloud, okay, I got it. >> So there's variety of ways that you can go use this and I think what you'll find... >> But you have a premium model that people can get started out so they'll download it to your data center, is that also free too? >> Yeah, absolutely. >> Okay, so all the base stuff is free. >> We also have a desktop version too so you can download... >> What URL can people look at this? >> Go to datascience.ibm.com, that's the best place to start a data science journey. 
>> Okay, multi-cloud, Common Cloud is what people are calling it, you guys have Common SQL engine. What is this product, how does it relate to the whole multi-cloud trend? Customers are looking for multiple clouds. >> Yeah, so Common SQL is the idea of integrating data wherever it is, whatever form it's in, ANSI SQL compliant so what you would expect for a SQL query and the type of response you get back, you get that back with Common SQL no matter where the data is. Now when you start thinking multi-cloud you introduce a whole other bunch of factors. Network, latency, all those types of things so what we talked about yesterday with the announcement of Hortonworks Dataplane which is kind of extending the YARN environment across multi-clouds, that's something we can plug in to. So, I think let's be honest, the multi-cloud world is still pretty early. >> John: Oh, really early. >> Our focus is delivery... >> I don't think it really exists actually. >> I think... >> It's multiple clouds but no one's actually moving workloads across all the clouds, I haven't found any. >> Yeah, I think it's hard for latency reasons today. We're trying to deliver an outstanding... >> But people are saying, I mean this is head room I got but people are saying, I'd love to have a preferred future of multi-cloud even though they're kind of getting their own shops in order, retrenching, and re-platforming it but that's not a bad ask. I mean, I'm a user, I want to move from if I don't like IBM's cloud or I got a better service, I can move around here. If Amazon is too expensive I want to move to IBM, you got product differentiation, I might want to to be in your cloud. So again, this is the customers mindset, right. If you have something really compelling on your cloud, do I have to go all in on IBM cloud to run my data? You shouldn't have to, right? >> I agree, yeah I don't think any enterprise will go all in on one cloud. 
I think it's delusional for people to think that so you're going to have this world. So the reason when we built IBM Cloud Private we did it on Kubernetes was we said, that can be a substrate if you will, that provides a level of standards across multiple cloud type environments. >> John: And it's got some traction too so it's a good bet there. >> Absolutely. >> Rob, final word, just talk about the personas who you now engage with from IBM's standpoint. I know you have a lot of great developers stuff going on, you've done some great work, you've got a free product out there but you still got to make money, you got to provide value to IBM, who are you selling to, what's the main thing, you've got multiple stakeholders, could you just clarify the stakeholders that you're serving in the marketplace? >> Yeah, I mean, the emerging stakeholder that we speak with more and more than we used to is chief marketing officers who have real budgets for data and data science and trying to change how they're performing their job. That's a major stakeholder, CTOs, CIOs, any C level, >> Chief data officer. >> Chief data officer. You know chief data officers, honestly, it's a mixed bag. Some organizations they're incredibly empowered and they're driving the strategy. Others, they're figure heads and so you got to know how the organizations do it. >> A puppet for the CFO or something. >> Yeah, exactly. >> Our ops. >> A puppet? (chuckles) So, you got to you know. >> Well, they're not really driving it, they're not changing it. It's not like we're mandated to go do something they're maybe governance police or something. >> Yeah, and in some cases that's true. In other cases, they drive the data architecture, the data strategy, and that's somebody that we can engage with right away and help them out so... >> Any events you got going up? Things happening in the marketplace that people might want to participate in? 
I know you guys do a lot of stuff out in the open, events they can connect with IBM, things going on? >> So we do, so we're doing a big event here in New York on November first and second where we're rolling out a lot of our new data products and cloud products so that's one coming up pretty soon. The biggest thing we've changed this year is there's such a craving from clients for education, so we've started doing what we're calling Analytics University where we actually go to clients and we'll spend a day or two days, go really deep on open languages, open source. That's become kind of a new focus for us. >> A lot of re-skilling going on too with the transformation, right? >> Rob: Yes, absolutely. >> All right, Rob Thomas here, General Manager IBM Analytics inside theCUBE. CUBE alumni, breaking it down, giving his perspective. He's got two books out there, The Data Revolution was the first one. >> Big Data Revolution. >> Big Data Revolution and the new one is Every Company is a Tech Company. Love that title which is true, check it out on Amazon. Rob Thomas, Big Data Revolution, first book and then second book is Every Company is a Tech Company. It's theCUBE live from New York. More coverage after the short break. (theCUBE jingle) (theCUBE jingle) (calm soothing music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jim Kobielus | PERSON | 0.99+ |
Peter Burris | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
O'Reilly Media | ORGANIZATION | 0.99+ |
80% | QUANTITY | 0.99+ |
10 | QUANTITY | 0.99+ |
New York | LOCATION | 0.99+ |
10 products | QUANTITY | 0.99+ |
O'Reilly | ORGANIZATION | 0.99+ |
two days | QUANTITY | 0.99+ |
first book | QUANTITY | 0.99+ |
two books | QUANTITY | 0.99+ |
a day | QUANTITY | 0.99+ |
Rob | PERSON | 0.99+ |
Today | DATE | 0.99+ |
yesterday | DATE | 0.99+ |
New York City | LOCATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
San Francisco Bay | LOCATION | 0.99+ |
five products | QUANTITY | 0.99+ |
second book | QUANTITY | 0.99+ |
IBM Analytics | ORGANIZATION | 0.99+ |
this week | DATE | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
first one | QUANTITY | 0.99+ |
theCUBE | ORGANIZATION | 0.99+ |
eight years | QUANTITY | 0.99+ |
Spark | TITLE | 0.99+ |
SQL | TITLE | 0.99+ |
Common SQL | TITLE | 0.98+ |
datascience.ibm.com | OTHER | 0.98+ |
eighth year | QUANTITY | 0.98+ |
One | QUANTITY | 0.98+ |
one issue | QUANTITY | 0.97+ |
Hortonworks Dataplane | ORGANIZATION | 0.97+ |
three platforms | QUANTITY | 0.97+ |
Strata Hadoop | TITLE | 0.97+ |
today | DATE | 0.97+ |
The Data Revolution | TITLE | 0.97+ |
Cloudera | ORGANIZATION | 0.97+ |
second | QUANTITY | 0.96+ |
NYC | LOCATION | 0.96+ |
two big problems | QUANTITY | 0.96+ |
Analytics University | ORGANIZATION | 0.96+ |
step two | QUANTITY | 0.96+ |
one way | QUANTITY | 0.96+ |
November first | DATE | 0.96+ |
Big Data Revolution | TITLE | 0.95+ |
one | QUANTITY | 0.94+ |
Every Company is a Tech Company | TITLE | 0.94+ |
CUBE | ORGANIZATION | 0.93+ |
this year | DATE | 0.93+ |
two different concepts | QUANTITY | 0.92+ |
one system | QUANTITY | 0.92+ |
step one | QUANTITY | 0.92+ |
Arun Murthy, Hortonworks | BigData NYC 2017
>> Coming back when we were a DOS spreadsheet company. I did a short stint at Microsoft and then joined Frank Quattrone when he spun out of Morgan Stanley to create what would become the number three tech investment (upbeat music) >> Host: Live from mid-town Manhattan, it's theCUBE covering the BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat electronic music) >> Welcome back, everyone. We're here, live, on day two of our three days of coverage of BigData NYC. This is our event that we put on every year. It's our fifth year doing BigData NYC in conjunction with Hadoop World which evolved into Strata Conference, which evolved into Strata Hadoop, now called Strata Data. Probably next year will be called Strata AI, but we're still theCUBE, we'll always be theCUBE and this is our BigData NYC, our eighth year covering the BigData world since Hadoop World. And then as Hortonworks came on we started covering Hortonworks' data summit. >> Arun: DataWorks Summit. >> DataWorks Summit. Arun Murthy, my next guest, Co-Founder and Chief Product Officer of Hortonworks. Great to see you, looking good. >> Likewise, thank you. Thanks for having me. >> Boy, what a journey. Hadoop, years ago, >> 12 years now. >> I still remember, you guys came out of Yahoo, you guys put Hortonworks together and then since, gone public, first to go public, then Cloudera just went public. So, the Hadoop World is pretty much out there, everyone knows where it's at, it's got a nice use case, but the whole world's moved around it. You guys have been really the first of the Hadoop players, even before Cloudera, on this notion of data in flight, or, I call, real-time data but I think, you guys call it data-in-motion. Batch, we all know what Batch does, a lot of things to do with Batch, you can optimize it, it's not going anywhere, it's going to grow. Real-time data-in-motion's a huge deal. Give us the update.
>> Absolutely, you know, we've obviously been in this space, personally, I've been in this for about 12 years now. So, we've had a lot of time to think about it. >> Host: Since you were 12? >> Yeah. (laughs) Almost. Probably look like it. So, back in 2014 and '15 when we, sort of, went public and we started looking around, the thesis always was, yes, Hadoop is important, we're going to allow you to manage lots and lots of data, but a lot of the stuff we've done since the beginning, starting with YARN and so on, was really to enable the use cases beyond the whole traditional transactions and analytics. And Rob, our CEO, his vision's always been we've got to get into a pre-transactional world, if you will, rather than the post-transactional analytics and BI and so on. So that's where it started. And increasingly, the obvious next step was to say, look enterprises want to be able to get insights from data, but they also want, increasingly, they want to get insights and they want to deal with it in real-time. You know while you're in your shopping cart. They want to make sure you don't abandon your shopping cart. If you were sitting at a retailer and you're in an aisle and you're about to walk away from a dress, you want to be able to do something about it. So, this notion of real-time is really important because it helps the enterprise connect with the customer at the point of action, if you will, and provide value right away rather than having to try to do this post-transaction. So, it's been a really important journey. We went and bought this company called Onyara, which is a bunch of geeks like us who started off with the government, built this Apache NiFi thing, huge community. It's just, like, taking off at this point. It's been a fantastic thing to join hands and join the team and keep pushing in the whole streaming data style. >> There's a real, I don't mean to tangent but I do since you brought up community I wanted to bring this up.
It's been the theme here this week. It's more and more obvious that the community role is becoming central, beyond open-source. We all know open-source, standing on the shoulders before us, you know. And Linux Foundation showing code numbers hitting up from $64 million to billions in the next five, ten years, exponential growth of new code coming in. So open-source certainly blew me. But now community is translating to things you start to see blockchain, very community based. That's a whole new currency market that's changing the financial landscape, ICOs and what-not, that's just one data point. Businesses, marketing communities, you're starting to see data as a fundamental thing around communities. And certainly it's going to change the vendor landscape. So you guys compare to, Cloudera and others have always been community driven. >> Yeah our philosophy has been simple. You know, more eyes and more hands are better than fewer. And it's been one of the cornerstones of our founding thesis, if you will. And you saw how that's gone on over course of six years we've been around. Super-excited to have someone like IBM join hands, it happened at DataWorks Summit in San Jose. That announcement, again, is a reflection of the fact that we've been very, very community driven and very, very ecosystem driven. >> Communities are fundamentally built on trust and partnering. >> Arun: Exactly >> Coding is pretty obvious, you code with your friends. You code with people who are good, they become your friends. There's an honor system among you. You're starting to see that in the corporate deals. So explain the dynamic there and some of the successes that you guys have had on the product side where one plus one equals more than two. One plus one equals five or three. >> You know IBM has been a great example. 
They've decided to focus on their strengths which is around Watson and machine learning and for us to focus on our strengths around data management, infrastructure, cloud and so on. So this combination of DSX, which is their Data Science Experience, along with Hortonworks is really powerful. We are seeing that over and over again. Just yesterday we announced the whole Dataplane thing, we were super excited about it. And now to get IBM to say, we'll get in our technologies and our IP, big data, whether it's BigQuality or BigInsights or Big SQL, and the word has been phenomenal. >> Well the Dataplane announcement, finally people who know me know that I hate the term data lake. I always said it's always been a data ocean. So I get redemption because now the data lakes, now it's admitting it's a horrible name but just saying stitching together the data lakes, which is essentially a data ocean. Data lakes are out there and you can form these data lakes, or data sets, batch, whatever, but connecting them and integrating them is a huge issue, especially with security. >> And a lot of it is, it's also just pragmatism. We start off with this notion of data lake and say, hey, you got too many silos inside the enterprise in one data center, you want to put them together. But then increasingly, as Hadoop has become more and more mainstream, I can't remember the last time I had to explain what Hadoop is to somebody. As it has become mainstream, a couple of things have happened. One is, we talked about streaming data. We see all the time, especially with HDF. We have customers streaming data from autonomous cars. You have customers streaming from security cameras. You can put a small MiNiFi agent in a security camera or smart phone and can stream it all the way back. Then you get into physics. You're up against the laws of physics. If you have a security camera in Japan, why would you want to move it all the way to California and process it.
You'd rather do it right there, right? So this notion of a regional data center becomes really important. >> And that talks to the Edge as well. >> Exactly, right. So you want to have something in Japan that collects from all of the security cameras in Tokyo, and you do analysis and push what you want back here, right. So that's physics. The other thing we are increasingly seeing is with data sovereignty rules, especially things like GDPR, there are now regulatory reasons where data has to naturally stay in different regions. Customer data from Germany cannot move to France or vice versa, right. >> Data governance is a huge issue and this is the problem I have with data governance. I am really looking for a solution so if you can illuminate this it would be great. So there is going to be an Equifax out there again. >> Arun: Oh, for sure. >> And the problem is, is that going to force some regulation change? So what we see is, certainly on the mugi bond side, I see it personally is that, you can almost see that something else will happen that'll force some policy regulation or governance. You don't want to screw up your data. You also don't want to rewrite your applications or rewrite your machine learning algorithms. So there's a lot of waste potential by not structuring the data properly. Can you comment on what's the preferred path? >> Absolutely, and that's why we've been working on things like Dataplane for almost a couple of years now. Which is to say, you have to have data and policies which make sense, given a context. And the context is going to change by application, by usage, by compliance, by law. So, now to manage 20, 30, 50, or 100 data lakes, would it be better, not saying lakes, data ponds, >> Host: Any data. >> Any data >> Any data pool, stream, river, ocean, whatever. (laughs) >> Jacuzzis. Data jacuzzis, right. So what you want is a holistic fabric, I like the term, you know Forrester uses, they call it the fabric. >> Host: Data fabric. 
>> Data fabric, right? You want a fabric over these so you can actually control and maintain governance and security centrally, but apply it with context. Last but not least, you want to do this whether it's on-prem or on the cloud, or multi-cloud. So we've been working with a bank. They were primarily based in Germany but for GDPR they had to stand up something in France now. They had French customers, but for a bunch of new reasons, regulation reasons, they had to stand up something in France. So they bring their own data center, then they had only the cloud provider, right, who I won't name. And they were great, things are working well. Now they want to expand the similar offering to customers in Asia. It turns out their favorite cloud vendor was not available in Asia or they were not available in a time frame which made sense for the offering. So they had to go with cloud vendor two. So now although each of the vendors will do their job in terms of giving you all the security and governance and so on, the fact that you have to manage it three ways, one for on-prem, one for cloud vendor A and B, was really hard, too hard for them. So this notion of a fabric across these things, which is Dataplane. And that, by the way, is backed by all the open source technologies we love like Atlas and Ranger. By the way, that is also what IBM is betting on and what the entire ecosystem, but it seems like a no-brainer at this point. That was the kind of reason why we foresaw the need for something like a Dataplane and obviously couldn't be more excited to have something like that in the market today as a net new service that people can use. >> You get the catalogs, security controls, data integration. >> Arun: Exactly. >> Then you get the cloud, whatever, pick your cloud scenario, you can do that. Killer architecture, I liked it a lot. I guess the question I have for you personally is what's driving the product decisions at Hortonworks? 
And the second part of that question is, how does that change your ecosystem engagement? Because you guys have been very friendly in a partnering sense and also very good with the ecosystem. How are you guys deciding the product strategies? Does it bubble up from the community? Is there an ivory tower, let's go take that hill? >> It's both, because what typically happens is obviously we've been in the community now for a long time. Working publicly now with well over 1,000 customers not only puts a lot of responsibility on our shoulders but it's also very nice because it gives us a vantage point which is unique. That's number one. The second one we see is being in the community, also we see the fact that people are starting to solve the problems. So it's another elementary for us. So you have one as the enterprise side, we see what the enterprises are facing which is kind of where Dataplane came in, but we also saw in the community where people are starting to ask us about hey, can you do multi-cluster Atlas? Or multi-cluster Ranger? Put two and two together and say there is a real need. >> So you get some consensus. >> You get some consensus, and you also see that on the enterprise side. Last but not least is when we went to friends like IBM and said hey we're doing this. This is where we can position this, right. So we can actually bring in IGSC, you can bring BigQuality and bring all these type, >> Host: So things had clicked with IBM? >> Exactly. >> Rob Thomas was thinking the same thing. Bring in the power system and the horsepower. >> Exactly, yep. We announced something, for example, we have been working with the power guys and NVIDIA, for deep learning, right. That sort of stuff is what clicks if you're in the community long enough, if you have the vantage point of the enterprise long enough, it feels like the two of them click. And that's frankly, my job. >> Great, and you've got obviously the landscape. The waves are coming in. 
So I've got to ask you, the big waves are coming in and you're seeing people starting to get hip with the couple of key things that they got to get their hands on. They need to have the big surfboards, metaphorically speaking. They got to have some good products, big emphasis on real value. Don't give me any hype, don't give me a head fake. You know, I buy, okay, AI Wash, and people can see right through that. Alright, that's clear. But AI's great. We all cheer for AI but the reality is, everyone knows that's pretty much b.s. except for core machine learning is on the front edge of innovation. So that's cool, but value. [Laughs] Hey, I've got to integrate and operationalize my data so that's the big wave that's coming. Comment on the community piece because enterprises now are realizing as open source becomes the dominant source of value for them, they are now really going to the next level. It used to be like the emerging enterprises that knew open source. The guys will volunteer and they may not go deeper in the community. But now more people in the enterprises are in open source communities, they are recruiting from open source communities, and that's impacting their business. What's your advice for someone who's been in the community of open source? Lessons you've learned, what is the best practice, from your standpoint on philosophy, how to build into the community, how to build a community model. >> Yeah, I mean, at the end of the day, my best advice is to say look, the community is defined by the people who contribute. So, you get a voice if you contribute. Which means, if that's the fundamental truth. Which means you have to get your legal policies and so on to a point that you can actually start to let your employees contribute. That kicks off a flywheel, where you can actually go then recruit the best talent, because the best talent wants to stand out. GitHub is a resume now. It is not a Word doc. 
If you don't allow them to build that resume they're not going to come by and it's just a fundamental truth. >> It's self governing, it's reality. >> It's reality, exactly. Right and we see that over and over again. It's taken time but, as with things, the flywheel has changed enough. >> A whole new generation's coming online. If you look at the young kids coming in now, it is an amazing environment. You've got TensorFlow, all this cool stuff happening. It's just amazing. >> You know, 20 years ago that wouldn't happen because the Googles of the world wouldn't open source it. Now increasingly, >> The secret's out, open source works. >> Yeah, (laughs) shh. >> Tell everybody. You know they know already but, this is changing some of how H.R. works and how people collaborate, >> And the policies around it. The legal policies around contribution so, >> Arun, great to see you. Congratulations. It's been fun to watch the Hortonworks journey. I want to appreciate you and Rob Bearden for supporting theCUBE here in BigData NYC. If it wasn't for Hortonworks and Rob Bearden and your support, theCUBE would not be part of the Strata Data, which we are not allowed to broadcast into, for the record. O'Reilly Media does not allow theCUBE or our analysts inside their venue. They've excluded us and that's a bummer for them. They're a closed organization. But I want to thank Hortonworks and you guys for supporting us. >> Arun: Likewise. >> We really appreciate it. >> Arun: Thanks for having me back. >> Thanks and shout out to Rob Bearden. Good luck and CPO, it's a fun job, you know, not the pressure. I got a lot of pressure. A whole lot. >> Arun: Alright, thanks. >> More Cube coverage after this short break. (upbeat electronic music)
SUMMARY :
the number three tech investment Brought to you by SiliconANGLE Media This is our event that we put on every year. Co-Founder and Chief Product Officer of Hortonworks. Thanks for having me. Boy, what a journey. You guys have been, really the first of the Hadoop players, Absolutely, you know, we've obviously been in this space, at the point of action, if you will, standing on the shoulders before us, you know. And it's been one of the cornerstones Communities are fundamentally built on that you guys have had on the product side and the word has been phenomenal. So I get redemption because now the data lakes, I can't remember the last time I had to explain and you do analysis and push what you want back here, right. so if you can illuminate this it would be great. I see it personally is that, you can almost see that We is to say, you have to have data and policies Any data pool, stream, river, ocean, whatever. I like the term, you know Forrester uses, the fact that you are to manage it three ways, I guess the question I have for you personally is So you have one as the enterprise side, and you also see that on the enterprise side. Bring in the power system and the horsepower. if you have the vantage point of the enterprise long enough, is on the front edge of innovation. and so on to a point that you can actually the flywheel has changed enough. If you look at the young kids coming in now, because the Googles of the world won't open source it. This is changing some of the how H.R. works And the policies around it. and you guys for supporting us. Thanks and shout out to Rob Bearden. More Cube coverage after this short break.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Asia | LOCATION | 0.99+ |
France | LOCATION | 0.99+ |
Arun | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
Germany | LOCATION | 0.99+ |
Arun Murthy | PERSON | 0.99+ |
Japan | LOCATION | 0.99+ |
NVIDIA | ORGANIZATION | 0.99+ |
Tokyo | LOCATION | 0.99+ |
2014 | DATE | 0.99+ |
California | LOCATION | 0.99+ |
12 | QUANTITY | 0.99+ |
five | QUANTITY | 0.99+ |
Frank Quattrone | PERSON | 0.99+ |
three | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
Onyara | ORGANIZATION | 0.99+ |
$64 million | QUANTITY | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
San Jose | LOCATION | 0.99+ |
O'Reilly Media | ORGANIZATION | 0.99+ |
each | QUANTITY | 0.99+ |
Morgan Stanley | ORGANIZATION | 0.99+ |
Linux Foundation | ORGANIZATION | 0.99+ |
One | QUANTITY | 0.99+ |
fifth year | QUANTITY | 0.99+ |
Atlas | ORGANIZATION | 0.99+ |
20 | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
three days | QUANTITY | 0.99+ |
eighth year | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
six years | QUANTITY | 0.99+ |
Equifax | ORGANIZATION | 0.99+ |
next year | DATE | 0.99+ |
NYC | LOCATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
second part | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
Ranger | ORGANIZATION | 0.99+ |
50 | QUANTITY | 0.98+ |
30 | QUANTITY | 0.98+ |
Yahoo | ORGANIZATION | 0.98+ |
Strata Conference | EVENT | 0.98+ |
DataWorks Summit | EVENT | 0.98+ |
Hadoop | TITLE | 0.98+ |
'15 | DATE | 0.97+ |
20 years ago | DATE | 0.97+ |
Forrester | ORGANIZATION | 0.97+ |
GDPR | TITLE | 0.97+ |
second one | QUANTITY | 0.97+ |
one data center | QUANTITY | 0.97+ |
Github | ORGANIZATION | 0.96+ |
about 12 years | QUANTITY | 0.96+ |
three ways | QUANTITY | 0.96+ |
Manhattan | LOCATION | 0.95+ |
day two | QUANTITY | 0.95+ |
this week | DATE | 0.95+ |
NiFi | ORGANIZATION | 0.94+ |
Dataplane | ORGANIZATION | 0.94+ |
BigData | ORGANIZATION | 0.94+ |
Hadoop World | EVENT | 0.93+ |
billions | QUANTITY | 0.93+ |
Day One Kickoff | BigData NYC 2017
(busy music) >> Announcer: Live from Midtown Manhattan, it's the Cube, covering Big Data New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Hello, and welcome to the special Cube presentation here in New York City for Big Data NYC, in conjunction with all the activity going on with Strata, Hadoop, Strata Data Conference right around the corner. This is the Cube's special annual event in New York City where we highlight all the trends, technology experts, thought leaders, entrepreneurs here inside the Cube. We have our three days of wall to wall coverage, evening event on Wednesday. I'm John Furrier, the co-host of the Cube, with Jim Kobielus, and Peter Burris will be here all week as well. Kicking off day one, Jim, the monster week of Big Data NYC, which now has turned into, essentially, the big data industry is a huge industry. But now, subsumed within a larger industry of AI, IoT, security. A lot of things have just sucked up the big data world that used to be the Hadoop world, and it just kept on disrupting, and creative disruption of the old guard data warehouse market, which now, looks pale in comparison to the disruption going on right now. >> The data warehouse market is very much vibrant and alive, as is the big data market continuing to innovate. But the innovations, John, have moved up the stack to artificial intelligence and deep learning, as you've indicated, driving more of the Edge applications in the new generation of mobile and smart appliances and things that are coming along like smart, self-driving vehicles and so forth. What we see is data professionals and developers are moving towards new frameworks, like TensorFlow and so forth, for development of the truly disruptive applications. But big data is the foundation. >> I mean, the developers are the key, obviously, open source is growing at an enormous rate. 
We just had the Linux Foundation, we now have the Open Source Summit, they have kind of rebranded that. They're going to see an explosion of code from 64 million lines of code to billions of lines of code, exponential growth. But the bigger picture is that it's not just developers, it's the enterprises now who want hybrid cloud, they want cloud technology. I want to get your reaction to a couple of different threads. One is the notion of community based software, which is open source, extending into the enterprise. We're seeing things like blockchain is hot right now, security, two emerging areas that are overlapping in with big data. You obviously have classic data market, and then you've got AI. All these things kind of come in together, kind of just really putting at the center of all that, this core industry around community and software, AI in particular. It's not just about machine learning anymore and data, it's a bigger picture. >> Yeah, in terms of a community, development with open source, much of what we see in the AI arena, for example, with the up and coming, they're all open source tools. There's TensorFlow, there's Caffe, there's Theano and so forth. What we're seeing is not just the frameworks for developing AI that are important, but the entire ecosystem of community based development of capabilities to automate the acquisition of training data, which is so critically important for tuning AI, for its designated purpose, be it doing predictions and abstractions. DevOps, what are coming into being are DevOps frameworks to span the entire life cycle of the creation and the training and deployment and iteration of AI. What we're going to see is, like at the last Spark Summit, there was a very interesting discussion from a Stanford researcher, new open source tools that they're developing out in, actually, in Berkeley, I understand, for, related to development of training data in a more automated fashion for these new challenges. 
The communities are evolving up the stack to address these requirements with fairly bleeding edge capabilities that will come in the next few years into the mainstream. >> I had a chat with a big time CTO last night, he worked at some of the big web scale companies, I won't say the name, give it away. But basically, he asked me a question about IoT, how real is it, and obviously, it's hyped up big time, though. But the issue in all these new markets like IoT and AI is the role of security, because a lot of enterprises are looking at the IoT, certainly in the industrial side has the most relevant low hanging fruit, but at the end of the day, the data modeling, as you're pointing out, becomes a critical thing. Connecting IoT devices to, say, an IP network sounds trivial in concept, but at the end of the day, the surface area for security is so exposed, that's causing people to stop what they're doing, not deploying it as fast. You're seeing kind of like people retrenching and replatforming at the core data centers, and then leveraging a lot of cloud, which is why Azure is hot, Microsoft Ignite Event is pretty hot this week. Role of cloud, role of data in IoT. Is IoT kind of stalled in your mind? Or is it bloating? >> I wouldn't say it's stalled or that it's bloating, but IoT is definitely coming along as the new development focus. For the more disruptive applications that can derive more intelligence directly to the end points that can take varying degrees of automated action to achieve results, but also to very much drive decision support in real time to people on their mobiles or in whatever. What I'm getting at is that IoT is definitely a reality in the real world in terms of our lives. It's definitely a reality in terms of the next generation of data applications. 
But there's a lot of the back end in terms of readying algorithms and in training data for deployment of really high quality IoT applications, Edge applications, that hasn't come together yet in any coherent practice. >> It's emerging, it's emerging. >> It's emerging. >> It's a lot more work to do. OK, we're going to kick off day one, we've got some great guests, we see Rob Bearden in the house, Rob Thomas from IBM. >> Rob Bearden from Hortonworks. >> Rob Bearden from Hortonworks, and Rob Thomas from IBM. I want to bring up, Rob wrote a book just recently. He wrote Big Data Revolution, but he also wrote a new book called, Every Company is a Tech Company. But he mentions, he kind of teases out this concept of a renaissance, so I want to get your thoughts on this. If you look at Strata, Hadoop, Strata Data, the O'Reilly Conference, which has turned into like a marketing machine, right. A lot of hype there. But as the community model grows up, you're starting to see a renaissance of real creative developers, you're starting to see, not just open source, pure, full stack developers doing all the heavy lifting, but real creative competition, in a renaissance, that's really the key. You're seeing a lot more developer action, tons outside of the, what was classically called the data space. The role of data and how it relates to the developer phenomenon that's going on right now. >> Yeah, it's the maker culture. Rob, in fact, about a year or more ago, IBM, at one of their events, they held a very maker oriented event, I think they called it Datapalooza at one point. What it's looking at, what's going on is it's more than just classic software developers are coming to the fore. When you're looking at IoT or Edge applications, it's hardware developers, it's UX developers, it's developers and designers who are trying to change and drive data driven applications into changing the very fabric of how things are done in the real world. 
What Peter Burris, we had a wiki about him called Programming in the Real World. What that all involves is there's a new set of skill sets that are coming together to develop these applications. It's well beyond just simply software development, it's well beyond simply data scientists. Maker culture. >> Programming in the real world is a great concept, because you need real time, which comes back down to this. I'm looking for this week from the guests we talked to, what their view is of the data market right now. Because if you want to get real time, you've got to move from that batch world to the real time world. I'm not saying batch is over, you've still got to store data, and that's growing at an exponential rate as well. But real time data, how do you use data in real time, how do the modelings work, how do you scale that. How do you take a DevOps culture to the data world is what I'm looking for. What are you looking for this week? >> What I'm looking for this week, I'm looking for DevOps solutions or platforms or environments for teams of data scientists who are building and training and deploying and evaluating, iterating deep learning and machine learning and natural language processing applications in a continuous release pipeline, and productionizing them. At Wikibon, we are going deeper in that whole notion of DevOps for data science. I mean, IBM's called it inside ops, others call it data ops. What we're seeing across the board is that more and more of our customers are focusing on how do we bring it all together, so the maker culture. >> Operationalizing it. >> Operationalizing it, so that the maker cultures that they have inside their value chain can come together and there's a standard pattern workflow of putting this stuff out and productionizing it, AI productionized in the real world. 
>> Moving in from the proof of concept notion to actually just getting things done, putting it out in the network, and then bringing it to the masses with operational support. >> Right, like the good folks at IBM with Watson data platform, on some levels, is a DevOPs for data science platform, but it's a collaborative environment. That's what I'm looking to see, and there's a lot of other solution providers who are going down that road. >> I mean, to me, if people have the community traction, that is the new benchmark, in my opinion. You heard it here on the Cube. Community continues to scale, you can start seeing it moving out of open source, you're seeing things like blockchain, you're seeing a decentralized Internet now happening everywhere, not just distributed but decentralized. When you have decentralization, community and software really shine. It's the Cube here in New York City all week. Stay with us for wall to wall coverage through Thursday here in New York City for Big Data NYC, in conjunction with Strata Data, this is the Cube, we'll be back with more coverage after this short break. (busy music) (serious electronic music) (peaceful music) >> Hi, I'm John Furrier, the Co-founder of SiliconANGLE Media, and Co-host of the Cube. I've been in the tech business since I was 19, first programming on mini computers in a large enterprise, and then worked at IBM and Hewlett Packard, a total of nine years in the enterprise, various jobs from programming, training, consulting, and ultimately, as an executive sales person, and then started my first company in 1997, and moved to Silicon Valley in 1999. I've been here ever since. I've always loved technology, and I love covering emerging technology. I was trained as a software developer and love business. I love the impact of software and technology to business. To me, creating technology that starts a company and creates value and jobs is probably one of the most rewarding things I've ever been involved in. 
I bring that energy to the Cube, because the Cube is where all the ideas are, and where the experts are, where the people are. I think what's most exciting about the Cube is that we get to talk to people who are making things happen, entrepreneurs, CEO of companies, venture capitalists, people who are really, on a day in and day out basis, building great companies. In the technology business, there's just not a lot of real time live TV coverage, and the Cube is a non-linear TV operation. We do everything that the TV guys on cable don't do. We do longer interviews, we ask tougher questions. We ask, sometimes, some light questions. We talk about the person and what they feel about. It's not prompted and scripted, it's a conversation, it's authentic. For shows that have the Cube coverage, it makes the show buzz, it creates excitement. More importantly, it creates great content, great digital assets that can be shared instantaneously to the world. Over 31 million people have viewed the Cube, and that is the result of great content, great conversations. I'm so proud to be part of the Cube with a great team. Hi, I'm John Furrier, thanks for watching the Cube. >> Announcer: Coming up on the Cube, Jagane Sundar, CTO of WANdisco. Live Cube coverage from Big Data NYC 2017 continues in a moment. >> Announcer: Coming up on the Cube, Donna Prlich, Chief Product Officer at Pentaho. Live Cube coverage from Big Data New York City 2017 continues in a moment. >> Announcer: Coming up on the Cube, Amit Walia, Executive Vice President and Chief Product Officer at Informatica. Live Cube coverage from Big Data New York City continues in a moment. >> Announcer: Coming up on the Cube, Prakash Nodili, Co-founder and CEO of Pexif. Live Cube coverage from Big Data New York City continues in a moment. (serious electronic music)
SUMMARY :
it's the Cube, covering Big Data New York City 2017, and creative disruption of the old guard as is the big data market continuing to innovate. kind of just really putting at the center of all that, and the training and deployment and iteration of AI. and replatforming at the core data centers, in the real world in terms of our lives. It's a lot more work to do. in a renaissance, that's really the key. in the real world. Programming in the real world is a great concept, so the maker culture. Operationalizing it, so that the maker cultures Moving in from the proof of concept notion Right, like the good folks at IBM that is the new benchmark, in my opinion. and that is the result of great content, continues in a moment. continues in a moment. continues in a moment. Prakash Nodili, Co-founder and CEO of Pexif.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jim Kobielus | PERSON | 0.99+ |
Donna Prlich | PERSON | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
Amit Walia | PERSON | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
Peter Burris | PERSON | 0.99+ |
Prakash Nodili | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Jim | PERSON | 0.99+ |
1997 | DATE | 0.99+ |
Berkeley | LOCATION | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
1999 | DATE | 0.99+ |
Hewlett Packard | ORGANIZATION | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
Thursday | DATE | 0.99+ |
New York City | LOCATION | 0.99+ |
John | PERSON | 0.99+ |
nine years | QUANTITY | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Wednesday | DATE | 0.99+ |
Rob | PERSON | 0.99+ |
Pexif | ORGANIZATION | 0.99+ |
Jagane Sundar | PERSON | 0.99+ |
Linux Foundation | ORGANIZATION | 0.99+ |
first company | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
three days | QUANTITY | 0.99+ |
Wikibon | ORGANIZATION | 0.99+ |
Datapalooza | EVENT | 0.99+ |
64 million lines | QUANTITY | 0.98+ |
NYC | LOCATION | 0.98+ |
Midtown Manhattan | LOCATION | 0.98+ |
Big Data | ORGANIZATION | 0.98+ |
19 | QUANTITY | 0.98+ |
this week | DATE | 0.97+ |
Over 31 million people | QUANTITY | 0.97+ |
Spark Summit | EVENT | 0.97+ |
last night | DATE | 0.97+ |
Open Source Summit | EVENT | 0.97+ |
Strata | EVENT | 0.96+ |
One | QUANTITY | 0.96+ |
Programming in the Real World | TITLE | 0.96+ |
Big Data | EVENT | 0.96+ |
Informatica | ORGANIZATION | 0.96+ |
day one | QUANTITY | 0.96+ |
Strata Data | ORGANIZATION | 0.95+ |
two emerging areas | QUANTITY | 0.95+ |
billions of lines | QUANTITY | 0.93+ |
Microsoft | ORGANIZATION | 0.93+ |
TensorFlow | TITLE | 0.92+ |
Strata Data Conference | EVENT | 0.92+ |
Day One | QUANTITY | 0.92+ |
Live Cube | COMMERCIAL_ITEM | 0.92+ |
Cube | ORGANIZATION | 0.91+ |
Every Company is a Tech Company | TITLE | 0.9+ |
Azure | TITLE | 0.9+ |
about a year or more ago | DATE | 0.9+ |
Cube | COMMERCIAL_ITEM | 0.9+ |
2017 | EVENT | 0.89+ |
WANdisco | ORGANIZATION | 0.89+ |
Big Data Revolution | TITLE | 0.88+ |
Strata | ORGANIZATION | 0.88+ |
Theano | TITLE | 0.88+ |
Watson | ORGANIZATION | 0.85+ |
DevOps | TITLE | 0.84+ |
Ignite Event | EVENT | 0.84+ |
Jagane Sundar, WANdisco | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay welcome back everyone here live in New York City. This is theCUBE special presentation of our annual event with theCUBE and Wikibon Research called BigData NYC, it's our own event that we have every year, celebrating what's going on in the big data world now. It's evolving to all data, cloud applications, AI, you name it, it's happening. In the enterprise, the impact is huge for developers, the impact is huge. I'm John Furrier, cohost of the theCUBE, with Peter Burris, Head of Research, SiliconANGLE Media and General Manager of Wikibon Research. Our next guest is Jagane Sundar, who's the CTO of WANdisco, Cube alumni, great to see you again as usual here on theCUBE. >> Thank you John, thank you Peter, it's great to be back on theCUBE. >> So we've been talking the big data for many years, certainly with you guys, and it's been a great evolution. I don't want to get into the whole backstory and history, we covered that before, but right now is a really, really important time, we see you know the hurricanes come through, we see the floods in Texas, we've seen Florida, and Puerto Rico now on the main conversation. You're seeing it, you're seeing disasters happen. Disaster recovery's been the low hanging fruit for you guys, and we talked about this when New York City got flooded years and years ago. This is a huge issue for IT, because they have to have disaster recovery. But now it's moving more beyond just disaster recovery. It's cloud. What's the update from WANdisco? You guys have a unique perspective on this. >> Yes, absolutely. So we have capabilities to replicate between the cloud and Hadoop multi data centers across geos, so disasters are not a problem for us. And we have some unique technologies we use. 
One of the things we do is we can replicate in an active-active mode between different cloud vendors, between cloud and on-prem Hadoop, and we are the only game in town. Nobody else can do that. >> So okay let me just stop right there. When you say the only game in town I got a little skeptic here. Are you saying that nobody does active-active replication at all? >> That is exactly what I'm saying. We had some wonderful announcements from Hortonworks, they have a great product called the Dataplane. But if you dig deep, you'll find that it's actually an active-passive architecture, because to do active-active, you need this capability called the Paxos algorithm for resolving conflict. That's a very hard algorithm to implement. We have over 10 years' experience in that. That's what gives us our ability to do this active-active replication, between clouds, between on-prem and cloud. >> All right so just to take that a step further, I know we're having a CTO conversation, but the classic cliche is skate to where the puck is going to be. So you kind of didn't just decide one morning you're going to be the active-active for cloud. You kind of backed into this. You know the world spun in your direction, the puck came to you guys. Is that a fair statement? >> That is a very fair statement. We've always known there's tremendous value in this technology we own, and with the global infrastructure trends, we knew that this was coming. It wasn't called the cloud when we started out, but that's exactly what it is now, and we're benefiting from it. >> And the cloud is just a data center, it's just, you don't own it. (mumbles) Peter, what's your reaction to this? Because when he says only game in town, implies some scarcity. >> Well, WANdisco has a patent, and it actually is very interesting technology, if I can summarize very quickly. 
You do continuous replication based on writes that are performed against the database, so that you can have two writers and two separate databases and you guarantee that they will be synchronized at some point in time because you guarantee that the writing of the logs and the messaging to both locations >> Absolutely. >> in order, which is a big issue. You guys put a stamp on the stuff, and it actually writes to the different locations with order guaranteed, and that's not the way most replication software works. >> Yes, that's exactly right. That's very hard to do, and that's the only way for you to allow your clients in different data centers to write to the same data store, whether it's a database, a Hadoop folder, whether it's a bucket in a cloud object store, it doesn't matter. The core fact remains, the Paxos algorithm is the only way for you to do active-active replication, and ours is the only Paxos implementation that can work over the >> John: And that's patented by you guys? >> Yes, it's patented. >> And so someone to replicate that, they'd have to essentially reverse engineer and have a little twist on it to not get around the patents. Are you licensing the technology or are you guys hoarding it for yourselves? >> We have different ways of engaging with partners. We are very reasonable with that, and we work with several powerful partners >> So you partner with the technology. >> Yes. >> But the key thing, John, in answer to your question is that it's unassailable. I mean there's no argument that, as companies move more towards a digital way of doing things, largely driven by what customers want, your data becomes more of an asset. As your data becomes more of an asset, you make money by using that data in more places, more applications and more times.
That is possible with data, but the problem you end up with consistency issues, and for certain applications, it's not an issue, you're basically writing, or if you're basically reading data it's not an issue. But the minute that you're trying to write on behalf of a particular business event or a particular value proposition, then now you have a challenge, you are limited in how you can do it unless you have this kind of a technology. And so this notion of continuous replication in a world that's going to become increasingly dependent upon data, data that is increasingly distributed, data that you want to ensure has common governance and policy in place, technologies like WANdisco provides are going to be increasingly important to the overall way that a business organizes itself, institutes its work and makes sure it takes care of its data assets. >> Okay, so my next question then, thanks for the clarification, it's good input there and thanks for summarizing it like that, 'cause I couldn't have done that. But when we last talked, I always was enamored by the fact that you guys have the data center replication thing down. I always saw that as a great thing for you guys. Okay, I get that, that's an on-premise situation, you have active-active, good for disaster recovery, lot of use cases, people should be beating down your door 'cause you have a better mousetrap, I get that. Now how does that translate to the cloud? So take me through why the cloud now fits nicely with that same paradigm. >> So, I mean, these are industry trends, right. What we've found is that the cloud object stores are very, very cost effective and efficient, so customers are moving towards that. They're using their Hadoop applications but on cloud object stores. Now it's trivial for us to add plugins that enable us to replicate between a cloud object store on one side, and a Hadoop on the other side. It could also be another cloud object store from a different cloud provider on the other side. 
Once you have that capability, now customers are freed from lock-in from either a cloud vendor or a Hadoop vendor, and they love that, they're looking at it as another way to leverage their data assets. And we enable them to do that without fear of lock-in from any of these vendors. >> So on the cloud side, the regions have always been a big thing. So we've heard Amazon have a region down here, and there was fix it. We saw at VMworld push their VMware solution to only one western region. What's the geo landscape look like in the cloud? Does that relate to anything in your tech? >> So yes, it does relate, and one of the things that people forget is that when you create an Amazon S3 bucket, for example, you specify a region. Well, but this is the cloud, isn't it worldwide? Turns out that object store actually resides in one region, and you can use some shaky technologies like cross-region replication to eventually get the data to the other region. >> Peter: Which just boosts the prices you pay. >> Yes, not just boost the price. >> Well they're trying to save price but then they're exposed on reliability. >> Reliability, exactly. You don't know when the data's going to be there, there are no guarantees. What we offer is, take your cloud storage, but we'll guarantee that we can replicate it in a synchronous fashion to another region. Could be the same provider, could be another provider. That gives tremendous benefits to the customers. >> So you actually have a guarantee when you go to customers, say with an SLA guarantee? Do you back it up with like money back, what's the guarantee? >> So the guarantees are, you know we are willing to back it up with contracts and such like, and our customers put us through rigorous testing procedures, naturally. But we stand up to every one of those. We can scale and maintain the consistency guarantees that they need for modern businesses. >> Okay, so take me through the benefits. Who wants this? 
Because you can almost get kind of sucked into the complexities of it, and the nuances of cloud and everything as Peter laid out, it's pretty complex even as he simplified it. Who buys this? (laughs) I mean, who's the guy, is it the IT department, is it the ops guy, is it the facilities, who... >> So we sell to the IT departments, and they absolutely love the technology. But to go back to your initial statement, we have all these disasters happening, you know, hopefully people are all doing reasonably okay at the end of these horrible disasters, but if you're an enterprise of any size, it doesn't have to be a big enterprise, you cannot go back to your users or customers and say that because of a hurricane you cannot have access to your data. That's sometimes legally not allowed, and other times it's just suicide for a business >> And HPE in Houston, it's a huge plant down there. >> Jagane: Indeed. >> They got hit hard. >> Yep, in those sort of circumstances, you want to make sure that your data is available in multiple data centers spread throughout the world, and we give you that capability. >> Okay, what are some of the successes? Let's talk through now, obviously you've got the technology, I get that. Where's the stakes in the ground? Who's adopting it? I know you do a lot of biz dev deals. I don't know if they're actually OEM-type deals, or they're just licensing deals. Take us through to where your successes are with this technology. >> So, biz dev wise, we have a mix of OEM deals and licenses and co-selling agreements. The strong ones are all OEMs, of course. We have great partnerships with IBM, Amazon, Microsoft, just wonderful partnerships. The actual end customers, we started off selling mostly to the financial industry because they have a legal mandate, so they were the first to look into this sort of a thing. But now we've expanded into automobile companies. 
A lot of the auto companies are generating vast amounts of data from their cars, and you can't push all that data into a single data center, that's just not reasonable. You want to push that data into a single data store that's distributed across the world in just wherever the car is closest to. We offer that capability that nobody else can, so that we've got big auto manufacturers signed up, we've got big retailers signed up for exactly the same capability. You cannot imagine ingesting all that data into a single location. You want this replicated across, you want it available no matter what happens to any single region or a data center. So we've got tremendous success in retail, banking, and a lot of this is through partnerships again. >> Well congratulations, I got to ask, you know, what's new with you guys? Obviously you have success with the active-active. We'll dig into the Hortonworks things to check your comment around them not having it, so we'll certainly look with the Dataplane, which we like. We interviewed Rob Bearden. Love the announcement, but they don't have the active-active, we're going to document that, and get that on the record. But you guys are doing well. What's new here, what's in New York, what are some of your wins, can you just give a quick update on what's going on at WANdisco? >> Okay, so quick recap, we love the Hortonworks Dataplane as well. We think that we can build value into that ecosystem by building a plugin for them. And we love the whole technology. I have wonderful friends there as well. As for our own company, we see all of our, a lot of our business coming from cloud and hybrid environments. It's just the reality of the situation. 
You had, you know, 20 years ago, you had NFS, which was the great upender of all storage, but turned out to be very expensive, and you had 10 years, seven years ago you had HDFS come along, and that upended the cost model of NFS and SANs, which those industries were still working their way through. And now we have cloud object stores, which have upended the HDFS model, it's much more cost-efficient to operate using cloud object stores. So we will be there, we have replication products for that. >> John: And you're in the major clouds, you in Azure? >> Yes, we are in Azure. >> Google? >> Jagane: Yes, absolutely. >> AWS? >> AWS, of course. >> Oracle? >> Oracle, of course. >> So you got all the top four companies. >> We're in all of them. >> All right, so here's the next question is, >> And you're also in IBM stuff too. >> Yes, we're built tightly into IBM >> So you've got a pretty strong legacy >> And a monopoly. >> On the mainframe. >> Like the fiber channel of replication. (John and Jagane laugh) That was a bad analogy. I mean it's like... Well, I mean fiber channel has only limited suppliers 'cause they have unique technology, it was highly important. >> But the basic proposition is look, any customer that wants to ensure that a particular data source is going to be available in a distributed way, and you're going to have some degree of consistency, is going to look at this as an option. >> Yes. >> Well you guys certainly had a great team under your leadership, it's got great tech. The final question I have for you here is, you know, we've had many conversations about the industry, we like to pontificate, I certainly like to speculate, but now we have eight years of history now in the big data world, we look back, you know, we're doing our own event in New York City, you know, thanks to great support from you guys and other great friends in the community. Appreciate everyone out there supporting theCUBE, that's awesome. But the world's changed.
So I got to ask you, you're a student of the industry, I know that and knowing you personally. What's been the success formula that keeps the winners around today, and what do people need to do going forward? 'Cause we've seen the train wreck, we've seen the dead bodies in the industry, we've kind of seen what's happened, there've been some survivors. Why did the current list of characters and companies survive, and what's the winning formula in your opinion to stay relevant as big data grows in a huge way from IoT to AI, cloud and everything in between? >> I'll quote Stephen Hawking in this. Intelligence is the capability to adapt to changes. That's what keeps industries, that's what keeps companies, that's what keeps executives around. If you can adapt to change, if you can see things coming, and adapt your core values, your core technology to that, you can offer customers a value proposition that's going to last a long time. >> And in a big data space, what is that adaptive key focus, what should they be focused on? >> I think at this point, it's extracting information from this volume of data, whether you use machine learning in the modern days, or whether it was simple Hive queries, that's the value proposition, and making sure the data's available everywhere so you can do that processing on it, that remains the strength. >> So the whole concept of digital business suggests that increasingly we're going to see our assets rendered in some form as data. >> Yes. >> And we want to be able to ensure that that data is able to be where it needs to be when it needs to be there for any number of reasons. It's a very, very interesting world we're entering into. >> Peter, I think you have a good grasp on this, and I love the narrative of programming the world in real time. What's the phrase you use? It's real time but it's programming the world... Programming the real world. >> Yeah, programming the real world.
>> That's a huge, that means something completely, it's not a tech, it's not a speed or feed. >> Well the way we think about it, is that we look at IoT as a big information transducer, where information's in one form, and then you turn it into another form to do different kinds of work. And that big data's a crucial feature in how you take data from one form and turn it into another form so that it can perform work. But then you have to be able to turn that around and have it perform work back in the real world. There's a lot of new development, a lot of new technology that's coming on to help us do that. But any way you look at it, we're going to have to move data with some degree of consistency, we're still going to have to worry about making sure that if our policy says that that action needs to take place there, and that action needs to take place there, that it actually happens the way we want it to, and that's going to require a whole raft of new technologies. We're just at the very beginning of this. >> And active-active, things like active-active in what you're talking about really is about value creation. >> Well the thing that makes active-active interesting is, again, borrowing from your terms, it's a new term to both of us, I think, today. I like it actually. But the thing that makes it interesting is the idea that you can have a source here that is writing things, and you can have a source over there that is writing things, and as a consequence, you can nonetheless look at a distributed database and keep it consistent. >> Consistent, yeah. >> And that is a major, major challenge that's going to become increasingly a fundamental feature of our digital business as well. >> It's an enabling technology for the value creation and you call it work. >> Yeah, that's right. >> Transformation of work. Jagane, congratulations on the active-active, and WANdisco's technology and all your deals you're doing, got all the cloud locked up. What's next?
Well you going to lock up the edge? You're going to lock up the edge too, the cloud. >> We do like this notion of the edge cloud and all the intermediate steps. We think that replicating data between those systems or running consistent compute across those systems is an interesting problem for us to solve. We've got all the ingredients to solve that problem. We will be on that. >> Jagane Sundar, CTO of WANdisco, back on theCUBE, bringing it down. New tech, whole new generation of modern apps and infrastructure happening in distributed and decentralized networks. Of course theCUBE's got it covered for you, and more live coverage here in New York City for BigData NYC, our annual event, theCUBE and Wikibon here in Hell's Kitchen in Manhattan, more live coverage after this short break.
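The ordering guarantee discussed in this interview (two sites write, and every site applies the writes in one agreed order) can be illustrated with a small Python sketch. This is not WANdisco's implementation and it is not Paxos: the `Sequencer` class below is a hypothetical single-process stand-in for whatever agreement protocol decides the order, and `Write`, `Replica`, and `demo` are names invented for this example. It only shows why an agreed order lets replicas converge even when messages arrive in different orders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Write:
    seq: int    # position in the single, globally agreed order
    key: str
    value: str


class Sequencer:
    """Hypothetical stand-in for the agreement step (NOT Paxos).

    A real system would have the sites agree on each position
    without relying on one central process.
    """

    def __init__(self):
        self._next = 0

    def order(self, key, value):
        write = Write(self._next, key, value)
        self._next += 1
        return write


@dataclass
class Replica:
    store: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)
    next_seq: int = 0

    def deliver(self, write):
        # Buffer the write, then apply every write that is next in order.
        self.pending[write.seq] = write
        while self.next_seq in self.pending:
            ready = self.pending.pop(self.next_seq)
            self.store[ready.key] = ready.value
            self.next_seq += 1


def demo():
    sequencer = Sequencer()
    # Writers at two sites update overlapping keys.
    writes = [
        sequencer.order("balance", "100"),
        sequencer.order("balance", "250"),
        sequencer.order("owner", "acme"),
    ]
    new_york, london = Replica(), Replica()
    for w in writes:            # arrives in order
        new_york.deliver(w)
    for w in reversed(writes):  # arrives in reverse order
        london.deliver(w)
    return new_york.store, london.store
```

Calling `demo()` returns the two replica stores, which end up identical because each replica holds back any write whose predecessors have not yet been applied. The hard part that the interview attributes to Paxos, agreeing on the order across sites without a single coordinator, is deliberately left out of this sketch.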
Rob Thomas, IBM Analytics | IBM Fast Track Your Data 2017
>> Announcer: Live from Munich, Germany, it's theCUBE. Covering IBM: Fast Track Your Data. Brought to you by IBM. >> Welcome, everybody, to Munich, Germany. This is Fast Track Your Data brought to you by IBM, and this is theCUBE, the leader in live tech coverage. We go out to the events, we extract the signal from the noise. My name is Dave Vellante, and I'm here with my co-host Jim Kobielus. Rob Thomas is here, he's the General Manager of IBM Analytics, and longtime CUBE guest, good to see you again, Rob. >> Hey, great to see you. Thanks for being here. >> Dave: You're welcome, thanks for having us. So we're talking about, we missed each other last week at the Hortonworks DataWorks Summit, but you came on theCUBE, you guys had the big announcement there. You're sort of getting out, doing a Hadoop distribution, right? TheCUBE gave up our Hadoop distributions several years ago so. It's good that you joined us. But, um, that's tongue-in-cheek. Talk about what's going on with Hortonworks. You guys are now going to be partnering with them essentially to replace BigInsights, you're going to continue to service those customers. But there's more than that. What's that announcement all about? >> We're really excited about that announcement, that relationship, just to kind of recap for those that didn't see it last week. We are making a huge partnership with Hortonworks, where we're bringing data science and machine learning to the Hadoop community. So IBM will be adopting HDP as our distribution, and that's what we will drive into the market from a Hadoop perspective. Hortonworks is adopting IBM Data Science Experience and IBM machine learning to be a core part of their Hadoop platform. And I'd say this is a recognition. One is, companies should do what they do best. We think we're great at data science and machine learning. Hortonworks is the best at Hadoop. Combine those two things, it'll be great for clients. 
And, we also talked about extending that to things like Big SQL, where they're partnering with us on Big SQL, around modernizing data environments. And then third, which relates a little bit to what we're here in Munich talking about, is governance, where we're partnering closely with them around unified governance, Apache Atlas, advancing Atlas in the enterprise. And so, it's a lot of dimensions to the relationship, but I can tell you since I was on theCUBE a week ago with Rob Bearden, client response has been amazing. Rob and I have done a number of client visits together, and clients see the value of unlocking insights in their Hadoop data, and they love this, which is great. >> Now, I mean, the Hadoop distro, I mean early on you got into that business, just, you had to do it. You had to be relevant, you want to be part of the community, and a number of folks did that. But it's really sort of best left to a few guys who want to do that, and Apache open source is really, I think, the way to go there. Let's talk about Munich. You guys chose this venue. There's a lot of talk about GDPR, you've got some announcements around unified governance, but why Munich? >> So, there's something interesting that I see happening in the market. So first of all, you look at the last five years. There's only 10 companies in the world that have outperformed the S&P 500, in each of those five years. And we started digging into who those companies are and what they do. They are all applying data science and machine learning at scale to drive their business. And so, something's happening in the market. That's what leaders are doing. And I look at what's happening in Europe, and I say, I don't see the European market being that aggressive yet around data science, machine learning, how you apply data for competitive advantage, so we wanted to come do this in Munich. And it's a bit of a wake-up call, almost, to say hey, this is what's happening.
We want to encourage clients across Europe to think about how do they start to do something now. >> Yeah, of course, GDPR is also a hook. The European Union and you guys have made some talk about that, you've got some keynotes today, and some breakout sessions that are discussing that, but talk about the two announcements that you guys made. There's one on DB2, there's another one around unified governance, what do those mean for clients? >> Yeah, sure, so first of all on GDPR, it's interesting to me, it's kind of the inverse of Y2K, which is there's very little hype, but there's huge ramifications. And Y2K was kind of the opposite. So look, it's coming, May 2018, clients have to be GDPR-compliant. And there's a misconception in the market that that only impacts companies in Europe. It actually impacts any company that does any type of business in Europe. So, it impacts everybody. So we are announcing a platform for unified governance that makes sure clients are GDPR-compliant. We've integrated software technology across analytics, IBM security, some of the assets from the Promontory acquisition that IBM did last year, and we are delivering the only platform for unified governance. And that's what clients need to be GDPR-compliant. The second piece is data has to become a lot simpler. As you think about my comment, who's leading the market today? Data's hard, and so we're trying to make data dramatically simpler. And so for example, with DB2, what we're announcing is you can download and get started using DB2 in 15 minutes or less, and anybody can do it. Even you can do it, Dave, which is amazing. >> Dave: (laughs) >> For the first time ever, you can-- >> We'll test that, Rob. >> Let's go test that. I would love to see you do it, because I guarantee you can. Even my son can do it. I had my son do it this weekend before I came here, because I wanted to see how simple it was. 
So that announcement is really about bringing, or introducing a new era of simplicity to data and analytics. We call it Download And Go. We started with SPSS, we did that back in March. Now we're bringing Download And Go to DB2, and to our governance catalog. So the idea is make data really simple for enterprises. >> You had a community edition previous to this, correct? There was-- >> Rob: We did, but it wasn't this easy. >> Wasn't this simple, okay. >> Not anybody could do it, and I want to make it so anybody can do it. >> Is simplicity, the rate of simplicity, the only differentiator of the latest edition, or I believe you have Kubernetes support now with this new addition, can you describe what that involves? >> Yeah, sure, so there's two main things that are new functionally-wise, Jim, to your point. So one is, look, we're big supporters of Kubernetes. And as we are helping clients build out private clouds, the best answer for that in our mind is Kubernetes, and so when we released Data Science Experience for Private Cloud earlier this quarter, that was on Kubernetes, extending that now to other parts of the portfolio. The other thing we're doing with DB2 is we're extending JSON support for DB2. So think of it as, you're working in a relational environment, now just through SQL you can integrate with non-relational environments, JSON, documents, any type of no-SQL environment. So we're finally bringing to fruition this idea of a data fabric, which is I can access all my data from a single interface, and that's pretty powerful for clients. >> Yeah, more cloud data development. Rob, I wonder if you can, we can go back to the machine learning, one of the core focuses of this particular event and the announcements you're making. Back in the fall, IBM made an announcement of Watson machine learning, for IBM Cloud, and World of Watson. In February, you made an announcement of IBM machine learning for the z platform. 
What are the machine learning announcements at this particular event, and can you sort of connect the dots in terms of where you're going, in terms of what sort of innovations are you driving into your machine learning portfolio going forward? >> I have a fundamental belief that machine learning is best when it's brought to the data. So, we started with, like you said, Watson machine learning on IBM Cloud, and then we said well, what's the next big corpus of data in the world? That's an easy answer, it's the mainframe, that's where all the world's transactional data sits, so we did that. Last week with the Hortonworks announcement, we said we're bringing machine learning to Hadoop, so we've kind of covered all the landscape of where data is. Now, the next step is about how do we bring a community into this? And the way that you do that is we don't dictate a language, we don't dictate a framework. So if you want to work with IBM on machine learning, or in Data Science Experience, you choose your language. Python, great. Scala or Java, you pick whatever language you want. You pick whatever machine learning framework you want, we're not trying to dictate that because there's different preferences in the market, so what we're really talking about here this week in Munich is this idea of an open platform for data science and machine learning. And we think that is going to bring a lot of people to the table. >> And with open, one thing, with open platform in mind, one thing to me that is conspicuously missing from the announcement today, correct me if I'm wrong, is any indication that you're bringing support for the deep learning frameworks like TensorFlow into this overall machine learning environment. Am I wrong? I know you have Power AI. Is there a piece of Power AI in these announcements today? >> So, stay tuned on that. We are, it takes some time to do that right, and we are doing that. 
But we want to optimize so that you can do machine learning with GPU acceleration on Power AI, so stay tuned on that one. But we are supporting multiple frameworks, so if you want to use TensorFlow, that's great. If you want to use Caffe, that's great. If you want to use Theano, that's great. That is our approach here. We're going to allow you to decide what's the best framework for you. >> So as you look forward, maybe it's a question for you, Jim, but Rob I'd love you to chime in. What does that mean for businesses? I mean, is it just more automation, more capabilities as you evolve that timeline, without divulging any sort of secrets? What do you think, Jim? Or do you want me to ask-- >> What do I think, what do I think you're doing? >> No, you ask about deep learning, like, okay, that's, I don't see that, Rob says okay, stay tuned. What does it mean for a business, that, if like-- >> Yeah. >> If I'm planning my roadmap, what does that mean for me in terms of how I should think about the capabilities going forward? >> Yeah, well what it means for a business, first of all, is what they're using deep learning for, is doing things like video analytics, and speech analytics and more of the challenges involving convolutional neural networks to do pattern recognition on complex data objects for things like connected cars, and so forth. Those are the kind of things that can be done with deep learning. >> Okay. And so, Rob, you're talking about here in Europe how the uptick in some of the data orientation has been a little bit slower, so I presume from your standpoint you don't want to over-rotate, to some of these things. But what do you think, I mean, it sounds like there is difference between certainly Europe and those top 10 companies in the S&P, outperforming the S&P 500. What's the barrier, is it just an understanding of how to take advantage of data, is it cultural, what's your sense of this?
>> So, to some extent, data science is easy, data culture is really hard. And so I do think that culture's a big piece of it. And the reason we're kind of starting with a focus on machine learning, simplistic view, machine learning is a general-purpose framework. And so it invites a lot of experimentation, a lot of engagement, we're trying to make it easier for people to on-board. As you get to things like deep learning as Jim's describing, that's where the market's going, there's no question. Those tend to be very domain-specific, vertical-type use cases and to some extent, what I see clients struggle with, they say well, I don't know what my use case is. So we're saying, look, okay, start with the basics. A general purpose framework, do some tests, do some iteration, do some experiments, and once you find out what's hunting and what's working, then you can go to a deep learning type of approach. And so I think you'll see an evolution towards that over time, it's not either-or. It's more of a question of sequencing. >> One of the things we've talked to you about on theCUBE in the past, you and others, is that IBM obviously is a big services business. This big data is complicated, but great for services, but one of the challenges that IBM and other companies have had is how do you take that service expertise, codify it to software and scale it at large volumes and make it adoptable? I thought the Watson data platform announcement last fall, I think at the time you called it Data Works, and then so the name evolved, was really a strong attempt to do that, to package a lot of expertise that you guys had developed over the years, maybe even some different software modules, but bring them together in a scalable software package. So is that the right interpretation, how's that going, what's the uptake been like? >> So, it's going incredibly well. 
What's interesting to me is what everybody remembers from that announcement is the Watson Data Platform, which is a decomposable framework for doing these types of use cases on the IBM cloud. But there was another piece of that announcement that is just as critical, which is we introduced something called the Data First method. And that is the recipe book to say to a client, so given where you are, how do you get to this future on the cloud? And that's the part that people, clients, struggle with, is how do I get from step to step? So with Data First, we said, well look. There's different approaches to this. You can start with governance, you can start with data science, you can start with data management, you can start with visualization, there's different entry points. You figure out the right one for you, and then we help clients through that. And we've made Data First method available to all of our business partners so they can go do that. We work closely with our own consulting business on that, GBS. But that to me is actually the thing from that event that has had, I'd say, the biggest impact on the market, is just helping clients map out an approach, a methodology, to getting on this journey. >> So that was a catalyst, so this is not a sequential process, you can start, you can enter, like you said, wherever you want, and then pick up the other pieces from a maturity model standpoint? >> Exactly, because everybody is at a different place in their own life cycle, and so we want to make that flexible. >> I have a question about the clients, the customers' use of Watson Data Platform in a DevOps context. So, are more of your customers looking to use Watson Data Platform to automate more of the stages of the machine learning development and the training and deployment pipeline, and do you see, IBM, do you see yourself taking the platform and evolving it into a more full-fledged automated data science release pipelining tool? Or am I misunderstanding that?
>> Rob: No, I think that-- >> Your strategy. >> Rob: You got it right, I would just, I would expand a little bit. So, one is it's a very flexible way to manage data. When you look at the Watson Data Platform, we've got relational stores, we've got column stores, we've got in-memory stores, we've got the whole suite of open-source databases under the Compose.io umbrella, we've got Cloudant. So we've delivered a very flexible data layer. Now, in terms of how you apply data science, we say, again, choose your model, choose your language, choose your framework, that's up to you, and we allow clients, many clients start by building models on their private cloud, then we say you can deploy those into the Watson Data Platform, so therefore then they're running on the data that you have as part of that data fabric. So, we're continuing to deliver a very fluid data layer to which you can then apply data science, apply machine learning, and there's a lot of data moving into the Watson Data Platform because clients see that flexibility. >> All right, Rob, we're out of time, but I want to kind of set up the day. We're doing CUBE interviews all morning here, and then we cut over to the main tent. You can get all of this on IBMgo.com, you'll see the schedule. Rob, you've got, you're kicking off a session. We've got Hilary Mason, we've got a breakout session on GDPR, maybe set up the main tent for us. >> Yeah, main tent's going to be exciting. We're going to debunk a lot of misconceptions about data and about what's happening. Marc Altshuller has got a great segment on what he calls the death of correlations, so we've got some pretty engaging stuff. Hilary's got a great piece that she was talking to me about this morning. It's going to be interesting. We think it's going to provoke some thought and ultimately provoke action, and that's the intent of this week. >> Excellent, well Rob, thanks again for coming to theCUBE. It's always a pleasure to see you.
>> Rob: Thanks, guys, great to see you. >> You're welcome; all right, keep it right there, buddy. We'll be back with our next guest. This is theCUBE, we're live from Munich, Fast Track Your Data, right back. (upbeat electronic music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jim Kobielus | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Jim | PERSON | 0.99+ |
Europe | LOCATION | 0.99+ |
Rob | PERSON | 0.99+ |
Marc Altshuller | PERSON | 0.99+ |
Hilary | PERSON | 0.99+ |
Hilary Mason | PERSON | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
February | DATE | 0.99+ |
Dave | PERSON | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
May 2018 | DATE | 0.99+ |
March | DATE | 0.99+ |
Munich | LOCATION | 0.99+ |
Scala | TITLE | 0.99+ |
Apache | ORGANIZATION | 0.99+ |
second piece | QUANTITY | 0.99+ |
Last week | DATE | 0.99+ |
Java | TITLE | 0.99+ |
last year | DATE | 0.99+ |
two announcements | QUANTITY | 0.99+ |
10 companies | QUANTITY | 0.99+ |
GDPR | TITLE | 0.99+ |
Python | TITLE | 0.99+ |
DB2 | TITLE | 0.99+ |
15 minutes | QUANTITY | 0.99+ |
last week | DATE | 0.99+ |
IBM Analytics | ORGANIZATION | 0.99+ |
European Union | ORGANIZATION | 0.99+ |
five years | QUANTITY | 0.99+ |
JSON | TITLE | 0.99+ |
Watson Data Platform | TITLE | 0.99+ |
third | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
this week | DATE | 0.98+ |
today | DATE | 0.98+ |
a week ago | DATE | 0.98+ |
two things | QUANTITY | 0.98+ |
SQL | TITLE | 0.98+ |
last fall | DATE | 0.98+ |
2017 | DATE | 0.98+ |
Munich, Germany | LOCATION | 0.98+ |
each | QUANTITY | 0.98+ |
Y2K | ORGANIZATION | 0.98+ |
Raj Verma, Hortonworks - DataWorks Summit 2017
>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to theCUBE, we are live, on day two of the DataWorks Summit. I'm Lisa Martin. #DWS17, join the conversation. We've had a great day and a half. We have learned from a ton of great influencers and leaders about really what's going on with big data, data science, how things are changing. My cohost is George Gilbert. We're joined by my old buddy, the COO of Hortonworks, Rajnish Verma. Raj, it's great to have you on theCUBE. >> It's great to be here, Lisa. Great to see you as well, it's been a while. >> It has. So yesterday on the customer panel, Raj, I know you had a great conversation with customers; Duke Energy was one. You also had Black Knight on the financial services side. >> Rajnish: And HSC. >> Yes, on the insurance side, and one of the things that, a couple things that really caught my attention, one was when Duke said, kind of, where they were using data and moving to Hadoop, but they are now a digital company. They're now a technology company that sells electricity and products, which I thought was fantastic. Another thing that I found really interesting about that was they all talked about how the need to leverage big data, and glean insights and monetize that, really requires this cultural shift. So I know you love customer interactions. Talk to us about what you're seeing. Those are three great industry examples. What are you seeing? Where are customers on this sort of maturity model where big data and Hadoop are concerned? >> Sure, happy to. So one thing that I enjoy the most about my job is meeting customers and talking to them about the art of the possible. And some of the stuff that they're doing was only science fiction, really, about two or three years ago.
And there are a couple of questions that you've just asked me as to where they are on their journey, what are they trying to accomplish, et cetera. I remember about, five, seven, 10 years ago when Marc Andreessen said "Software is eating the world." And to be honest with you, now, it's now more like every company is a data company. I wouldn't say data is eating the world, but without effective monetization of your data assets, you can't be a force to be reckoned with as a company. So that is a common theme that we are seeing irrespective of industry, irrespective of customer, irrespective of really the size of the customer. The only thing that sort of varies is the amount and complexity of data, from one company to the other. Now, I'm new to Hortonworks, as you know. It's really my fifth month here. And one of the things that I've seen, and Lisa, as you know, I'm coming from TIBCO. So we've been dealing with data. I have been involved with data for over a decade and a half now, right. So the difference was, 15 years ago, we were dealing with really structured data and we actually connected the structured data and gleaned insights into structured data. Now, today, a seminal challenge that every CIO or chief data officer is trying to solve is how do you get actionable insights into semi-structured and unstructured data. Now, so, getting insights into that data first requires the ability to aggregate data, right. Once you've aggregated data, you also need a platform to make sense of data in real-time, that is being streamed at you. Now once you do those two things, then you put yourself in a position to analyze that data. So in that journey, as you asked, where our customers are. Some are defining their data aggregation strategy. The others, having defined data aggregation, they're talking about streaming analytics as a platform, and then the others are talking about data science and machine learning and deep learning, as a journey.
Now, you saw the customer panel yesterday. But the one point I'd like to make is, it's not only the Duke Energies and the Black Knights of the world, or the HSC, who I believe are big, large firms that are using data. Even a company like, an old agricultural company, or I shouldn't say old but steeped in heritage is probably the right word. 96, 97 year old agricultural company that's in the animal feed business. Animal feed. Multi-billion dollar animal feed business. They use data to monetize their business model. What they say is, they've been feeding animals for the last 70 years. So now they go to a farmer and they have enough data about how to feed animals, that they can actually tell the farmer, that this hog that you have right now, which is 17 pounds, I can guarantee you that I will have him or her on a nutrition that, by four months, it'll be 35 pounds. How much are you willing to pay? So even in the animal feed business, data is being used to drive not only insights, but monetization models. >> Wow. >> So. >> That's outstanding. >> Thank you. >> So in getting to that level of sophistication, it's not like every firm sort of has the skills and technology in place to do that. What are some of the steps that you find that they typically have to go through to get to that level of maturity? Like, where do they make mistakes? Where do they find the skills to manage on-prem infrastructure, if it is on-prem? What about if they're trying to do a hybrid cloud setup? How complex is that? >> I think that's where the power of the community comes through at multiple levels. So we're committed to the open-source movement. We're committed to the community-based development of data. Now, this community-based business model does a few things. Firstly, it keeps the innovation at the leading edge, bleeding edge, number one.
But as you heard the panel talk about yesterday, one of the biggest benefits that our customers see of using open source, is, sure economics is good, but that's not the leading reason. Keeping up with innovation, very high up there. Avoiding vendor lock-in, again very, very high up there. But one of the biggest reasons that CIOs gave me for choosing open source as a business model is more to do with the fact that they can attract good talent, and without open source, you can't actually attract talent. And I can relate to that because I have a sophomore at home. And it just happened to me that she's 15 now but she's been using open source since she was 11. The iPhone and, she downloads an application for free. She uses it, and if she stretches the limit of that, then she orders something more in a paid model. So the community helps people do a few things. Be able to fail fast if they need to. The second is, it lowers the barriers of entry, right. Because it's really free. You can have the same model. The third is, you can rely on the community for support and methodologies and best practices and lessons learned from implementations. The fourth is, it's a great hiring ground in terms of bringing people in and attracting Millennial talent, young talent, and sought-after talent. So that's really probably the answer that I would have for that. >> When you talk about the business model, the open-source business model and the attraction on the customer side, that sounded like there's this analogy with sort of the agro-business customer in the sense that they are offering data along with their traditional product. If your traditional product is open-source data management, what a room started telling us this morning was the machine learning that goes along with operating not only your own sort of internal workloads but customers', and being able to offer prescriptive advice on operations, essentially IT operations.
Is that the core, will that become the core of sort of value-add through data for an open-source business model like yours? >> I don't want to be speculative but I'll probably answer it another way. I think our vision, which was set by our founder Rob Bearden, and he took you guys through that yesterday, was way back when, we did say that our mission in life is to manage the world's data. So that mission hasn't changed. And the second was, we would do it as a open-source community or as a big contributing part of that community. And that has really not changed. Now, we feel that machine learning and data science and deep learning are areas that we're very, very excited about, our customers are very, very excited about. Now, the one thing that we did cover yesterday and I think earlier today as well, I'm a computer science engineer. And when I was in college, way back when, 25 years ago, I was interested in AI and ML. And it has existed for 50 years. The reason why it hasn't been available to the common man, so as to speak, is because of two reasons. One is, it did not have a source of data that it could sit on top of, that makes machine learning and AI effective. Or at least not a commercially-viable option to do so. Now, there is one. The second is, the compute power required to run some of the large algorithms that really give you insights into machine learning and AI. So we've become the platform on which customers can take advantage of excellent machine learning and AI tools to get insights. Now, that is two independent sort of categories. One is the open source community providing the platform. And then what tools the customer has used to apply data science and machine learning, so. >> So, all right. I'm thinking something that is slightly different and maybe the nuance is making it tough to articulate. 
But it's how can Hortonworks take the data platform and data science tools that you use to help understand how to operate Hortonworks, whether it's on a customer prem, or in the cloud. In other words, how can you use machine learning to make it a sort of a more effective and automated managed service? >> Yeah, and I think that's, the nuance's not lost on me. I think what I'm trying to sort of categorize is, for that to happen, you require two things. One is a data aggregator across on-prem and cloud. Because when you have data which is multi-tenancy, you have a lot of issues with data security, data governance, all the rest of it. Now, that is what we plan to manage for the world, so as to speak. Now, on top of that, customers who require to have data science or deep learning to be used, we provide that platform. Now, whether that is used as a service by the customer, which we would be happy to provide, or it is used in-house, on-prem, on various cloud models, that's more a customer decision. We don't want to force that decision. However, from the art of the possible perspective, yes it's possible. >> I love the mission to manage the world's data. >> Thank you. >> That's a lofty goal, but yesterday's announcements with IBM were pretty, pretty transformative. In your opinion as chief operating officer, how do you see this extension of this technology and strategic partnership helping Hortonworks on the next level of managing the world's data? >> Absolutely, it's game-changing for us. We're very, very excited. Our colleagues are very, very excited about the opportunity to partner. It's also a big validation of the fact that we now have a pretty large open-source community that contributes to this cause. So we're very excited about that. The opportunity is in actually our partnering with a leader in data science, machine learning, and AI, a company that is steeped in heritage, is known for game-changing, next technology moves.
And the fact that we're powering it from a data perspective is something that we're very, very excited and pleased about. And the opportunities are limitless. >> I love that, and I know you are a game-changer, in your fifth month. We thank you so much, Raj, for joining us. It was great to see you. Continued success, >> Thank you. >> at managing the world's data and being that game-changer, yourself, and for Hortonworks as well. >> Thank you Lisa, good to see you. >> You've been watching theCUBE. Again, we're live, day two of the DataWorks Summit, #DWS17. For my cohost, George Gilbert, I'm Lisa Martin. Stick around guys, we'll be right back with more great content. (jingle)
ENTITIES
Entity | Category | Confidence |
---|---|---|
George Gilbert | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Marc Andreessen | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Duke Energy | ORGANIZATION | 0.99+ |
Lisa | PERSON | 0.99+ |
TIBCO | ORGANIZATION | 0.99+ |
Duke Energies | ORGANIZATION | 0.99+ |
Raj Verma | PERSON | 0.99+ |
35 pounds | QUANTITY | 0.99+ |
Raj | PERSON | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
50 years | QUANTITY | 0.99+ |
San Jose | LOCATION | 0.99+ |
17 pounds | QUANTITY | 0.99+ |
fifth month | QUANTITY | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Rajnish Verma | PERSON | 0.99+ |
HSC | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
15 | QUANTITY | 0.99+ |
four months | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Black Knights | ORGANIZATION | 0.99+ |
Duke | ORGANIZATION | 0.99+ |
two reasons | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
two things | QUANTITY | 0.99+ |
iPhone | COMMERCIAL_ITEM | 0.99+ |
Firstly | QUANTITY | 0.99+ |
second | QUANTITY | 0.99+ |
third | QUANTITY | 0.99+ |
one company | QUANTITY | 0.99+ |
DataWorks Summit 2017 | EVENT | 0.98+ |
DataWorks Summit | EVENT | 0.98+ |
three | QUANTITY | 0.98+ |
#DWS17 | EVENT | 0.98+ |
Multi-billion dollar | QUANTITY | 0.98+ |
fourth | QUANTITY | 0.98+ |
one thing | QUANTITY | 0.98+ |
today | DATE | 0.97+ |
15 years ago | DATE | 0.97+ |
11 | QUANTITY | 0.96+ |
this morning | DATE | 0.95+ |
25 years ago | DATE | 0.95+ |
one point | QUANTITY | 0.94+ |
day two | QUANTITY | 0.93+ |
Rajnish | PERSON | 0.93+ |
first | QUANTITY | 0.93+ |
five | DATE | 0.91+ |
three years ago | DATE | 0.91+ |
theCUBE | ORGANIZATION | 0.9+ |
96, 97 year old | QUANTITY | 0.89+ |
Hortonworks - DataWorks Summit 2017 | EVENT | 0.87+ |
earlier today | DATE | 0.87+ |
COO | PERSON | 0.86+ |
10 years ago | DATE | 0.86+ |
about two | DATE | 0.84+ |
seven | DATE | 0.8+ |
couple | QUANTITY | 0.8+ |
Hadoop | ORGANIZATION | 0.75+ |
over a decade and a half | QUANTITY | 0.72+ |
last 70 years | DATE | 0.69+ |
Jamie Engesser, Hortonworks & Madhu Kochar, IBM - DataWorks Summit 2017
>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2017, brought to you by Hortonworks. (digitalized music) >> Welcome back to theCUBE. We are live at day one of the DataWorks Summit, in the heart of Silicon Valley. I'm Lisa Martin with theCUBE; my co-host George Gilbert. We're very excited to be joined by our two next guests. Going to be talking about a lot of the passion and the energy that came from the keynote this morning and some big announcements. Please welcome Madhu Kochar, VP of analytics and product development and client success at IBM, and Jamie Engesser, VP of product management at Hortonworks. Welcome guys! >> Thank you. >> Glad to be here. >> First time on theCUBE, George and I are thrilled to have you. So, in the last six to eight months doing my research, there have been announcements between IBM and Hortonworks. You guys have been partners for a very long time, and announcements on technology partnerships with servers and storage, and presumably all of that gives Hortonworks, Jamie, a great opportunity to tap into IBM's enterprise install base, but boy today? Socks blown off with this big announcement between IBM and Hortonworks. Jamie, kind of walk us through that, or sorry Madhu I'm going to ask you first. Walk us through this announcement today. What does it mean for the IBM-Hortonworks partnership? >> Oh my God, what an exciting, exciting day right? We've been working towards this one, so three main things come out of the announcement today. First is really the adoption by Hortonworks of IBM data science and machine learning. As you heard in the announcement, we brought the machine learning to our mainframe where the most trusted data is. Now bringing that to the open source, big data on Hadoop, great right, amazing.
Number two is obviously the whole aspect around our Big SQL, which is bringing the complex-query analytics, where it brings all the data together from all the various sources, and making that available on HDP and Hadoop, and Hortonworks really adopting that. Amazing announcement. Number three, what we gain out of this humongously, obviously from an IBM perspective, is the whole platform. We've been on this journey together with Hortonworks since 2015 with ODPi, and we've been all champions in the open source, delivering a lot of that. As we start to look at it, it makes sense to merge that as a platform, and give to our clients what's most needed out there, as we take our journey towards machine learning, AI, and enhancing the enterprise data warehousing strategy. >> Awesome, Jamie from your perspective on the product management side, what is this? What's the impact and potential downstream, great implications for Hortonworks? >> I think there's two things. I think Hortonworks has always been very committed to the open source community. I think with Hortonworks and IBM partnering on this, number one is it brings a much bigger community to bear, to really push innovation on top of Hadoop. That innovation is going to come through the community, and I think that partnership drives two of the biggest contributors to the community to do more together. So I think that's number one is the community interest. The second thing is when you look at Hadoop adoption, we're seeing that people want to get more and more value out of Hadoop adoption, and they want to access more and more data sets, to number one get more and more value. We're seeing the data science platform become really fundamental to that. They're also seeing the extension to say, not only do I need data science to get and add new insights, but I need to aggregate more data.
So we're also seeing the notion of, how do I use Big SQL on top of Hadoop, but then I can federate data from my mainframe, which has got some very valuable data on it. DB2 instances and the rest of the data repositories out there. So now we get a better federation model, to allow our customers to access more of the data that they can make better business decisions on, and they can use data science on top of that to get new learnings from that data. >> Let me build on that. Let's say that I'm a Telco customer, and the two of you come together to me and say, we don't want to talk to you about Hadoop. We want to talk to you about solving a problem where you've got data in applications and many places, including inaccessible stuff. You have a limited number of data scientists, and the problem of cleaning all the data. Even if you build models, the challenge of integrating them with operational applications. So what do the two of you tell me, the Telco customer? >> Yeah, so maybe I'll go first. So the Telco, the main use case or the main application as I've been talking to many of the largest Telco companies here in the U.S. and even outside of the U.S. is all about their churn rate. They want to know when the calls are dropping, why are they dropping, why are the clients going to the competition and such? There's so much data. The data is just streaming and they want to understand that. I think if you bring the data science experience and machine learning to that data. That said, it doesn't matter now where the data resides. Hadoop, mainframes, wherever, we can bring that data. You can do a transformation of that, clean up the data. The quality of the data is there so that you can start feeding that data into the models and that's when the models learn. The more data there is, the better it is, so they train, and then you can really drive the insights out of it. Now data science the framework, which is available, it's like a team sport.
You can bring in many other data scientists into the organization who could have different analyst reports to go render for or provide results into. So being a team sport, being a collaboration, bringing together with that clean data, I think it's going to change the world. I think the business side can have instant value from the data they're going to see. >> Let me just test the edge conditions on that. Some of that data is streaming and you might apply the analytics in real time. Some of it is, I think as you were telling us before, sort of locked up as dark data. The question is how much of that data, the streaming stuff and the dark data, how much do you have to land in a Hadoop repository versus how much do you just push the analytics out to and have it inform a decision? >> Maybe I can take a first thought on it. I think there's a couple things in that. There's the learnings, and then how do I execute the learnings? I think the first step of it is, I tend to land the data, and going to the Telecom churn model, I want to see all the touch points. So I want to see the person that came through the website. He went into the store, he called into us, so I need to aggregate all that data to get a better view of what's the chain of steps that happened for somebody to churn? Once I end up diagnosing that, go through the data science of that, to learn the models that are being executed on that data, and that's the data at rest. What I want to do is build the model out so that now I can take that model, and I can prescriptively run it in this stream of data. So I know that that customer just hung up the phone, now he walked in the store and we can sense that he's in the store because we just registered that he's asking about his billing details. The system can now dynamically diagnose by those two activities that this is a high churn risk, so notify that teller in the store that there's a chance of him rolling out.
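Editor's sketch: the two-phase pattern described here (learn from aggregated touch points at rest, then run the learned model against live events) can be illustrated with a small, self-contained Python example. Everything in it is an assumption made for illustration and not anything either company described: the sample data, the names `train_churn_rates` and `StreamScorer`, the per-event churn rate as the "model", and the noisy-OR rule used to combine events. A real deployment would presumably use a proper ML library and a streaming engine.

```python
from collections import defaultdict

# Hypothetical touch points landed "at rest": (customer_id, channel, event).
HISTORY = [
    ("a", "call", "dropped_call"), ("a", "store", "billing_question"),
    ("b", "call", "dropped_call"), ("b", "store", "billing_question"),
    ("c", "call", "dropped_call"),
    ("d", "web", "plan_upgrade"), ("d", "store", "billing_question"),
]
CHURNED = {"a", "b"}  # hypothetical customers known to have churned


def train_churn_rates(history, churned):
    """Batch step: for each (channel, event), the share of customers who churned."""
    customers_by_event = defaultdict(set)
    for customer_id, channel, event in history:
        customers_by_event[(channel, event)].add(customer_id)
    return {
        key: len(customers & churned) / len(customers)
        for key, customers in customers_by_event.items()
    }


class StreamScorer:
    """Streaming step: apply the learned rates to events as they arrive."""

    def __init__(self, rates, threshold=0.8):
        self.rates = rates
        self.threshold = threshold
        # Running probability that each customer has NOT churned so far.
        self.stay_prob = defaultdict(lambda: 1.0)

    def on_event(self, customer_id, channel, event):
        """Update the customer's risk; return True when it crosses the threshold."""
        rate = self.rates.get((channel, event), 0.0)  # unseen events add no risk
        self.stay_prob[customer_id] *= 1.0 - rate     # noisy-OR combination
        risk = 1.0 - self.stay_prob[customer_id]
        return risk >= self.threshold


if __name__ == "__main__":
    scorer = StreamScorer(train_churn_rates(HISTORY, CHURNED))
    for touch in [("z", "call", "dropped_call"), ("z", "store", "billing_question")]:
        print(touch, "alert" if scorer.on_event(*touch) else "no alert")
```

The split mirrors the interview: `train_churn_rates` only ever sees landed history, while `StreamScorer.on_event` keeps a small amount of per-customer state so that a second risky touch point (the store visit after the phone call) can trigger the alert to the store.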
If you look at that, that required the machine learning and data science side to build the analytical model, and it required the data-flow management and streaming analytics to consume that model to make a real-time insight out of it, to ultimately stop the churn from happening. Let's just give the customer a discount at the end of the day. That type of stuff; so you need to marry those two. >> It's interesting, you articulated that very clearly. Although then the question I have is now not on the technical side, but on the go-to-market side. You guys have to work very very closely, and this is calling at a level that I assume is not very normal for Hortonworks, and it's something that is a natural sales motion for IBM. >> So maybe I'll first speak up, and then I'll let you add some color to that. When I look at it, I think there's a lot of natural synergies. IBM and Hortonworks have been partnered since day one. We've always continued on the path. If you look at it, and I'll bring up community again and open source again, but we've worked very well in the community. I think that's incubated and fostered a really strong relationship. I think at the end of the day we both look at what's going to be the outcome for the customer and working back from that, and we tend to really engage at that level. So what's the outcome and then how do we make a better product to get to that outcome? So I think there is a lot of natural synergies in that. I think to your point, there's lots of pieces that we need to integrate better together, and we will join that over time. I think we're already starting with the data science experience. A bunch of integration touchpoints there. I think you're going to see in the information governance space, with Atlas being a key underpinning and information governance catalog on top of that, ultimately moving up to IBM's unified governance, we'll start getting more synergies there as well and on the Big SQL side.
I think when you look at the different pods, there's a lot of synergies that our customers will be driving and that's what the driving factors are, along with the organizations being very well aligned. >> And as VP of engineering, so there's a lot of integration points which were already identified, and Big SQL is already working really well on the Hortonworks HDP platform. We've got good integration going, but I think more and more on the data science. I think at the end of the day we end up talking to very similar clients, so going with a joint go-to-market strategy, it's a win-win. Jamie and I were talking earlier. I think in this type of a partnership, our community is winning and our clients are too, so really good solutions. >> And that's what it's all about. Speaking of clients, you gave a great example with Telco. When we were talking to Rob Thomas and Rob Bearden earlier on in the program today, they talked about how the data science conversation is at the C-suite, so walk us through an example of whether it's a Telco or maybe a healthcare organization, what is that conversation that you're having? How is a Telco helping foster what was announced today and this partnership? >> Madhu: Do you want to take em? >> Maybe I'll start. When we look in a Telco, I think there's a natural revolution, and when we start looking at that problem of how does a Telco consume and operate data science at a larger scale? So at the C-suite it becomes a people-process discussion. There's not a lot of tools currently that really help the people and process side of it. It's kind of an artist capability today in the data science space. What we're trying to do is, I think I mentioned team sport, but also give the tooling to say there's step one, which is we need to start learning and training the right teams and the right approach. Step two is start giving them access to the right data, etcetera to work through that.
And step three, giving them all the tooling to support that, and tooling becomes things like TensorFlow etcetera, things like Zeppelin, Jupyter, a bunch of the open source community evolved capabilities. So first learn and training. The second step in that is give them the access to the right data to consume it, and then third, give them the right tooling. I think those three things are helping us to drive the right capabilities out of it. But to your point, elevating up to the C-suite. It's really they think people-process, and I think giving them the right tooling for their people and the right processes to get them there. Moving data science from an art to a science, is I would argue at a top level. >> On the client success side, how instrumental though are your clients, like maybe on the Telco side, in actually fostering the development of the technology, or helping IBM make the decision to standardize on HDP as their big data platform? >> Oh, huge, huge, a lot of our clients, especially as they are looking at the big data. Many of them are actually helping us get committers into the code. They're adding, providing; feet can't move fast enough in the engineering. They are coming up and saying, "Hey, we're going to help and code up and do some code development with you." They've been really pushing our limits. A lot of clients, actually, that I ended up working with on the Hadoop side, you know, for example: my entire information integration suite is very much running on top of HDP today. So they are saying, OK what's next? We want to see better integration. So as I called a few clients yesterday saying, "Hey, under embargo this is something that's going to get announced." Amazing, amazing results, and they're just very excited about this. So we are starting to get a lot of push, and actually the clients who do have a large development community as well. Like a lot of banks today, they write a lot of their own applications.
We're starting to see them co-developing stuff with us and becoming the committers. >> Lisa: You have a question? >> Well, if I just were to jump in. How do you see over time the mix of apps starting to move from completely custom developed, sort of the way the original big data applications were all written, down to the metal in MapReduce? For shops that don't have a lot of data scientists, how are we going to see applications become more self-service, more pre-packaged? >> So maybe I'll give a little bit of perspective. Right now I think IBM has got really good synergies on what I'll call vertical solutions to vertical organizations, financial, etcetera. I would say Hortonworks has taken a more horizontal approach. We're more of a platform solution. An example of one where it's kind of marrying the two is if you move up the stack from Hortonworks as a platform to the next level up, which is Hortonworks as a solution. One of the examples that we've invested heavily in is cybersecurity, and in an Apache project called Metron. Less about Metron and more about cybersecurity. People want to solve a problem. They want to defend against an attacker immediately, and what that means is we need to give them out-of-the-box models to detect a lot of common patterns. What we're doing there is we're investing in some of the data science and pre-packaged models to identify attack vectors and then try to resolve that or at least notify you that there's a concern. It's an example where the data science behind it, pre-packaging that data science to solve a specific problem. That's in the cybersecurity space and that case happens to be horizontal where Hortonworks' strength is. I think in the IBM case, there's a lot more vertical apps that we can apply to. Fraud, adjudication, etcetera. >> So it sounds like we're really just hitting the tip of the iceberg here, with the potential.
We want to thank you both for joining us on theCUBE today, sharing your excitement about this deepening, expanding partnership between Hortonworks and IBM. Madhu and Jamie, thank you so much for joining George and I today on theCUBE. >> Thank you. >> Thank you Lisa and George. >> Appreciate it. >> Thank you. >> And for my co-host George Gilbert, I am Lisa Martin. You're watching us live on theCUBE, from day one of the DataWorks Summit in Silicon Valley. Stick around, we'll be right back. (digitalized music)
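The churn example from this interview pairs two pieces: a model built on the data science side, and a streaming flow that consumes that model to act in real time. A minimal sketch of that pairing follows. It is only an illustration under stated assumptions: the event fields, the weights in `churn_score`, the 0.7 threshold, and the `offer_discount` action are all hypothetical, and a plain Python iterable stands in for whatever streaming engine would carry the events.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class CustomerEvent:
    """One telemetry event per customer (field names are hypothetical)."""
    customer_id: str
    dropped_calls_last_week: int
    support_contacts_last_month: int


def churn_score(event: CustomerEvent) -> float:
    """Stand-in for a model trained offline; the weights are invented."""
    raw = (0.1 * event.dropped_calls_last_week
           + 0.2 * event.support_contacts_last_month)
    return min(raw, 1.0)


def retention_actions(
    events: Iterable[CustomerEvent],
    model: Callable[[CustomerEvent], float],
    threshold: float = 0.7,
) -> Iterator[Tuple[str, str]]:
    """Score each event as it arrives and emit an action for likely churners."""
    for event in events:
        if model(event) >= threshold:
            yield event.customer_id, "offer_discount"


if __name__ == "__main__":
    stream = [CustomerEvent("a", 1, 0), CustomerEvent("b", 5, 2)]
    for customer_id, action in retention_actions(stream, churn_score):
        print(customer_id, action)
```

The model is passed in as a plain callable so that the scoring logic can be retrained and swapped without touching the flow that consumes it, which is the separation the speakers describe.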
SUMMARY :
brought to you by Hortonworks. that came from the keynote this morning So, in the last six to eight months doing my research, of the biggest contributors to the community and the two of you come together to me and say, from the data they going to see. and you might apply the analytics in real time. and data science side to build the analytical model, and it's something that is a natural sales motion for IBM. and on the big sequel side. I think in end of the day we end up talking They talked about the data science conversation is of the open source community evolved capabilities. and actually the clients who do have sort of the way the original big data applications of the data science and pre-packaged models of the iceberg here, with the potential. from day one of the DataWorks Summit in Silicon Valley.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jamie | PERSON | 0.99+ |
Telco | ORGANIZATION | 0.99+ |
Madhu | PERSON | 0.99+ |
George Gilbert | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Jamie Engesser | PERSON | 0.99+ |
Madhu Kochar | PERSON | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
George | PERSON | 0.99+ |
Lisa | PERSON | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
two | QUANTITY | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
U.S. | LOCATION | 0.99+ |
second step | QUANTITY | 0.99+ |
First | QUANTITY | 0.99+ |
third | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
first step | QUANTITY | 0.99+ |
two activities | QUANTITY | 0.99+ |
San Jose | LOCATION | 0.99+ |
second thing | QUANTITY | 0.99+ |
Hortonwork | ORGANIZATION | 0.99+ |
2015 | DATE | 0.99+ |
first | QUANTITY | 0.99+ |
first thought | QUANTITY | 0.98+ |
two things | QUANTITY | 0.98+ |
eight months | QUANTITY | 0.98+ |
three things | QUANTITY | 0.98+ |
One | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
DataWorks Summit | EVENT | 0.97+ |
DataWorks Summit 2017 | EVENT | 0.97+ |
two next guests | QUANTITY | 0.97+ |
both | QUANTITY | 0.97+ |
Hadoop | TITLE | 0.97+ |
Apache | ORGANIZATION | 0.97+ |
George Chow, Simba Technologies - DataWorks Summit 2017
>> (Announcer) Live from San Jose, in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017, brought to you by Hortonworks. >> Hi everybody, this is George Gilbert, Big Data and Analytics Analyst with Wikibon. We are wrapping up our show on theCUBE today at DataWorks 2017 in San Jose. It has been a very interesting day, and we have a special guest to help us do a survey of the wrap-up, George Chow from Simba. We used to call him Chief Technology Officer, now he's Technology Fellow, but when he was explaining the difference in titles to me, I thought he said Technology Felon. (George Chow laughs) But he's since corrected me. >> Yes, very much so. >> So George and I have been, we've been looking at both Spark Summit last week and DataWorks this week. What are some of the big advances that really caught your attention? >> What's caught my attention actually is how much manufacturing has really, I think, caught on to streaming data. I think last week was very notable in that both Volkswagen and Audi actually had case studies for how they're using streaming data. And I think just before the break now, there was also a similar session from Ford, showcasing what they are doing around streaming data. >> And are they using the streaming analytics capabilities for autonomous driving, or is it other telemetry that they're analyzing? >> The, what is it, I think the Volkswagen study was production, because I still have to review the notes, but the one for Audi was actually quite interesting because it was for managing paint defects. >> (George Gilbert) For paint-- >> Paint defects. >> (George Gilbert) Oh. >> So what they were doing, they were essentially recording the environmental conditions that they were painting the cars in, basically the entire pipeline-- >> To predict when there would be imperfections. >> (George Chow) Yes. >> Because paint is an extremely high-value sort of step in the assembly process.
>> Yes, what they are trying to do is to essentially make a connection between downstream defects, like future defects, and somewhat trying to pinpoint the causes upstream. So the idea is that if they record all the environmental conditions early on, they could turn around and hopefully figure it out later on. >> Okay, this sounds really, really concrete. So what are some of the surprising environmental variables that they're tracking, and then what's the technology that they're using to build the model and then anticipate if there's a problem? >> I think the surprising finding they cited was actually, I think it was humidity or fan speed, if I recall, at the time when the paint was being applied, because essentially, paint has to be... Paint is very sensitive to the conditions in which it is being applied to the body. So my recollection is that one of the findings was that there was a narrow window during which the paint was, like, ideal, in terms of having the least amount of defects. >> So, had they built a digital twin style model, where it's like a digital replica of some aspects of the car, or was it more of a predictive model that had telemetry coming at it, and when it's outside certain bounds they know they're going to have defects downstream? >> I think they're still working on the predictive model, or actually the model is still being built, because they are essentially trying to build that model to figure out how they should be tuning the production pipeline. >> Got it, so this is sort of still in the development phase? >> (George Chow) Yeah, yeah. >> And can you tell us, did they talk about the technologies that they're using? >> I remember the... It's a little hazy now because it's after a couple of weeks of conferences, so I don't remember the specifics because I was counting on the recordings to come out in a couple of weeks' time. So I'll definitely share that. It's a case study to keep an eye on.
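The "narrow window" finding in the paint-defect case study can be illustrated with a small sketch. The idea is to group recorded conditions into bands and compare defect rates per band. Everything specific here is an assumption for illustration: the speaker recalls only "humidity or fan speed", so humidity is picked arbitrarily, the band width is invented, and the sample records are made up rather than taken from the case study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def defect_rate_by_band(
    records: Iterable[Tuple[float, bool]], band_width: int = 10
) -> Dict[int, float]:
    """Group (humidity, had_defect) records into bands and return defect rates.

    Keys are the lower edge of each band, e.g. 50 means [50, 60).
    """
    counts = defaultdict(lambda: [0, 0])  # band start -> [defects, total]
    for humidity, had_defect in records:
        band = int(humidity // band_width) * band_width
        counts[band][0] += int(had_defect)
        counts[band][1] += 1
    return {
        band: defects / total
        for band, (defects, total) in sorted(counts.items())
    }


def best_band(rates: Dict[int, float]) -> int:
    """Return the band with the lowest observed defect rate."""
    return min(rates, key=rates.get)


if __name__ == "__main__":
    # Made-up observations: humidity at paint time, defect found downstream.
    sample = [
        (42, False), (45, False), (48, True),
        (51, False), (55, False), (58, False),
        (61, False), (63, True), (67, True),
    ]
    rates = defect_rate_by_band(sample)
    print(rates, "-> lowest defect rate in band starting at", best_band(rates))
```

A real analysis would need far more data and several variables at once. The sketch only shows how recording upstream conditions makes it possible to look for the window afterwards.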
>> So tell us, were there other ones where this use of real-time or near real-time data had some applications that we couldn't do before because we now can do things with very low latency? >> I think that's the one that I was looking forward to with Ford. That was the session just earlier, I think about an hour ago. The session actually consisted of a demo that was being done live, you know. It was being streamed to us where they were showcasing the data that was coming off a car that's been rigged up. >> So what data were they tracking and what were they trying to anticipate here? >> They didn't give enough detail, but it was basically data coming off of the CAN bus of the car, so if anybody is familiar with the-- >> Oh that's right, you're a car guru, and you and I compare, well our latest favorite is the Porsche Macan >> Yes, yes. >> SUV, okay. >> But yeah, they were looking at streaming the performance data of the car as well as the location data. >> Okay, and... Oh, this sounds more like a test case, like can we get telemetry data that might be good for insurance or for... >> Well they've built out the system enough using the Lambda Architecture with Kafka, so they were actually consuming the data in real-time, and the demo was actually exactly seeing the data being ingested and being acted on. So in this case they were doing a simplistic visualization of just placing the car on Google Maps so you can basically follow the car around. >> Okay so, what were the technical components in the car, and then, how much data were they sending to some, or where was the data being sent to, or how much of the data? >> The data was actually sent, streamed, all the way into Ford's own data centers. So they were using NiFi with all the right proxy-- >> (George Gilbert) NiFi being from Hortonworks there. >> Yeah, yeah. >> The Hortonworks DataFlow, okay. >> Yeah, with all the appropriate proxies and firewalls to bring it all the way into a secure environment.
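The "follow the car around" demo boils down to consuming telemetry messages and keeping the most recent known position per vehicle. The sketch below shows that consuming step only, under loud assumptions: the JSON message shape (`vehicle_id`, `ts`, `lat`, `lon`) is hypothetical, and a Python list stands in for the stream, since the interview gives no detail on Ford's actual message format or pipeline code.

```python
import json
from typing import Dict, Iterable


def latest_positions(messages: Iterable[str]) -> Dict[str, dict]:
    """Keep the newest position per vehicle from a stream of JSON messages.

    Messages may arrive out of order, so a message replaces the stored
    position only if its timestamp is at least as new.
    """
    latest: Dict[str, dict] = {}
    for raw in messages:
        msg = json.loads(raw)
        vehicle_id = msg["vehicle_id"]
        current = latest.get(vehicle_id)
        if current is None or msg["ts"] >= current["ts"]:
            latest[vehicle_id] = {
                "ts": msg["ts"],
                "lat": msg["lat"],
                "lon": msg["lon"],
            }
    return latest


if __name__ == "__main__":
    stream = [
        '{"vehicle_id": "car-1", "ts": 10, "lat": 37.33, "lon": -121.89}',
        '{"vehicle_id": "car-1", "ts": 30, "lat": 37.35, "lon": -121.90}',
        '{"vehicle_id": "car-1", "ts": 20, "lat": 37.34, "lon": -121.895}',
    ]
    print(latest_positions(stream))
```

The resulting dictionary is the kind of small, constantly refreshed view a map display could poll. In a Lambda-style design it would correspond to the low-latency path, with the full message history kept separately for batch processing.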
>> Wow. >> So it was quite impressive from the point of view of, it was live data coming off of the 4G modem, well actually being uploaded through the 4G modem in the car. >> Wow, okay, did they say how much compute and storage they needed in the device, in this case the car? >> I think they were using a very lightweight platform. They were streaming apparently from a Raspberry Pi. >> (George Gilbert) Oh, interesting. >> But they were very guarded about what was inside the data center because, you know, for competitive reasons, they couldn't share much about how big or how large a scale they could operate at. >> Okay, so Simba has been doing ODBC and JDBC drivers to standard APIs, to databases for a long time. That was all about, that was an era where either it was interactive or batch. So, how is streaming, sort of big picture, going to change the way applications are built? >> Well, one way to think about streaming is that if you look at many of these APIs, into these systems, like Spark is a good example, where they're trying to harmonize streaming and batch, or rather, to take away the need to deal with it as a streaming system as opposed to a batch system, because it's obviously much easier to think about and reason about your system when it is traditional, like in the traditional batch model. So, the way that I see it also happening is that streaming systems will, you could say will adapt, will actually become easier to build, and everyone is trying to make it easier to build, so that you don't have to think about and reason about it as a streaming system. >> Okay, so this is really important. But they have to make a trade-off if they do it that way. So there's the desire for leveraging skill sets, which were all batch-oriented, and then, presumably SQL, which is a data manipulation language everyone's comfortable with, but then, if you're doing it batch-oriented, you have a portion of time where you're not sure you have the final answer.
And I assume if you were in a streaming-first solution, you would explicitly know whether you have all the data or don't, as opposed to late arriving stuff, that might come later. >> Yes, but what I'm referring to is actually the programming model. All I'm saying is that more and more people will want streaming applications, but more and more people need to develop it quickly, without having to build it in a very specialized fashion. So when you look at, let's say the example of Spark, when they focus on structured streaming, the whole idea is to make it possible for you to develop the app without having to write it from scratch. And the comment about SQL is actually exactly on point, because the idea is that you want to work with the data, you can say, not mindful, not with a lot of work to account for the fact that it is actually streaming data that could arrive out of order even, so the whole idea is that if you can build applications in a more consistent way, irrespective whether it's batch or streaming, you're better off. >> So, last week even though we didn't have a major release of Spark, we had like a point release, or a discussion about the 2.2 release, and that's of course very relevant for our big data ecosystem since Spark has become the compute engine for it. Explain the significance where the reaction time, the latency for Spark, went down from several hundred milliseconds to one millisecond or below. What are the implications for the programming model and for the applications you can build with it. >> Actually, hitting that new threshold, the millisecond, is actually a very important milestone because when you look at a typical scenario, let's say with AdTech where you're serving ads, you really only have, maybe, on the order about 100 or maybe 200 millisecond max to actually turn around. >> And that max includes a bunch of things, not just the calculation. 
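The point about late and out-of-order data can be made concrete with a small sketch of event-time windowing with a watermark. This is a simplified, pure-Python illustration of the general idea, not Spark's implementation or its exact semantics: the function name, the integer timestamps, and the rule for dropping late events are assumptions chosen to keep the example short.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def windowed_counts(
    event_times: Iterable[int], window_size: int, allowed_lateness: int
) -> Tuple[Dict[int, int], List[int]]:
    """Count events per tumbling event-time window, tolerating some lateness.

    event_times are given in arrival order, which may differ from event order.
    The watermark trails the largest event time seen so far by
    allowed_lateness; an event is dropped if its whole window already lies
    at or behind the watermark.
    """
    counts: Dict[int, int] = defaultdict(int)
    dropped: List[int] = []
    max_event_time = None
    for event_time in event_times:
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        watermark = max_event_time - allowed_lateness
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        if window_end <= watermark:
            dropped.append(event_time)  # too late, window considered final
            continue
        counts[window_start] += 1
    return dict(counts), dropped


if __name__ == "__main__":
    # Event 3 arrives late but within tolerance; event 8 arrives too late.
    arrivals = [1, 4, 12, 3, 25, 8]
    print(windowed_counts(arrivals, window_size=10, allowed_lateness=5))
```

The watermark is what answers the question raised in the conversation about knowing whether a result is final: once a window falls behind it, the count for that window is treated as complete, at the cost of discarding anything that shows up afterwards.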
>> Yeah, and that, let's say 100 milliseconds, includes transfer time, which means that in your real budget, you only have allowances for maybe, under 10 to 20 milliseconds to compute and do any work. So being able to actually have a system that delivers millisecond-level performance actually gives you ability to use Spark right now in that scenario. >> Okay, so in other words, now they can claim, even if it's not per event processing, they can claim that they can react so fast that it's as good as per event processing, is that fair to say? >> Yes, yes that's very fair. >> Okay, that's significant. So, what type... How would you see applications changing? We've only got another minute or two, but how do you see applications changing now that, Spark has been designed for people that have traditional, batch-oriented skills, but who can now learn how to do streaming, real-time applications without learning anything really new. How will that change what we see next year? >> Well I think we should be careful to not pigeonhole Spark as something built for batch, because I think the idea is that, you could say, the originators, of Spark know that it's all about the ease of development, and it's the ease of reasoning about your system. It's not the fact that the technology is built for batch, so the fact that you could use your knowledge and experience and an API that actually is familiar, should leverage it for something that you can build for streaming. That's the power, you could say. That's the strength of what the Spark project has taken on. >> Okay, we're going to have to end it on that note. There's so much more to go through. George, you will be back as a favorite guest on the show. There will be many more interviews to come. >> Thank you. >> With that, this is George Gilbert. We are DataWorks 2017 in San Jose. We had a great day today. We learned a lot from Rob Bearden and Rob Thomas up front about the IBM deal. 
We had Scott Gnau, CTO of Hortonworks on several times, and we've come away with an appreciation for a partnership now between IBM and Hortonworks that can take the two of them into a set of use cases that neither one on its own could really handle before. So today was a significant day. Tune in tomorrow, we have another great set of guests. Keynotes start at nine, and our guests will be on starting at 11. So with that, this is George Gilbert, signing out. Have a good night. (energetic, echoing chord and drum beat)
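The ad-serving latency budget described in this interview is simple arithmetic: a total response budget of roughly 100 milliseconds, minus transfer time and other overheads, leaves only about 10 to 20 milliseconds for computation. A minimal sketch of that calculation follows. The individual overhead figures are hypothetical, chosen only so that the remainder falls in the range George Chow mentions.

```python
from typing import Dict


def compute_budget_ms(total_budget_ms: float, overheads_ms: Dict[str, float]) -> float:
    """Return the time left for computation after fixed overheads."""
    remaining = total_budget_ms - sum(overheads_ms.values())
    if remaining < 0:
        raise ValueError("overheads exceed the total latency budget")
    return remaining


def fits_budget(engine_latency_ms: float, budget_ms: float) -> bool:
    """Check whether an engine's processing latency fits the compute budget."""
    return engine_latency_ms <= budget_ms


if __name__ == "__main__":
    # Hypothetical breakdown of a 100 ms end-to-end budget.
    overheads = {"request_transfer": 35, "response_transfer": 35, "queueing": 15}
    budget = compute_budget_ms(100, overheads)
    print("compute budget (ms):", budget)
    print("several hundred ms engine fits:", fits_budget(300, budget))
    print("about 1 ms engine fits:", fits_budget(1, budget))
```

Under these assumed numbers, an engine with latency in the hundreds of milliseconds cannot be used at all, while one at around a millisecond leaves most of the compute budget for the application's own work.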
SUMMARY :
in the heart of Silicon Valley, do a survey of the wrap-up, What are some of the big advances caught into the streaming data. but the one for Audi was actually quite interesting in the assembly process. So the idea is that if they record So what are some of the surprising environmental So my recollection is that one of the finding or actually the model is still being built, of conference, so I don't remember the specifics the data that was coming off a car the performance data of the car for insurance or for... So in the case they were doing a simplistic visualization So they were using NiFi with all the right proxy-- to bring it all the way into a secure environment. So it was quite impressive from the point of view of, I think they were using a very lightweight platform. the data center because, you know, for competitive reasons, going to change the way applications are built? so that you don't have to think about and reason about it But they have to make a trade-off if they do it that way. so the whole idea is that if you can build and for the applications you can build with it. because when you look at a typical scenario, not just the calculation. So being able to actually have a system that delivers but how do you see applications changing now that, so the fact that you could use your knowledge There's so much more to go through. that can take the two of them
ENTITIES
Entity | Category | Confidence |
---|---|---|
IBM | ORGANIZATION | 0.99+ |
George | PERSON | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
George Gilbert | PERSON | 0.99+ |
Scott Gnau | PERSON | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
Audi | ORGANIZATION | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
San Jose | LOCATION | 0.99+ |
George Chow | PERSON | 0.99+ |
Ford | ORGANIZATION | 0.99+ |
last week | DATE | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
one millisecond | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
next year | DATE | 0.99+ |
100 milliseconds | QUANTITY | 0.99+ |
200 millisecond | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
tomorrow | DATE | 0.99+ |
Volkswagen | ORGANIZATION | 0.99+ |
this week | DATE | 0.99+ |
Google Map | TITLE | 0.99+ |
AdTech | ORGANIZATION | 0.99+ |
DataWorks 2017 | EVENT | 0.98+ |
DataWorks Summit 2017 | EVENT | 0.98+ |
both | QUANTITY | 0.98+ |
11 | DATE | 0.98+ |
Spark | TITLE | 0.98+ |
Wikibon | ORGANIZATION | 0.96+ |
under 10 | QUANTITY | 0.96+ |
one | QUANTITY | 0.96+ |
20 milliseconds | QUANTITY | 0.95+ |
Spark Summit | EVENT | 0.94+ |
first solution | QUANTITY | 0.94+ |
SQL | TITLE | 0.93+ |
hundred milliseconds | QUANTITY | 0.93+ |
2.2 | QUANTITY | 0.92+ |
one way | QUANTITY | 0.89+ |
Spark | ORGANIZATION | 0.88+ |
Lambda Architecture | TITLE | 0.87+ |
Kafka | TITLE | 0.86+ |
minute | QUANTITY | 0.86+ |
Porsche Macan | ORGANIZATION | 0.86+ |
about 100 | QUANTITY | 0.85+ |
ODBC | TITLE | 0.84+ |
DataWorks | EVENT | 0.84+ |
NiFi | TITLE | 0.84+ |
about an hour ago | DATE | 0.8+ |
JDBC | TITLE | 0.79+ |
Raspberry Pi | COMMERCIAL_ITEM | 0.76+ |
Simba | ORGANIZATION | 0.75+ |
Simba Technologies | ORGANIZATION | 0.74+ |
couples weeks' | QUANTITY | 0.7+ |
CTO | PERSON | 0.68+ |
theCUBE | ORGANIZATION | 0.67+ |
twin | QUANTITY | 0.67+ |
couple weeks | QUANTITY | 0.64+ |
Joe Dickman, Vizuri and Michael Quintero, LogistiCare - Red Hat Summit 2017
>> Narrator: Live from Boston, Massachusetts, it's the Cube. Covering Red Hat Summit 2017, brought to you by Red Hat. (techno music) >> Welcome back to Boston, everybody. And welcome back to Red Hat Summit. This is the Cube, the leader in live tech coverage. My name is Dave Vellante, and I'm here with my co-host, Stu Miniman. Stu, we were saying this is your 100th Red Hat Summit, so congratulations on reaching that milestone. Joe Dickman is here. He's the senior vice president of Vizuri. Cool name, love it. And Michael Quintero, or Quintero if you prefer, of LogistiCare. He's an enterprise solutions architect. Gentlemen, welcome to the Cube. >> Thank you. It's a pleasure to be here. >> So Vizuri. Love the name. It strikes a visualization. It's (mumbles) trendy. Tell us about Vizuri, and tell us about your relationship with LogistiCare, and we'll get into it. >> Vizuri is the private division of a company called AEM Corporation. We created the brand to serve the commercial market for research and development. We became partners with JBoss before Red Hat's acquisition, so we jumped into open source in like 2003. And since then, we've built a business around open source technologies, and market leading technologies that bring value. We found LogistiCare because they solicited us for some work to help them transform their organization. And it's worked out well. I mean, Michael and I have been working together for about 18 months. >> So, tell us a little bit about LogistiCare. >> So LogistiCare is the world's largest provider of non-emergency medical transportation. So, we service the health market around people have benefits. The insurance companies don't provide transportation, and the members come to us and we broker the transportation for them. Been in business for quite some time. Do about 70 million trips a year, a little bit more. And we have roughly 80% of that market. 
And we just want to stay on top of, and be recognized as the world leader in that capability with the best services and the care for our members. >> So JBoss of course was like the second pillar for Red Hat after Red Hat (mumbles) Rob Bearden, who was a CEO at the time, and Cube alum and friend. But so, how did you utilize that capability, the sort of whole middleware, and how does that affect your digital transformation? And where did you guys all fit together? >> So, well digital transformation is a business strategy, not a technology. So, we looked at our need to be more flexible, and dynamic, and innovate. Our legacy, our what we call classic internally, software stack is limiting. It's not service oriented. It's not extensible. It's a compiled, executable, distributed -- serves the business very well. In fact, we're still using it today in some aspects. We haven't fully replaced it. But it's long in the tooth, and it's difficult for us to reach that new business requirement and test and deliver it scale. So, I joined the company to help modernize that architecture. Very quickly recognized that in order to get to scale, and loosely coupling, and massive customization, that microservices was a good solution for us. And when we surveyed the market for a partner that could help take us there, software wise, Red Hat has the most complete stack. They offer everything we need to do, and then they have the things we think we're going to do in the future. So, we looked around for somebody who could help us get to the Red Hat, enable to that, with Docker, and get to an auto-scaling kind of solution so we have infrastructure on demand. And we found Vizuri as a partner. They were able to help us enable the technology and teach us how to do things that we weren't presently doing. Because we didn't have any kind of scale solution in-house, it was just put more web servers out there. >> We started small, it started with a Business Process Management System. 
If you think about all the logistics that are necessary for coordinating medical transport, "I'm a dialysis patient. I'm somebody that is home-bound. I need to get to a physician appointment." We took that domain knowledge, that's part of one of the pillars of digital transformation. It's infrastructure, it's integration, and it's knowledge management. We started with knowledge management. Think about all the complex business rules for manage care organizations, reimbursement, right? Which is what LogistiCare does. Quickly after we solved that problem, we looked at integration, and we said, "Well now we have all these trading partners." So we guided LogistiCare into their next purchase which was Fuse. So now we had an API strategy for publicly linking them to other consumer providers, because they are a logistics organization for reimbursement. And as Michael said, we started building data centers. Or LogistiCare did. But guess what? Containers and OpenShift came in and we started provisioning our development environments to Amazon Web Services. And when they saw the cost-savings, they abandoned building out on-prem data centers, and went Cloud-native. >> So there's also a revenue drive, or component, as well, right? >> It is. It is. It's an OpEx (mumbles) and the CapEx cost-savings. >> Let's unpack both of those. >> Joe: Sure. >> Where do you want to start? Cost or the telephone numbers? (laughs) >> So, we're mostly a call center based company in history. Right? We have 20-something call centers around the country. We service most of the U.S. And we have a variety of contracts with medical care providers, like Aetna, and Wellpoint, and Blue Cross, and those type people. And then the managed care organizations come in. So, we look to reduce our OpEx by diminishing the number and the interfaces that we have with our call centers. People don't have to call in to the call centers to do business with us. 
You know, something like a one-minute reduction in call time is about a six or seven million dollar a year benefit for us. And there's a lot of things that people can do for themselves. I mean, they can call in and cancel a trip that they've had scheduled. We figured that about 30% of the cancellation rate, if we could get that done through a service interface, through an IVR, where you can come in and say "I'm not going to go" and cancel it, that's a five or six million dollar savings for us right there. Just in 30%. >> Michael, I'm curious. Was there any hesitancy inside to say, "Okay. I'm going to kill data centers, going to go to a public Cloud." You know, how did that transition go? And anything, you know, kind of the good, the bad, and the ugly that you could share. >> So, well, we're a healthcare company. HIPAA and HITRUST certified coming. And there's a certain amount of fear on Cloud migration. So we had to demonstrate the knowledge, skills, and abilities around getting secure, scalable solutions out to the Cloud. And this is our core application. If we don't do this well, we could become Blockbuster and go away. Right? So we don't want that. So, we had Vizuri come to the table and help us understand just how secure we can be, how OpenShift is helping us make sure our information is never violated. There's great integrity in it. And then we did prototyping, and we actually evaluated it, and we have third parties that come in and take a look at our solution and say, "Can I penetrate that? Can I get into your information?" So, and, we also are subject to audit, not only by the federal government, but by all of our payer partners. So we have to be above the line in every criterion, and we think that we are. >> The other thing that you mentioned was, when we talk about OpEx, right? That's human capital. He talked about the minute per time on a call. We also reduce tribal knowledge. Think about all these new managed care organizations in health care.
Is it the call center representative, is it our responsibility to train them on this car, and this company requires a car service, this company requires an ambulance. That knowledge, if we could eliminate that and put that in the middle tier. Now what we do is we have given them a business scale. Now they have a business strategy for taking on new managed health care organizations. Do you have different compliance rules? Do you have different knowledge? It is no longer us having to go back out to those 20 call centers and re-train everybody, because you never know where the consumers are coming from. So, what they do is they answer the phone, they put their information into the system, and the system makes the deterministic call as to what car service, when, and how it's reimbursed. >> So, you say you automated essentially that tribal knowledge. >> Joe: We did. >> Eliminated it. >> And we reduced it so it not only reduced the calls per time frame, but it sped up our time of getting a call center agent from three weeks of training down to basically one. >> Yes, and we have the ability now to support all of our contracts from any call center. So if there's disaster recovery models, or, you know, Phoenix for instance is one of our larger call centers and they get heavy downpours of rain there. There are times when people can't get to work, or they have outages. We can't afford for that function to be offline. So those skills are very easily moved to another call center to support the members that would call in there. Just route the calls. And there's no local knowledge about, you know, my contract in Arizona does a certain thing, or in the Southwest, so it's very simple to support our population from any call center. That gives us the benefit of providing very high quality service, 'cause people when they call in, they expect us to service them. >> Joe, I want to follow up. 
We were talking about kind of, you know, hesitancy, healthcare tends to be a little bit conservative. I hear things like microservices, and containers. You know, these are still relatively new things. Is (mumbles) -- sorry, OpenShift the solution that allows you to deliver that with confidence to your customers? >> Yes. OpenShift. (laughs) >> Yeah, sorry about that. (laughs) >> No worries. (laughs) OpenShift does. What happens is the Docker container format enables us to pre-configure those servers and those workloads, and we talked about microservices. We wanted to reduce the business decisions or the integrations into the smallest component. What we also wanted to do was provide some taxonomy with them. These are for billing, these are for scheduling, these are for a different aspect of the business. By that, we can change, and we can change often. >> Mhm. >> How long did it take before if we wanted to make a change to some of the infrastructure? >> So. >> Weeks? Months? >> Well, even longer. I mean infrastructure is hard to acquire. And you only talk about CapEx expense. It's very easy, I mean there's a refresh cycle for equipment that you get. So even when you have it, you have to pay attention to maintenance and keeping that thing going forward. As you add scale to your business, you got to go acquire more storage. And it's not a dynamic thing. You have to plan -- the planning cycle is very difficult. We moved to the Cloud. Now we have infrastructure on demand. There's a myriad of choices of platforms and solutions that we can apply to our business model. Things we hadn't even thought of before. We're actually looking now at potentially moving our call centers away from our in-house standard, and moving to an Amazon provided call center solution. Because it can scale. And we can consolidate. And we can provide service from anywhere in the world. That's a big benefit to us. >> It is. So call center as a service, essentially. >> Michael: Yes. 
>> Is something you're evaluating. >> Think about how big they are. 80 million rides, right. What they didn't want to do is be disintermediated by the newcomers. Right? The Ubers, the Lyfts. They had a large footprint. So, he used the word Blockbuster before, and that's what they use a lot internally. >> Dave: There's one left, in Alaska, I heard. (laughs) >> Who remembers Blockbuster? And then they remember how Blockbuster was no longer in business. So what they wanted to do is to ensure that -- they agilely transformed not only the software engineering discipline, but their firm beliefs. So, everybody from business analysis through implementation has this new agile approach. And one of the features that we developed, we used to send people home after four hours of dialysis in taxi cabs. So, an executive, or team, at LogistiCare said, "We need dependability. We need certified drivers." They actually entered into a business relationship with Lyft. And you want to talk about an agile enterprise? We developed a custom interface into Lyft with a scheduling service that never existed, within five weeks. >> Michael: That's right. >> We would never have been able to do that. And we moved our first ride after five weeks, and since then, we're currently up to about five or six thousand. But it's going to scale to thousands. And the goal is to, again, as Michael said, let people interface with LogistiCare by their device of choice. If we don't have to have people call in to cancel rides, or call in to schedule, then the business scales, and it scales without human capital. >> And the enablers there, (mumbles) we always talk about it, people, process, and technology. So the technology behind that was, what, you're living this API economy that everybody talks about. >> Michael and Joe: We are. >> Joe: That is exactly what we did. >> And then you've got underneath that, OpenShift, what else is sort of there that you're leveraging? >> BPMS, BRMS.
So, Business Process Management System. Business Rules Management System. JBoss Fuse for an integration strategy and Camel routes. And then OpenShift, and then we do Ansible for doing server provisioning. >> And I have to ask you about the security question again. Stu was (mumbles) poking at it before. We've heard from a lot of practitioners that the security in the Cloud is just fine, it is great actually. The challenge is, it doesn't necessarily exactly map to the edicts of our organization. So, is that, did you find that? And did you have to maybe change the way in which you plugged into AWS, or was it just sort of out of the box for you? >> So, you have to understand the shared responsibility model when you move to the Cloud, right? I mean they're very good at the security of the Cloud, and you have to be good at the security in the Cloud. You can choose bad technology at Amazon and be insecure. But they have a published HIPAA standard, that if you use these technologies, then you can be HIPAA certified. We applied our HITRUST certification standards to our choices. We're making very solid -- and this isn't willy nilly. I mean I've been in a HIPAA solution for 20 years. So it's not like I don't know what is required, and what the auditors are going to ask us. So, but I do want to address one point that we can't go past. Is that (mumbles) Our customers are getting better service from all this we're doing. >> Joe: I agree. >> When somebody calls us and says, "I'm ready to go home from the doctor," and they didn't know what time they were going to go home when they scheduled their ride to the doctor, we can get somebody there in 10 minutes now to come and get them and take them home. >> Dave: Wow. >> That's a great satisfier. Rather than having to wait 90 minutes for us to find somebody that can go pick them up. That world has changed, right? And that's a great customer satisfier and that is why they're going to love continuing to do business with us.
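The Business Rules Management System named above is where Joe earlier said the system "makes the deterministic call as to what car service, when, and how it's reimbursed." A minimal Python sketch of that idea follows. It is not LogistiCare's implementation or the API of any BRMS product: the contract names, mobility categories, and the `decide_transport` function are all hypothetical, chosen only to show how moving contract knowledge into a lookup lets any call center agent serve any contract.

```python
from dataclasses import dataclass

# Hypothetical rule table: (contract, mobility need) -> transport type.
# Contract names and mobility categories are invented for illustration.
TRANSPORT_RULES = {
    ("contract_a", "ambulatory"): "car_service",
    ("contract_a", "stretcher"): "ambulance",
    ("contract_b", "ambulatory"): "rideshare",
    ("contract_b", "stretcher"): "ambulance",
}


@dataclass(frozen=True)
class RideRequest:
    """What the agent types in; no contract knowledge is needed to fill it out."""
    contract: str
    mobility_need: str


def decide_transport(request: RideRequest) -> str:
    """Return the transport type for a request, or raise if no rule matches."""
    key = (request.contract, request.mobility_need)
    try:
        return TRANSPORT_RULES[key]
    except KeyError:
        # Fail loudly rather than guess: an unmatched case needs a new rule.
        raise LookupError(f"no transport rule for {key}") from None
```

Under this assumption, onboarding a new contract means adding rows to the table rather than retraining agents in every call center.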
>> Great business outcome from something that you probably couldn't have done, you know, five years ago? Even maybe two years ago. >> They're a social caring organization. One of the largest rides that they do is for kidney dialysis. And those people, I mean, I've never had it, but somebody sitting there after four hours of dialysis, the last thing you want to do is wait 90 minutes for a cab. You want to go home. You also want to have an authoritative source that the drivers are credentialed drivers. And that's something that we're working on so that not only do these older generations, right? And think about the baby boomers, which I'm actually part of. >> Michael: Me too. (laughs) >> The aging population is growing. So the need for these types of services is growing too. And we become accustomed and we get set in our ways. And people might be fearful. Any taxi showing up, versus now, a Lyft shows up, you know who the driver is. You see the car, you see that. There's a high degree of confidence that LogistiCare has the best interests of their constituents. So they manage that type of business. So it's not just technology, it really is a caring and methodical organization. >> But we have the ability to follow patterns that are already established. We look at how Netflix handles their widely distributed kinds of interface devices. You know, how do they figure out what kind of data-stream to send back to what he's got in his hand versus what I have. We're following the same kind of model, and we're using the technology platform to our best advantage to make sure that we're talking to someone who's got a flip-phone differently than we are talking to someone who's got a (mumbles) Plus, right? (Dave laughs) Because the payload can't be the same, but the backend services don't need to know that. We built a solution here that can examine the request and return the right data-stream. So, "Where's my ride?" Might be "Just around the corner."
or it might be a map with a breadcrumb trail and a picture of the driver and all of that. Like you get with a Lyft or an Uber. So, you know, we're building it. >> Great case study, gentlemen. Thanks very much for coming to the Cube and sharing it. >> Well, thank you very much for having us, we enjoyed the time. >> Alright, keep it right there everybody. We'll be right back with our next guests. This is the Cube. We're live from Red Hat Summit in Boston. Be right back. (electronic music)
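The "Where's my ride?" exchange describes one backend answer being shaped differently for a flip-phone and a smartphone. The Python sketch below illustrates that pattern under stated assumptions: the `RideStatus` fields, the `device_class` values, and the `shape_response` function are hypothetical names invented for this example, not details given in the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RideStatus:
    """Device-independent answer produced by the backend service."""
    summary: str
    driver_name: str
    driver_photo_url: str
    breadcrumbs: tuple  # sequence of (latitude, longitude) pairs


def shape_response(status: RideStatus, device_class: str) -> dict:
    """Pick a payload for the requesting device; the backend never sees this choice."""
    if device_class == "smartphone":
        # Rich payload: map trail plus driver details.
        return {
            "summary": status.summary,
            "driver": {
                "name": status.driver_name,
                "photo_url": status.driver_photo_url,
            },
            "map": {"breadcrumbs": [list(point) for point in status.breadcrumbs]},
        }
    # Any other device (for example a flip-phone) gets plain text only.
    return {"text": status.summary}
```

A hypothetical call such as `shape_response(status, "flip_phone")` would return only the short text, while `"smartphone"` would return the map trail and driver details built from the same `RideStatus`.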
ENTITIES
Entity | Category | Confidence |
---|---|---|
Michael | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
LogistiCare | ORGANIZATION | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
Joe | PERSON | 0.99+ |
Aetna | ORGANIZATION | 0.99+ |
Arizona | LOCATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Dave | PERSON | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
Michael Quintero | PERSON | 0.99+ |
Uber | ORGANIZATION | 0.99+ |
Joe Dickman | PERSON | 0.99+ |
five | QUANTITY | 0.99+ |
Wellpoint | ORGANIZATION | 0.99+ |
20 | QUANTITY | 0.99+ |
20 years | QUANTITY | 0.99+ |
Quintero | PERSON | 0.99+ |
Vizuri | ORGANIZATION | 0.99+ |
Netflix | ORGANIZATION | 0.99+ |
Alaska | LOCATION | 0.99+ |
30% | QUANTITY | 0.99+ |
20 call centers | QUANTITY | 0.99+ |
Red Hat | ORGANIZATION | 0.99+ |
90 minutes | QUANTITY | 0.99+ |
2003 | DATE | 0.99+ |
Lyft | ORGANIZATION | 0.99+ |
one-minute | QUANTITY | 0.99+ |
Blue Cross | ORGANIZATION | 0.99+ |
Amazon Web Services | ORGANIZATION | 0.99+ |
Boston | LOCATION | 0.99+ |
HITRUST | ORGANIZATION | 0.99+ |
AEM Corporation | ORGANIZATION | 0.99+ |
Stu | PERSON | 0.99+ |
Vizuri | PERSON | 0.99+ |
three weeks | QUANTITY | 0.99+ |
OpEx | ORGANIZATION | 0.99+ |
four hours | QUANTITY | 0.99+ |
first ride | QUANTITY | 0.99+ |
HIPA | ORGANIZATION | 0.99+ |
Boston, Massachusetts | LOCATION | 0.99+ |
thousands | QUANTITY | 0.99+ |
five years ago | DATE | 0.99+ |
both | QUANTITY | 0.99+ |
second pillar | QUANTITY | 0.99+ |
OpenShift | TITLE | 0.99+ |
two years ago | DATE | 0.99+ |
10 minutes | QUANTITY | 0.99+ |
six million | QUANTITY | 0.99+ |
CapEx | ORGANIZATION | 0.99+ |
U.S. | LOCATION | 0.99+ |
AWS | ORGANIZATION | 0.98+ |
JBoss | ORGANIZATION | 0.98+ |
Red Hat Summit 2017 | EVENT | 0.98+ |
about 18 months | QUANTITY | 0.98+ |
six thousand | QUANTITY | 0.98+ |
about 30% | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
Red Hat Summit | EVENT | 0.98+ |
five weeks | QUANTITY | 0.97+ |
Openshift | TITLE | 0.97+ |