Arun Murthy, Hortonworks | DataWorks Summit 2018
>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost, Jim Kobielus. We're joined by Aaron Murphy, Arun Murthy, sorry. He is the co-founder and chief product officer of Hortonworks. Thank you so much for returning to theCUBE. It's great to have you on. >> Yeah, likewise. It's been a fun time getting back, yeah. >> So you were on the main stage this morning in the keynote, and you were describing the journey, the data journey, that so many customers are on right now, and you were talking about the cloud, saying that the cloud is part of the strategy but it really needs to fit into the overall business strategy. Can you describe a little bit about your approach to that? >> Absolutely. The way we look at this is we help customers leverage data to actually deliver better capabilities, better services, better experiences to their customers, and that's the business we are in. Now with that, obviously we look at cloud as a really key part of the overall strategy in terms of how you want to manage data on-prem and in the cloud. We kind of joke that we ourselves live in a world of real-time data. We just live in it and data is everywhere. You might have trucks on the road, you might have drones, you might have sensors, and you have it all over the world. At that point, we've kind of got to a point where enterprises understand that they'll manage all the infrastructure, but in a lot of cases it will make a lot more sense to actually lease some of it, and that's the cloud. It's the same way, if you're delivering packages, you don't go buy planes and lay out roads; you go to FedEx and actually let them handle that for you. That's kind of what the cloud is.
So that is why we really fundamentally believe that we have to help customers leverage infrastructure wherever it makes sense pragmatically, both from an architectural standpoint and from a financial standpoint, and that's why we talk about how your cloud strategy is part of your data strategy, which is actually fundamentally part of your business strategy. >> So how are you helping customers to leverage this? What is on their minds and what's your response? >> Yeah, it's really interesting. Like I said, cloud is cloud, and infrastructure management is certainly something that's at the top of the mind for every CIO today. And what we've consistently heard is they need a way to manage all this data and all this infrastructure in a hybrid, multi-tenant, multi-cloud fashion. Because in some geos you might not have your favorite cloud vendor. You know, parts of Asia are a great example. You might have to use one of the Chinese clouds. You go to parts of Europe, especially with things like the GDPR, the data residency laws and so on, you have to be very, very cognizant of where your data gets stored and where your infrastructure is present. And that is why we fundamentally believe it's really important to give enterprises a fabric with which they can manage all of this, and hide the details of all of the underlying infrastructure from them as much as possible. >> And that's DataPlane Services. >> And that's DataPlane Services, exactly. >> The Hortonworks DataPlane Services we launched in October of last year. Actually, I was on theCUBE talking about it back then too. We see a lot of interest, a lot of excitement around it, because now they understand that, again, this doesn't mean that we drive it down to the least common denominator. It is about helping enterprises leverage the key differentiators of each of the cloud vendors' products. For example, Google, with which we announced a partnership, they are really strong on AI and ML.
So if you are running TensorFlow and you want to deal with things like Kubernetes, GKE is a great place to do it. And, for example, you can now go to Google Cloud and get TPUs, which work great for TensorFlow. Similarly, a lot of customers run on Amazon for a bunch of the operational stuff, Redshift as an example. So in the world we live in, we want to help the CIO leverage the best pieces of the cloud but then give them a consistent way to manage and govern that data. We were joking on stage that IT has just about learned how to deal with Kerberos and Hadoop, and now we're telling them, "Oh, go figure out IAM on Google," which is also IAM on Amazon, but they are completely different. The only thing that's consistent is the name. So I think we have a unique opportunity, especially with the open source technologies like Atlas, Ranger, Knox and so on, to be able to draw a consistent fabric over this, with security and governance, and help the enterprise leverage the best parts of the cloud to put a best-fit architecture together, one which also happens to be a best-of-breed architecture. >> So the fabric is everything you're describing, all the Apache open source projects in which Hortonworks is a primary committer and contributor, able to manage schemas, policies, and metadata and so forth across this distributed, heterogeneous fabric of public and private cloud segments within a distributed environment. >> Exactly. >> That's increasingly being containerized in terms of the applications for deployment to edge nodes. Containerization is a big theme in HDP 3.0, which you announced at this show. >> Yeah. >> So, if you could give us a quick sense of how that containerization capability plays into more of an edge focus for what your customers are doing. >> Exactly, great point. And again, the core parts of the fabric are the open source projects, but we've also done a lot of net new innovation with DataPlane which, by the way, is also open source.
It's a new product and a new platform that you can actually leverage, to lay it out over the open source ones you're familiar with. And again, like you said, containerization is what is actually driving the fundamentals of this. The details matter; at the scale at which we operate, we're talking about thousands of nodes, terabytes of data, the details really matter, because a 5% improvement at that scale leads to millions of dollars in optimization for capex and opex. So all of that, the details, are being fueled and driven by the community, which is kind of what we've delivered in HDP 3. And the key ones, like you said, are containerization, because now we can actually get complete agility in terms of how you deploy the applications. You get isolation not only at the resource management level with containers, but you also get it at the software level, which means if two data scientists want to use different versions of Python or Scala or Spark or whatever it is, they get that consistently and holistically, and they can actually go from the test-dev cycle into production in a completely consistent manner. So that's why containers are so big, because now we can actually leverage it across the stack, and then things like MiNiFi are showing up. We can actually-- >> Define MiNiFi before you go further. What is MiNiFi for our listeners? >> Great question. Yeah, so we've always had NiFi-- >> Real-time. >> Real-time data flow management, and NiFi was still sort of within the data center. MiNiFi is now a really, really small layer, a small thin library if you will, that you can throw on a phone, a doorbell, a sensor, and that gives you all the capabilities of NiFi but at the edge. >> Mmm. >> Right? And it's actually not just data flow; what is really cool about NiFi is it's actually command and control. So you can actually do bidirectional command and control, so you can actually change in real-time the flows you want, the processing you do, and so on.
So what we're trying to do with MiNiFi is actually not just collect data from the edge but also push the processing as much as possible to the edge, because we really do believe a lot more processing is going to happen at the edge, especially with the ASICs and so on coming out. There will be custom hardware that you can throw in and essentially leverage at the edge to actually do this processing. And we believe, you know, we want to do that even at the cost of the data not actually landing at rest, because at the end of the day we're in the insights business, not in the data storage business. >> Well I want to get back to that. You were talking about innovation and how so much of it is driven by the open source community, and you're a veteran of the big data open source community. How do we maintain that? How does that continue to be the fuel? >> Yeah, and a lot of it starts with just being consistent. From day one, and James was around back then, in 2011 when we started, we've always said, "We're going to be open source," because we fundamentally believed that the community is going to out-innovate any one vendor, regardless of how much money they have in the bank. So we really do believe that's the best way to innovate, mostly because there is a sense of shared ownership of that product. It's not just one vendor throwing some code out there and trying to shove it down the customer's throat. And we've seen this over and over again, right. Three years ago, a lot of what we talk about, the DataPlane stuff, Atlas and Ranger and so on, none of these existed. These actually came from the fruits of the collaboration with the community, with actually some very large enterprises being a part of it. So it's a great example of how we continue to drive it, because we fundamentally believe that that's the best way to innovate, and we continue to believe so. >> Right.
And the community, the Apache community as a whole, has so many different projects. For example, in streaming, there is Kafka, >> Okay. >> and there are others that address a core set of common requirements but in different ways, >> Exactly. >> supporting different approaches, for example, doing streaming with stateless transactions and so forth, or stateless semantics and so forth. It seems to me that Hortonworks is shifting towards being more of a streaming-oriented vendor, away from data at rest. Though, I should say, HDP 3.0 has got great scalability and storage efficiency capabilities baked in. I wonder if you could just break down a little bit what the innovations or enhancements are in HDP 3.0 for those of your core customers, which is most of them, who are managing massive multi-terabyte, multi-petabyte, distributed, federated big data lakes. What's in HDP 3.0 for them? >> Oh, lots. Again, like I said, we obviously spend a lot of time on the streaming side, because that's where we see the market going. We live in a real-time world. But again, we don't do it at the cost of our core business, which continues to be HDP. And as you can see, the community continues to drive it. We talked about containerization, a massive step up for the Hadoop community. We've also added support for GPUs. Again, if you think about at-scale machine learning. >> Graphics processing units, >> Graphical-- >> AI, deep learning. >> Yeah, it's huge. Deep learning, TensorFlow and so on, really, really need custom hardware, sort of GPUs, if you will. So that's coming. That's in HDP 3. We've added a whole bunch of scalability improvements with HDFS. We've added federation, because now you can go over a billion files, a billion objects, in HDFS. We also added capabilities for-- >> But you indicated yesterday when we were talking that very few of your customers need that capacity yet, but you think they will, so-- >> Oh, for sure.
Again, part of this is, as we enable more sources of data in real-time, that's the fuel which drives the need for scale, and that was always the strategy behind the HDF product. It was about, can we leverage the synergies between the real-time world, feed that into what you do today in your classic enterprise with data at rest, and that is what is driving the necessity for scale. >> Yes. >> Right. We've done that. We've spent a lot of work, again, lowering the total cost of ownership, the TCO, so we added erasure coding. >> What is that exactly? >> Yeah, so erasure coding is a classic storage concept. HDFS has always had three replicas, for redundancy, fault tolerance, and recovery. Now, it sounds okay having three replicas because it's cheap disk, right. But when you start to think about our customers running 70, 80 terabytes of data, those three replicas add up, because you've now gone from 80 terabytes of effective data to 240 terabytes in terms of raw storage. So now what we can do with erasure coding is, instead of storing the three full copies, we actually store parity. We store the encoding of it, which means we can actually go down from three to, like, two, one and a half, whatever we want to do. So, if we can get from three copies to one and a half, especially for your core data, >> Yeah >> the data you're not accessing every day, it results in massive savings in terms of your infrastructure costs. And that's kind of the business we're in: helping customers do better with the data they have, whether it's on-prem or in the cloud. We want to help customers be comfortable getting more data under management, along with security and a lower TCO. The other big piece I'm really excited about in HDP 3 is all the work that's happened in the Hive community for what we call the real-time database. >> Yes. >> As you guys know, you follow the whole SQL wars in the Hadoop space.
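The replication-versus-erasure-coding arithmetic Murthy walks through can be sketched in a few lines. This is an illustrative back-of-the-envelope calculation, not Hortonworks code; the 80 TB figure comes from his example, and RS(6,3) is one of the Reed-Solomon policies HDFS 3.x ships with (1.5x raw overhead versus 3x for replication):

```python
# Back-of-the-envelope: raw storage needed for a given amount of
# effective data, under 3x replication vs. erasure coding.

def raw_storage_tb(effective_tb, data_blocks=6, parity_blocks=3, replication=None):
    """Raw TB required to store `effective_tb` of effective data.

    With replication, raw = effective * replication_factor.
    With a Reed-Solomon (data, parity) scheme such as RS(6,3),
    raw = effective * (data + parity) / data, i.e. 1.5x for RS(6,3).
    """
    if replication is not None:
        return effective_tb * replication
    return effective_tb * (data_blocks + parity_blocks) / data_blocks

replicated = raw_storage_tb(80, replication=3)  # classic HDFS: 3 full copies
erasure = raw_storage_tb(80)                    # RS(6,3): parity instead of copies
print(replicated, erasure)  # 240 vs. 120.0
```

That halving of raw storage for cold data is the "massive savings" referred to above.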
>> And Hive has changed a lot in the last several years; this is very different from what it was five years ago. >> The only thing that's the same as five years ago is the name. (laughing) So again, the community has done a phenomenal job, really taking what we used to call a SQL engine on HDFS and driving it forward. With Hive 3, which is part of HDP 3, it's a full-fledged database. It's got full ACID support. In fact, the ACID support is so good that writing ACID tables is at least as fast as writing non-ACID tables now. And you can do that not only on-- >> Transactional database. >> Exactly. Now not only can you do it on-prem, you can do it on S3. So you can actually drive the transactions through Hive on S3. We've done a lot of work, you were there yesterday when we were talking about some of the performance work we've done with LLAP and so on, to actually give consistent performance both on-prem and in the cloud, and this is a lot of effort simply because the performance characteristics you get from the storage layer with HDFS versus S3 are significantly different. So now we have been able to bridge those with things like LLAP. We've done a lot of work to enhance the security model around it, governance and security. So now you get things like column-level masking, row-level filtering, all the standard stuff that you would expect, and more, from an enterprise data warehouse. We talked to a lot of our customers; they're doing literally tens of thousands of views because they don't have the capabilities that exist in Hive now. >> Mmm-hmm. >> And I'm sitting here kind of being amazed that for an open source set of tools to have the best security and governance at this point is pretty amazing, coming from where we started off.
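The row-level filtering and column masking mentioned above can be illustrated abstractly. This is a toy sketch of the idea of policy-driven access, in the spirit of what Ranger enforces for Hive; it is not Ranger's actual policy model or API, and all names (roles, columns, the policy table) are invented for illustration:

```python
# Toy model of policy-driven row filtering and column masking.
# Each role gets a row predicate and a set of columns to mask.

POLICIES = {
    "eu_analyst": {"row_filter": lambda r: r["region"] == "EU",
                   "mask_cols": {"ssn"}},
    "admin":      {"row_filter": lambda r: True,
                   "mask_cols": set()},
}

def apply_policy(role, rows):
    """Return only the rows `role` may see, with masked columns redacted."""
    policy = POLICIES[role]
    visible = []
    for row in rows:
        if not policy["row_filter"](row):
            continue  # row-level filter: drop rows this role may not see
        masked = {col: ("***" if col in policy["mask_cols"] else val)
                  for col, val in row.items()}
        visible.append(masked)
    return visible

rows = [{"name": "a", "region": "EU", "ssn": "123"},
        {"name": "b", "region": "US", "ssn": "456"}]
print(apply_policy("eu_analyst", rows))  # one EU row, with ssn masked
```

The point Murthy makes is that this enforcement lives in the policy layer, so existing queries and applications see filtered, masked data without being rewritten.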
>> And it's absolutely essential for GDPR compliance, and compliance with HIPAA and every other mandate and sensitivity that requires you to protect personally identifiable information, so very important. So in many ways Hortonworks has one of the premier big data catalogs for all manner of compliance requirements that your customers are chasing. >> Yeah, and James, you wrote about it in the context of Data Steward Studio, which we introduced. >> Yes. >> You know, things like consent management, having-- >> A consent portal, >> A consent portal, >> in which the customer can indicate the degree to which >> Exactly. >> they require controls over the management of their PII, possibly to be forgotten, and so forth. >> Yeah, consent, the right to be forgotten, consent even for analytics. Within the context of GDPR, you have to allow the customer to opt out of analytics, of them being part of an analytic itself, right. >> Yeah. >> So things like those are now something we enable through the enhanced security models that are done in Ranger. So the really cool part of what we've done now with GDPR is that we can get all these capabilities on existing data and existing applications by just adding a security policy, not rewriting the application. It's a massive, massive, massive deal, which I cannot tell you how much customers are excited about, because they now understand. They were sort of freaking out that, "I have to go to 30, 40, 50 thousand enterprise apps and change them to actually provide consent and the right to be forgotten." The fact that you can do that now by changing a security policy with Ranger is huge for them. >> Arun, thank you so much for coming on theCUBE. It's always so much fun talking to you. >> Likewise. Thank you so much. >> I learned something every time I listen to you. >> Indeed, indeed. I'm Rebecca Knight for James Kobielus; we will have more from theCUBE's live coverage of DataWorks just after this. (Techno music)
John Kreisa, Hortonworks | Dataworks Summit EU 2018
>> Narrator: From Berlin, Germany, it's theCUBE. Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Hello, welcome to theCUBE. We're here at Dataworks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm the lead analyst for Big Data Analytics, within the Wikibon team of SiliconAngle Media. Our guest is John Kreisa. He's the VP for Marketing at Hortonworks, of course, the host company of Dataworks Summit. John, it's great to have you. >> Thank you Jim, it's great to be here. >> We go long back, so you know it's always great to reconnect with you guys at Hortonworks. You guys are on a roll, it's been seven years I think since you guys were founded. I remember the founding of Hortonworks. I remember when it splashed in the Wall Street Journal. It was like oh wow, this big data thing, this Hadoop thing is actually, it's a market, it's a segment and you guys have built it. You know, you and your competitors, your partners, your ecosystem continues to grow. You guys went IPO a few years ago. Your latest numbers are pretty good. You're continuing to grow in revenues, in customer acquisitions, your deal sizes are growing. So Hortonworks remains on a roll. So, I'd like you to talk right now, John, and give us a sense of where Hortonworks is at in terms of engaging with the marketplace, in terms of trends that you're seeing, in terms of how you're addressing them. But talk about first of all the Dataworks Summit. How many attendees do you have from how many countries? Just give us sort of the layout of this show. >> I don't have all of the final counts yet. >> This is year six of the show? >> This is year six in Europe, absolutely, thank you. So it's great, we've moved it around different locations. Great venue, great host city here in Berlin. Super excited about it, I know we have representatives from more than 51 countries. 
If you think about that, we're drawing from a really broad set of countries, well beyond just Europe, as you know, because you've interviewed some of the folks. We've had them from South America, the U.S., Africa, and Asia as well, so really a broad swath of the open-source and big data community, which is great. The final attendance is going to be in the 1,250 to 1,300 range; not the final numbers yet, but a great-sized conference. The energy level's been really great; the sessions have been, you know, oversubscribed, standing room only in many of the popular sessions. So the community's strong; I think that's the thing that we really see here and that we're really continuing to invest in. It's something that Hortonworks was founded around. You referenced the founding, and driving the community forward and investing is something that has been part of our mantra since we started, and it remains that way today. >> Right. So first of all, what is Hortonworks? How does Hortonworks position itself? Clearly Hadoop is your foundation, but you, just like Cloudera and MapR, have all continued to evolve to address a broader range of use-cases with a deeper stack of technology and fairly extensive partner ecosystems. So what kind of a beast is Hortonworks? It's an elephant, but what kind of an elephant is it? >> We're an elephant, or riding on the elephant I'd say, so we're a global data management company. That's what we're helping organizations do: really the end-to-end lifecycle of their data, helping them manage it regardless of where it is, whether it's on-premise or in the cloud, really through hybrid data architectures. That's really how we've seen the market evolve. We started off, in terms of our strategy, with the platform based on Hadoop, as you said, to store, process, and analyze data at scale. The kind of fundamental use-case for Hadoop.
Then as the company emerged, as the market kind of continued to evolve, we saw the opportunity in capturing data from the edge. As IoT and edge use-cases emerged, it made sense for us to add to the platform and create Hortonworks DataFlow. >> James: Apache NiFi. >> Apache NiFi, exactly, HDF underneath, with associated additional open-source projects in there, Kafka and some streaming and things like that. So that was: move data, capture data in motion, move it back and put it into the platform for those large data applications that organizations are building on the core platform. It's also the next evolution; we're seeing great attach rates with that and really strong interest in Apache NiFi. You know, the meetup here for NiFi was oversubscribed, so really, really strong interest in that. And then the market continued to evolve with cloud and cloud architectures, customers wanting to deploy in the cloud. You know, you saw we had that poll yesterday in the general session about cloud, with really interesting results, but we saw that there were really companies wanting to deploy in a hybrid way. Some of them wanted to move specific workloads to the cloud. >> Multi-cloud, public, private. >> Exactly right, and multi-data center. >> The majority of your customer deployments are on prem. >> They are. >> Rob Bearden, your CEO, I think he said in a recent article on SiliconAngle that two-thirds of your deployments are on prem. Is that percentage going down over time? Are more of your customers shifting toward a public cloud orientation? Does Hortonworks worry about that? You've got partnerships, clearly, with the likes of IBM, AWS, and Microsoft Azure and so forth, so do you guys see that as an opportunity, as a worrisome trend? >> No, we see it very much as an opportunity.
And that's because we do have customers who are wanting to put more workloads and run things in the cloud; however, there's still almost always a component that's going to be on premise. And that creates a challenge for organizations: how do they manage the security and governance, and really the overall operations, of those deployments as they're in the cloud and on premise. And, to your point, multi-cloud. So you get some complexity in there around that deployment, particularly with the regulations; we talked about GDPR earlier today. >> Oh, by the way, the Data Steward Studio demo today was really, really good. It showed that, first of all, you cover the entire range of core requirements for compliance. So that was actually the primary announcement at this show; Scott Gnau announced that. You demoed it today; I think you guys are off to a good start. >> We've gotten, and thank you for that, really good feedback on our DataPlane Services strategy; it provides that single pane of glass.
>> That was actually a very compelling demonstration in that regard. >> It was good, and they worked very hard on it. And I was speaking to an analyst yesterday, and they were saying that they're seeing an increasing number of the customers, enterprises, wanting to have a multi-cloud strategy. They don't want to get locked into any one public cloud vendor, so, what they want is somebody who can help them maintain that common security and governance across their different deployments, and they see DataPlane Services is the way that's going to help them do that. >> So John, how is Hortonworks, what's your road map, how do you see the company in your go to market evolving over the coming years in terms of geographies, in terms of your focuses? Focus, in terms of the use-cases and workloads that the Hortonworks portfolio addresses. How is that shifting? You mentioned the Edge. AI, machine learning, deep learning. You are a reseller of IBM Data Science Experience. >> DSX, that's right. >> So, let's just focus on that. Do you see more customers turning to Hortonworks and IBM for a complete end-to-end pipeline for the ingest, for the preparation, modeling, training and so forth? And deployment of operationalized AI? Is that something you see going forward as an evolution path for your capabilities? >> I'd say yes, long-term, or even in the short-term. So, they have to get their data house in order, if you will, before they get to some of those other things, so we're still, Hortonworks strategy has always been focused on the platform aspect, right? The data-at-rest platform, data-in-motion platform, and now a platform for managing common security and governance across those different deployments. Building on that is the data science, machine learning, and AI opportunity, but our strategy there, as opposed to trying to trying to do it ourselves, is to partner, so we've got the strong partnership with IBM, resell their DSX product. 
And also other partnerships around to deliver those other capabilities, like machine learning and AI, from our partner ecosystem, which you referenced. We have over 2,300 partners, so a very, very strong ecosystem. And so, we're going to stick to our strategy of the platforms enabling that, which will subsequently enable data science, machine learning, and AI on top. And then, if you want me to talk about our strategy in terms of growth, so we already operate globally. We've got offices in I think 19 different countries. So we're really covering the globe in terms of the demand for Hortonworks products and beginning implements. >> Where's the fastest growing market in terms of regions for Hortonworks? >> Yeah, I mean, international generally is our fastest growing region, faster than the U.S. But we're seeing very strong growth in APAC, actually, so India, Asian countries, Singapore, and then up and through to Japan. There's a lot of growth out in the Asian region. And, you know, they're sort of moving directly to digital transformation projects at really large scale. Big banks, telcos, from a workload standpoint I'd say the patterns are very similar to what we've seen. I've been at Hortonworks for six and a half years, as it turns out, and the patterns we saw initially in terms of adoption in the U.S. became the patterns we saw in terms of adoption in Europe and now those patterns of adoption are the same in Asia. So, once a company realizes they need to either drive out operational costs or build new data applications, the patterns tend to be the same whether it's retail, financial services, telco, manufacturing. You can sort of replicate those as they move forward. 
>> So going forward, how is Hortonworks evolving as a company in terms of, for example with GDPR, Data Steward, data governance as a strong focus going forward, are you shifting your model in terms of your target customer away from the data engineers, the Hadoop cluster managers who are still very much the center of it, towards more data governance, towards more of a business analyst level of focus? Do you see Hortonworks shifting in that direction in terms of your focus, go-to-market, your message and everything? >> I would say it's not a shifting as much as an expansion, so we definitely are continuing to invest in the core platform, in Hadoop, and you would have heard of some of the changes that are coming in the core Hadoop 3.0 and 3.1 platform here. Alan and others can talk about those details, as well as in Apache NiFi. But, to your point, as we bring and have brought Data Steward Studio and DataPlane Services online, that allows us to address a different user within the organization, so it's really an expansion. We're not de-investing in any other things. It's really here's another way, in a natural evolution of the way that we're helping organizations solve data problems. >> That's great, well thank you. This has been John Kreisa, he's the VP for marketing at Hortonworks. I'm James Kobielus of Wikibon SiliconAngle Media here at Dataworks Summit 2018 in Berlin. And it's been great, John, and thank you very much for coming on theCUBE. >> Great, thanks for your time. (techno music)
Scott Gnau, Hortonworks | Dataworks Summit EU 2018
(upbeat music) >> Announcer: From Berlin, Germany, it's The Cube, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Hi, welcome to The Cube, we're separating the signal from the noise and tuning into the trends in data and analytics. Here at DataWorks Summit 2018 in Berlin, Germany. This is the sixth year, I believe, that DataWorks has been held in Europe. Last year I believe it was at Munich, now it's in Berlin. It's a great show. The host is Hortonworks and our first interviewee today is Scott Gnau, who is the chief technology officer of Hortonworks. Of course Hortonworks got established themselves about seven years ago as one of the up and coming start ups commercializing a then brand new technology called Hadoop and MapReduce. They've moved well beyond that in terms of their go to market strategy, their product portfolio, their partnerships. So Scott, this morning, it's great to have ya'. How are you doing? >> Glad to be back and good to see you. It's been awhile. >> You know, yes, I mean, you're an industry veteran. We've both been around the block a few times but I remember you years ago. You were at Teradata and I was at another analyst firm. And now you're with Hortonworks. And Hortonworks is really on a roll. I know you're not Rob Bearden, so I'm not going to go into the financials, but your financials look pretty good, your latest. You're growing, your deal sizes are growing. Your customer base is continuing to deepen. So you guys are on a roll. So we're here in Europe, we're here in Berlin in particular. You did the keynote this morning, and it's five weeks until GDPR. The sword of Damocles, the GDPR sword of Damocles. It's not just affecting European based companies, but it's affecting North American companies and others who do business in Europe.
So your keynote this morning, your core theme was that, if you're an enterprise, your business strategy is equated with your cloud strategy now, is really equated with your data strategy. And you got to a lot of that. It was a really good discussion. And where GDPR comes into the picture is the fact that protecting data, personal data of your customers is absolutely important, in fact it's imperative and mandatory, and will be in five weeks or you'll face a significant penalty if you're not managing that data and providing customers with the right to have it erased, or the right to withdraw consent to have it profiled, and so forth. So enterprises all over the world, especially in Europe, are racing as fast as they can to get compliant with GDPR by the May 25th deadline time. So, one of the things you discussed this morning, you had an announcement overnight that Hortonworks has released a new solution in technical preview called The Data Steward Studio. And I'm wondering if you can tie that announcement to GDPR? It seems like data stewardship would have a strong value for your customers. >> Yeah, there's definitely a big tie-in. GDPR is certainly creating a milestone, kind of a trigger, for people to really think about their data assets. But it's certainly even larger than that, because when you even think about driving digitization of a business, driving new business models and connecting data and finding new use cases, it's all about finding the data you have, understanding what it is, where it came from, what's the lineage of it, who had access to it, what did they do to it? These are all governance kinds of things, which are also now mandated by laws like GDPR. And so it's all really coming together in the context of the new modern data architecture era that we live in, where a lot of data that we have access to, we didn't create.
And so it was created outside the firewall by a device, by some application running with some customer, and so capturing and interpreting and governing that data is very different than taking derivative transactions from an ERP system, which are already adjudicated and understood, and governing that kind of a data structure. And so this is a need that's driven from many different perspectives, it's driven from the new architecture, the way IoT devices are connecting and just creating a data bomb, that's one thing. It's driven by business use cases, just saying what are the assets that I have access to, and how can I try to determine patterns between those assets where I didn't even create some of them, so how do I adjudicate that? >> Discovering and cataloging your data-- >> Discovering it, cataloging it, actually even... When I even think about data, just think of the files on my laptop, that I created, and I don't remember what half of them are. So creating the metadata, creating that trail of bread crumbs that lets you piece together what's there, what's the relevance of it, and how, then, you might use it for some correlation. And then you get in, obviously, to the regulatory piece that says sure, if I'm a new customer and I ask to be forgotten, the only way that you can guarantee to forget me is to know where all of my data is. >> If you remember that they are your customer in the first place and you know where all that data is, if you're even aware that it exists, that's the first and foremost thing for an enterprise to be able to assess their degree of exposure to GDPR. >> So, right. It's like a whole new use case. It's a microcosm of all of these really big things that are going on. And so what we've been trying to do is really leverage our expertise in metadata management using the Apache Atlas project. >> Interviewer: You and IBM have done some major work-- >> We work with IBM and the community on Apache Atlas.
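The "trail of bread crumbs" Gnau describes, knowing what each asset is, where it came from, who touched it, and whose data it contains, can be sketched in a few lines. This is an illustrative toy, not the Apache Atlas API; every class and field name here is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    # A minimal catalog entry: what the data is, where it lives, where it came from.
    asset_id: str
    location: str
    source: str                                   # e.g. an IoT fleet or an app export
    subjects: set = field(default_factory=set)    # data subjects present in this asset
    access_log: list = field(default_factory=list)

class Catalog:
    def __init__(self):
        self.assets = {}

    def register(self, asset):
        self.assets[asset.asset_id] = asset

    def record_access(self, asset_id, who, action):
        # "who had access to it, what did they do to it?"
        self.assets[asset_id].access_log.append((who, action))

    def assets_for_subject(self, subject):
        # The right-to-be-forgotten case: to erase a person, you first
        # have to know every asset their data lives in.
        return [a.asset_id for a in self.assets.values() if subject in a.subjects]

catalog = Catalog()
catalog.register(DataAsset("a1", "s3://lake/raw/clicks", "web-logs", {"alice", "bob"}))
catalog.register(DataAsset("a2", "hdfs:///warehouse/orders", "erp", {"alice"}))
catalog.record_access("a1", "etl-job-12", "read")

print(catalog.assets_for_subject("alice"))   # every asset holding alice's data
```

The GDPR erasure guarantee falls out of the bread crumbs: `assets_for_subject` is only answerable if the metadata was recorded at ingest time, which is exactly the point being made above.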
You know, metadata tagging is not the most interesting topic for some people, but in the context that I just described, it's kind of important. And so I think one of the areas where we can really add value for the industry is leveraging our lowest common denominator, open source, open community kind of development to really create a standard infrastructure, a standard open infrastructure for metadata tagging, into which all of these use cases can now plug. Whether it's I want to discover data and create metadata about the data based on patterns that I see in the data, or I've inherited data and I want to ensure that the metadata stay with that data through its life cycle, so that I can guarantee the lineage of the data, and be compliant with GDPR-- >> And in fact, tomorrow we will have Mandy Chessell from IBM, a key Hortonworks partner, discussing the open metadata framework you're describing and what you're doing. >> And that was part of this morning's keynote close also. It all really flowed nicely together. Anyway, it is really a perfect storm. So what we've done is we've said, let's leverage this lowest common denominator, standard metadata tagging, Apache Atlas, and uplevel it, and not have it be part of a cluster, but actually have it be a cloud service that can be in force across multiple data stores, whether they're in the cloud or whether they're on prem. >> Interviewer: That's the Data Steward Studio? >> Well, Data Plane and Data Steward Studio really enable those things to come together. >> So the Data Steward Studio is the second service >> Like an app. >> under the Hortonworks DataPlane service. >> Yeah, so the whole idea is to be able to tie those things together, and when you think about it in today's hybrid world, and this is where I really started, where your data strategy is your cloud strategy, they can't be separate, because if they're separate, just think about what would happen. So I've copied a bunch of data out to the cloud. 
All memory of any lineage is gone. Or I've got to go set up manually another set of lineage that may not be the same as the lineage it came with. And so being able to provide that common service across footprint, whether it's multiple data centers, whether it's multiple clouds, or both, is a really huge value, because now you can sit back and through that single pane, see all of your data assets and understand how they interact. That obviously has the ability then to provide value like with Data Steward Studio, to discover assets, maybe to discover assets and discover duplicate assets, where, hey, I can save some money if I get rid of this cloud instance, 'cause it's over here already. Or to be compliant and say yeah, I've got these assets here, here, and here, I am now compelled to do whatever: delete, protect, encrypt. I can now go do that and keep a record through the metadata that I did it. >> Yes, in fact that is very much at the heart of compliance, you got to know what assets there are out there. And so it seems to me that Hortonworks is increasingly... the H-word rarely comes up these days. >> Scott: Not Hortonworks, you're talking about Hadoop. >> Hadoop rarely comes up these days. When the industry talks about you guys, it's known that's your core, that's your base, that's where HDP and so forth, great product, great distro. In fact, in your partnership with IBM, a year or more ago, I think it was IBM standardized on HDP in lieu of their own distro, 'cause it's so well-established, so mature. But going forward, you guys in many ways, Hortonworks, you have positioned yourselves now. Wikibon sees you as being the premier solution provider of big data governance solutions specifically focused on multi-cloud, on structured data, and so forth. So the announcement today of the Data Steward Studio very much builds on that capability you already have there. So going forward, can you give us a sense of your roadmap in terms of building out DataPlane's service?
'Cause this is the second of these services under the DataPlane umbrella. Give us a sense for how you'll continue to deepen your governance portfolio in DataPlane. >> Really the way to think about it, there are a couple of things that you touched on that I think are really critical, certainly for me, and for us at Hortonworks to continue to repeat, just to make sure the message got there. Number one, Hadoop is definitely at the core of what we've done, and was kind of the secret sauce. Some very different stuff in the technology, also the fact that it's open source and community, all those kinds of things. But that really created a foundation that allowed us to build the whole beginning of big data data management. And we added and expanded to the traditional Hadoop stack by adding Data in Motion. And so what we've done is-- >> Interviewer: NiFi, I believe, you made a major investment. >> Yeah, so we made a large investment in Apache NiFi, as well as Storm and Kafka as kind of a group of technologies. And the whole idea behind doing that was to expand our footprint so that we would enable our customers to manage their data through its entire lifecycle, from being created at the edge, all the way through streaming technologies, to landing, to analytics, and then even analytics being pushed back out to the edge. So it's really about having that common management infrastructure for the lifecycle of all the data, including Hadoop and many other things. And then in that, obviously as we discuss whether it be regulation, whether it be, frankly, future functionality, there's an opportunity to uplevel those services from an overall security and governance perspective. And just like Hadoop kind of upended traditional thinking... and what I mean by that was not the economics of it, specifically, but just the fact that you could land data without describing it. That seemed so unimportant at one time, and now it's like the key thing that drives the difference. 
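The point about landing data without describing it first, schema-on-read versus the traditional describe-then-load, can be shown with a toy sketch. The record shapes and function names here are hypothetical, not any product's API:

```python
import json

# Schema-on-write would reject anything that doesn't match a schema up front.
# Schema-on-read lands the raw records as-is and applies a reading lens later.

raw_landing_zone = []                 # data lands untyped and undescribed
for line in [
    '{"sensor": "t-101", "temp_c": 21.5}',
    '{"sensor": "t-102", "temp_f": 70.1, "fw": "2.3"}',  # firmware changed the shape
]:
    raw_landing_zone.append(json.loads(line))   # no schema enforced at write time

def read_temps_celsius(records):
    # The schema lives in the reader; a new record shape is handled here,
    # without ever having rejected or lost the original data.
    for r in records:
        if "temp_c" in r:
            yield r["sensor"], r["temp_c"]
        elif "temp_f" in r:
            yield r["sensor"], round((r["temp_f"] - 32) * 5 / 9, 1)

print(list(read_temps_celsius(raw_landing_zone)))   # [('t-101', 21.5), ('t-102', 21.2)]
```

When the sensor fleet changes its output format, only the reader gains a branch; nothing already landed has to be reprocessed or thrown away, which is the "key thing that drives the difference" being described.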
Think about sensors that are sending in data; firmware gets reconfigured, and those streams change. Being able to acquire data and then assess the data is a big deal. So the same thing applies, then, to how we apply governance. I said this morning, traditional governance was hey, I started as this employee, I have access to this file, this file, this file, and nothing else. I don't know what else is out there. I only have access to what my job title describes. And that's traditional data governance. In the new world, that doesn't work. Data scientists need access to all of the data. Now, that doesn't mean we need to give away PII. We can encrypt it, we can tokenize it, but we keep referential integrity. We keep the integrity of the original structures, and those who have a need to actually see the PII can get the token and see the PII. But it's governance thought inversely from how it's been thought about for 30 years. >> It's so great you've worked governance into an increasingly streaming, real-time in motion data environment. Scott, this has been great. It's been great to have you on The Cube. You're an alum of The Cube. I think we've had you at least two or three times over the last few years. >> It feels like 35. Nah, it's pretty fun. >> Yeah, you've been great. So we are here at Dataworks Summit in Berlin. (upbeat music)
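The tokenize-but-keep-referential-integrity approach Gnau describes above can be sketched with a deterministic token: the same PII value always maps to the same token, so joins across data sets still line up, while only an authorized role can reverse the mapping. This is a toy with a hard-coded key; a real system would use proper key management and a secured token vault, and the role name here is invented for the example:

```python
import hmac, hashlib

SECRET = b"demo-key"    # assumption: key management is out of scope for this sketch
_vault = {}             # token -> original value, readable only by authorized roles

def tokenize(pii: str) -> str:
    # Deterministic: the same value always yields the same token,
    # so joins and group-bys across tokenized data sets still match.
    token = hmac.new(SECRET, pii.encode(), hashlib.sha256).hexdigest()[:16]
    _vault[token] = pii
    return token

def reveal(token: str, role: str) -> str:
    # "Those who have a need to actually see the PII can get the token and see the PII."
    if role != "privacy-officer":
        raise PermissionError("not authorized to see PII")
    return _vault[token]

orders    = [("alice@example.com", 120), ("bob@example.com", 80)]
shipments = [("alice@example.com", "pkg-1")]

orders_t    = [(tokenize(e), amt) for e, amt in orders]
shipments_t = [(tokenize(e), pkg) for e, pkg in shipments]

# Referential integrity survives tokenization: the join key still matches,
# even though no analyst ever sees a raw email address.
joined = [(t, amt, pkg) for t, amt in orders_t
                        for t2, pkg in shipments_t if t == t2]
print(len(joined))   # 1: alice's order joins to her shipment
```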
Rob Thomas, IBM | Big Data NYC 2017
>> Voiceover: Live from midtown Manhattan, it's theCUBE! Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, live in New York City, this is theCUBE's coverage of, eighth year doing Hadoop World now, evolved into Strata Hadoop, now called Strata Data, it's had many incarnations but O'Reilly Media running their event in conjunction with Cloudera, mainly an O'Reilly Media show. We do our own show called Big Data NYC here with our community with theCUBE bringing you the best interviews, the best people, entrepreneurs, thought leaders, experts, to get the data and try to project the future and help users find the value in data. My next guest is Rob Thomas, who is the General Manager of IBM Analytics, a theCUBE alum, been on multiple times, successfully executing in the San Francisco Bay area. Great to see you again. >> Yeah John, great to see you, thanks for having me. >> You know, IBM has really been interesting through its own transformation and a lot of people will throw IBM in that category but you guys have been transforming, okay, and the scoreboard has yet to show, in my mind, what's truly happening because if you still look at this industry, we're only eight years into what Hadoop evolved into now as a large data set but the analytics game just seems to be getting started with the cloud now coming over the top, you're starting to see a lot of cloud conversations in the air. Certainly there's a lot of AI washing, you know, AI this, but it's machine learning and deep learning at the heart of it as innovation but a lot more work on the analytics side is coming. You guys are at the center of that. What's the update? What's your view of this analytics market? >> Most enterprises struggle with complexity. That's the number one problem when it comes to analytics. It's not imagination, it's not willpower, in many cases, it's not even investment, it's just complexity.
We are trying to make data really simple to use and the way I would describe it is we're moving from a world of products to platforms. Today, if you want to go solve a data governance problem you're typically integrating 10, 15 different products. And the burden then is on the client. So, we're trying to make analytics a platform game. And my view is an enterprise has to have three platforms if they're serious about analytics. They need a data manager platform for managing all types of data, public, private cloud. They need unified governance, so governance of all types of data, and they need a data science and machine learning platform. If a client has those three platforms, they will be successful with data. And what I see now is really mixed. We've got 10 products that do that, five products that do this, but it has to be integrated in a platform. >> You as an IBM or the customer has these tools? >> Yeah, when I go see clients that's what I see is data... >> John: Disparate data log. >> Yeah, they have disparate tools and so we are unifying what we deliver from a product perspective to this platform concept. >> You guys announced an integrated analytic system, got to see my notes here, I want to get into that in a second but interesting you bring up the word platform because you know, platforms have always been kind of reserved for the big supplier but you're talking about customers having a platform, not a supplier delivering a platform per se 'cause this is where the integration thing becomes interesting. We were joking yesterday on theCUBE here, kind of ad hoc, conceptually, that the world has turned into a tool shed. I mean everyone has a tool shed or knows someone that has a tool shed where you have the tools in the back and they're rusty. And so, this brings up the tool conversation, there's too many tools out there that try to be platforms. >> Rob: Yes. >> And if you have too many tools, you're not really doing the platform game right.
And complexity also shows up when the hammer you bought turns into a lawn mower. Right, so a lot of these companies have been groping and trying to iterate what their tool was into something else it wasn't built for. So, as the industry evolves, that's natural Darwinism if you will, they will fall to the wayside. So talk about that dynamic because you still need tooling. >> Rob: Yes. >> But tooling will be a function of the work, as Peter Burris would say, so talk about how does a customer really get that platform out there without sacrificing the tooling that they may have bought or want to get rid of. >> Well, so think about it, in an enterprise today, what the data architecture looks like is, I've got this box that has this software on it, use your terms, has these types of tools on it, and it's isolated, and if you want a different set of tooling, okay, move that data to this other box where we have the other tooling. So, it's very isolated in terms of how platforms have evolved or technology platforms today. When I talk about an integrated platform, we are big contributors to Kubernetes. We're making that foundational in terms of what we're doing on Private Cloud and Public Cloud, and if you move to that model, suddenly what was a bunch of disparate tools are now microservices against a common architecture. And so it totally changes the nature of the data platform in an enterprise. It's a much more fluid data layer. The term I use sometimes is you have data as a service now, available to all your employees. That's totally different than I want to do this project, so step one, make room in the data center, step two, bring in a server. It's a much more flexible approach so that's what I mean when I say platform.
All right, so let's bring up the complexity issue because integrated and unified are two different concepts that kind of mean the same thing depending on how you look at it. When you look at the data integration problem, you've got all this complexity around governance, it's a lot of moving parts of data. How does a customer actually execute without compromising the integrity of their policies that they need to have in place? So in other words, what are the baby steps that someone can take, the customers take through with what you guys are dealing with them, how do they get into the game, how do they take steps towards the outcome? They might not have the big money to push it all at once, they might want to take a risk of risk management approach. >> I think there's a clear recipe for doing this right and we have experience of doing it well and doing it not so well, so over time we've gotten some, I'd say a pretty good perspective on that. My view is very simple, data governance has to start with a catalog. And the analogy I use is, you have to do for data what libraries do for books. And think about a library, the first thing you do with books, card catalog. You know where, you basically itemize everything, you know exactly where it sits. If you've got multiple copies of the same book, you can distinguish between which one is which. As books get older they go to archives, to microfilm or something like that. That's what you have to do with your data. >> On the front end. >> On the front end. And it starts with a catalog. And that reason I say that is, I see some organizations that start with, hey, let's go start ETL, I'll create a new warehouse, create a new Hadoop environment. That might be the right thing to do but without having a basis of what you have, which is the catalog, that's where I think clients need to start. 
>> Well, I would just add one more level of complexity to reinforce this. First of all I agree with you, but here's another example that would reinforce this step. Let's just say you write some machine learning and some algorithms and a new policy from the government comes down. Hey, you know, we're dealing with Bitcoin differently or whatever, some GDPR kind of thing happens where someone gets hacked and a new law comes out. How do you inject that policy? You got to rewrite the code, so I'm thinking that if you do this right, you don't have to do a lot of rewriting of applications because the library or the catalog will handle it. Is that right, am I getting that right? >> That's right 'cause then you have a baseline is what I would describe it as. It's codified in the form of a data model or in the form of an ontology for how you're looking at unstructured data. You have a baseline so then as changes come, you can easily adjust to those changes. Where I see clients struggle is if you don't have that baseline then you're constantly trying to change things on the fly and that makes it really hard to get to this... >> Well, really hard, expensive, they have to rewrite apps. >> Exactly. >> Rewrite algorithms and machine learning things that were built probably by people that maybe left the company, who knows, right? So the consequences are pretty grave, I mean, pretty big. >> Yes. >> Okay, so let's get back to something that you said yesterday. You were on theCUBE yesterday with Hortonworks CEO, Rob Bearden and you were commenting about AI or AI washing. You said quote, "You can't have AI without IA." A play on letters there, a sequence of letters, which was really an interesting comment, we kind of referenced it pretty much all day yesterday. Information architecture is the IA and AI is the artificial intelligence, basically saying if you don't have some sort of architecture AI really can't work.
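The point about new regulation not forcing application rewrites can be sketched as policies keyed on catalog tags rather than baked into code: applications read through the governance layer, so a new law becomes a policy-table change, not a code change. Illustrative only; the column names, tags, and actions here are all hypothetical:

```python
# Applications never touch columns directly; they go through the governance
# layer, which consults a policy table keyed on catalog tags.

columns = {
    "email":   {"tags": {"pii"}},
    "balance": {"tags": {"financial"}},
    "wallet":  {"tags": {"crypto"}},
}

# The baseline: policy lives in data, not in application code.
policies = {"pii": "mask"}

def read(column, value):
    for tag in columns[column]["tags"]:
        if policies.get(tag) == "mask":
            return "***"
        if policies.get(tag) == "deny":
            raise PermissionError(column)
    return value

assert read("wallet", "0xabc") == "0xabc"   # allowed under today's policy

# A new regulation lands: cryptocurrency data may no longer be read.
# One line of policy change; no application is rewritten.
policies["crypto"] = "deny"
```

Every application that reads through `read` now enforces the new rule, which is the "baseline" advantage described above: the ontology and the policies attached to it absorb the change.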
Which really means models have to be understood, with the learning machine kind of approach. Expand more on that 'cause that was I think a fundamental thing that we're seeing at the show this week here in New York, a model for the models. Who trains the machine learning? Machines got to learn somewhere too so there's learning for the learning machines. This is a real complex data problem and a half. If you don't set up the architecture it may not work, explain. >> So, there's two big problems enterprises have today. One is trying to operationalize data science and machine learning at scale, the other one is getting to the cloud, but let's focus on the first one for a minute. The reason clients struggle to operationalize this at scale is because they start a data science project and they build a model for one discrete data set. Problem is that only applies to that data set, it doesn't, you can't pick it up and move it somewhere else, so this idea of data architecture, just to kind of follow through, whether it's the catalog or how you're managing your data across multiple clouds, becomes fundamental because ultimately you want to be able to provide machine learning across all your data because machine learning is about predictions and it's hard to do really good predictions on a subset. But that pre-req is the need for an information architecture that comprehends the fact that you're going to build models and you want to train those models. As new data comes in, you want to keep the training process going. And that's the biggest challenge I see clients struggling with. So they'll have success with their first ML project but then the next one becomes progressively harder because now they're trying to use more data and they haven't prepared their architecture for that. >> Great point. Now, switching to data science. You spoke many times with us on theCUBE about data science, we know you're passionate about you guys doing a lot of work on that.
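Thomas's "as new data comes in, you want to keep the training process going" is the online-learning pattern. A minimal pure-Python sketch, a one-feature linear model nudged by a small gradient step per record; this is an illustration of the pattern, not IBM's actual tooling, and the data is made up:

```python
# Online learning: instead of refitting from scratch, the model takes a small
# gradient step for every new record, so training never really "ends".
w, b = 0.0, 0.0    # one-feature linear model: y_hat = w * x + b
LR = 0.02          # learning rate

def observe(x, y):
    """One incremental update as a new observation streams in."""
    global w, b
    err = (w * x + b) - y    # prediction error on this record
    w -= LR * err * x        # gradient step for squared loss
    b -= LR * err

# First stream of data: the underlying relationship is roughly y = 2x.
for x, y in [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0)] * 200:
    observe(x, y)

# Later, new records arrive; the same update rule keeps training the same model.
for x, y in [(5, 10.0), (6, 12.1), (7, 13.9)] * 50:
    observe(x, y)

print(round(w, 2), round(b, 2))   # w ends up close to 2
```

The architectural point in the transcript is exactly about enabling this: if lineage and access to the full data are in place, the model can keep consuming the stream instead of being frozen to the one "discrete data set" it was first built on.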
We've observed, and Jim Kobielus and I were talking yesterday, there's too much work still on the data science guys' plate. They're still doing a lot of what I call sys-admin-like work, not the right word, but like administrative building and wrangling. They're not doing enough data science and there are enough proof points now to show that data science actually impacts business, whether it's the military having data intelligence to execute something, or selling something at the right time, or even for work or play or consumption, the proof is out there. So why aren't we going faster, why aren't the data scientists more effective, what is it going to take for the data science to have a seamless environment that works for them? They're still doing a lot of wrangling and they're still getting down in the weeds. Is that just the role they have or how does it get easier for them, that's the big catch? >> That's not the role. So they're a victim of their architecture to some extent and that's why they end up spending 80% of their time on data prep, data cleansing, that type of thing. Look, I think we solved that. That's why when we introduced the integrated analytic system this week, that whole idea was get rid of all the data prep that you need because land the data in one place, machine learning and data science is built into that. So everything that the data scientist struggles with today goes away. We can federate to data on cloud, on any cloud, we can federate to data that's sitting inside Hortonworks so it looks like one system but machine learning is built into it from the start. So we've eliminated the need for all of that data movement, for all that data wrangling 'cause we organized the data, we built the catalog, and we've made it really simple. And so if you go back to the point I made, so one issue is clients can't apply machine learning at scale, the other one is they're struggling to get to the cloud.
I think we've nailed those problems, 'cause now with a click of a button, you can scale this out to the cloud. >> All right, so how does the customer get their hands on this? Sounds like it's a great tool, you're saying it's leading edge. We'll take a look at it, certainly I'll do a review on it with the team, but how do I get it, how do I get a hold of this? What do I do, download it, you guys supply it to me, is it some open source, how do your customers and potential customers engage with this product? >> However they want to, but I'll give you some examples. So, we have an analytic system built on Spark, you can bring the whole box into your data center and right away you're ready for data science. That's one way. Somebody like you, you're going to want to go get the containerized version, you go download it on the web and you'll be up and running instantly with a highly performing warehouse integrated with machine learning and data science built on Spark using Apache Jupyter. Any developer can go use that and get value out of it. You can also say I want to run it on my desktop. >> And that's free? >> Yes. >> Okay. >> There's a trial version out there. >> That's the open source, yeah, that's the free version. >> There's also a version on public cloud, so if you don't want to download it, you want to run it outside your firewall, you can go run it on IBM Cloud, on the public cloud, so... >> Just your cloud, Amazon? >> No, not today. >> John: Just IBM Cloud, okay, I got it. >> So there's a variety of ways that you can go use this and I think what you'll find... >> But you have a premium model that people can get started with, so they'll download it to their data center, is that also free too? >> Yeah, absolutely. >> Okay, so all the base stuff is free. >> We also have a desktop version too so you can download... >> What URL can people look at this? >> Go to datascience.ibm.com, that's the best place to start a data science journey.
>> Okay, multi-cloud, Common Cloud is what people are calling it, you guys have the Common SQL engine. What is this product, how does it relate to the whole multi-cloud trend? Customers are looking for multiple clouds. >> Yeah, so Common SQL is the idea of integrating data wherever it is, whatever form it's in, ANSI SQL compliant, so what you would expect from a SQL query, and the type of response you get back, you get that back with Common SQL no matter where the data is. Now when you start thinking multi-cloud you introduce a whole other bunch of factors: network, latency, all those types of things. So what we talked about yesterday with the announcement of Hortonworks Dataplane, which is kind of extending the YARN environment across multiple clouds, that's something we can plug into. So, I think, let's be honest, the multi-cloud world is still pretty early. >> John: Oh, really early. >> Our focus is delivery... >> I don't think it really exists actually. >> I think... >> It's multiple clouds, but no one's actually moving workloads across all the clouds, I haven't found any. >> Yeah, I think it's hard for latency reasons today. We're trying to deliver an outstanding... >> But people are saying, I mean, this is headroom I've got, but people are saying, I'd love to have a preferred future of multi-cloud, even though they're kind of getting their own shops in order, retrenching, and re-platforming, but that's not a bad ask. I mean, I'm a user, if I don't like IBM's cloud or I've got a better service, I can move around. If Amazon is too expensive I want to move to IBM, you've got product differentiation, I might want to be in your cloud. So again, this is the customer's mindset, right. If you have something really compelling on your cloud, do I have to go all in on IBM Cloud to run my data? You shouldn't have to, right? >> I agree, yeah, I don't think any enterprise will go all in on one cloud.
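The Common SQL idea described above, one ANSI SQL query no matter where the data lives, can be roughly illustrated with SQLite's ATTACH as a stand-in; this is an analogy of the federation concept, not IBM's engine:

```python
# Two separate stores stand in for "on-prem" and "cloud" data; one SQL
# query spans both. (SQLite ATTACH is a stand-in, not Common SQL itself.)
import sqlite3

con = sqlite3.connect(":memory:")                   # "on-prem" store
con.execute("CREATE TABLE sales (region TEXT, amount INT)")
con.execute("INSERT INTO sales VALUES ('us', 100), ('eu', 50)")

con.execute("ATTACH DATABASE ':memory:' AS cloud")  # "cloud" store
con.execute("CREATE TABLE cloud.sales (region TEXT, amount INT)")
con.execute("INSERT INTO cloud.sales VALUES ('ap', 75)")

# Same ANSI SQL shape regardless of where each table lives.
total = con.execute(
    "SELECT SUM(amount) FROM "
    "(SELECT amount FROM sales UNION ALL SELECT amount FROM cloud.sales)"
).fetchone()[0]
print(total)  # 225
```

The design point is that the query author names tables, not locations; in a real federated engine the network, latency, and placement concerns Rob mentions live below this interface.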
I think it's delusional for people to think that, so you're going to have this world. So the reason, when we built IBM Cloud Private, we did it on Kubernetes, was we said that can be a substrate, if you will, that provides a level of standards across multiple cloud-type environments. >> John: And it's got some traction too, so it's a good bet there. >> Absolutely. >> Rob, final word, just talk about the personas who you now engage with from IBM's standpoint. I know you have a lot of great developer stuff going on, you've done some great work, you've got a free product out there, but you still got to make money, you got to provide value to IBM, who are you selling to, what's the main thing, you've got multiple stakeholders, could you just clarify the stakeholders that you're serving in the marketplace? >> Yeah, I mean, the emerging stakeholder that we speak with more and more than we used to is chief marketing officers, who have real budgets for data and data science and are trying to change how they're performing their job. That's a major stakeholder, CTOs, CIOs, any C-level, >> Chief data officer. >> Chief data officer. You know, chief data officers, honestly, it's a mixed bag. In some organizations they're incredibly empowered and they're driving the strategy. Others, they're figureheads, and so you've got to know how the organizations do it. >> A puppet for the CFO or something. >> Yeah, exactly. >> Or ops. >> A puppet? (chuckles) So, you got to, you know. >> Well, they're not really driving it, they're not changing it. It's not like they're mandated to go do something, they're maybe governance police or something. >> Yeah, and in some cases that's true. In other cases, they drive the data architecture, the data strategy, and that's somebody that we can engage with right away and help them out so... >> Any events you've got coming up? Things happening in the marketplace that people might want to participate in?
I know you guys do a lot of stuff out in the open, events where they can connect with IBM, things going on? >> So we do, so we're doing a big event here in New York on November first and second, where we're rolling out a lot of our new data products and cloud products, so that's one coming up pretty soon. The biggest thing we've changed this year is, there's such a craving from clients for education, so we've started doing what we're calling Analytics University, where we actually go to clients and we'll spend a day or two days, go really deep on open languages, open source. That's become kind of a new focus for us. >> A lot of re-skilling going on too with the transformation, right? >> Rob: Yes, absolutely. >> All right, Rob Thomas here, General Manager IBM Analytics, inside theCUBE. CUBE alumni, breaking it down, giving his perspective. He's got two books out there, The Data Revolution was the first one. >> Big Data Revolution. >> Big Data Revolution, and the new one is Every Company is a Tech Company. Love that title, which is true, check it out on Amazon. Rob Thomas, Big Data Revolution, first book, and then second book is Every Company is a Tech Company. It's theCUBE live from New York. More coverage after the short break. (theCUBE jingle) (theCUBE jingle) (calm soothing music)
Jagane Sundar, WANdisco | BigData NYC 2017
>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, here live in New York City. This is theCUBE's special presentation of our annual event with theCUBE and Wikibon Research called BigData NYC, it's our own event that we have every year, celebrating what's going on in the big data world now. It's evolving to all data, cloud applications, AI, you name it, it's happening. In the enterprise the impact is huge; for developers the impact is huge. I'm John Furrier, cohost of theCUBE, with Peter Burris, Head of Research, SiliconANGLE Media and General Manager of Wikibon Research. Our next guest is Jagane Sundar, who's the CTO of WANdisco, a CUBE alumni, great to see you again as usual here on theCUBE. >> Thank you John, thank you Peter, it's great to be back on theCUBE. >> So we've been talking big data for many years, certainly with you guys, and it's been a great evolution. I don't want to get into the whole backstory and history, we covered that before, but right now is a really, really important time. We see, you know, the hurricanes come through, we see the floods in Texas, we've seen Florida, and Puerto Rico is now in the main conversation. You're seeing it, you're seeing disasters happen. Disaster recovery's been the low-hanging fruit for you guys, and we talked about this when New York City got flooded years and years ago. This is a huge issue for IT, because they have to have disaster recovery. But now it's moving beyond just disaster recovery. It's cloud. What's the update from WANdisco? You guys have a unique perspective on this. >> Yes, absolutely. So we have capabilities to replicate between the cloud and Hadoop, across multiple data centers and geos, so disasters are not a problem for us. And we have some unique technologies we use.
One of the things we do is we can replicate in an active-active mode between different cloud vendors, between cloud and on-prem Hadoop, and we are the only game in town. Nobody else can do that. >> So okay, let me just stop right there. When you say the only game in town, I get a little skeptical here. Are you saying that nobody does active-active replication at all? >> That is exactly what I'm saying. We had some wonderful announcements from Hortonworks, they have a great product called the Dataplane. But if you dig deep, you'll find that it's actually an active-passive architecture, because to do active-active, you need this capability called the Paxos algorithm for resolving conflict. That's a very hard algorithm to implement. We have over 10 years' experience in that. That's what gives us our ability to do this active-active replication, between clouds, between on-prem and cloud. >> All right, so just to take that a step further, I know we're having a CTO conversation, but the classic cliche is skate to where the puck is going to be. So you kind of didn't just decide one morning you're going to be the active-active for cloud. You kind of backed into this. You know, the world spun in your direction, the puck came to you guys. Is that a fair statement? >> That is a very fair statement. We've always known there's tremendous value in this technology we own, and with the global infrastructure trends, we knew that this was coming. It wasn't called the cloud when we started out, but that's exactly what it is now, and we're benefiting from it. >> And the cloud is just a data center, it's just, you don't own it. (mumbles) Peter, what's your reaction to this? Because when he says only game in town, that implies some scarcity. >> Well, WANdisco has a patent, and it actually is very interesting technology, if I can summarize very quickly.
You do continuous replication based on writes that are performed against the database, so that you can have two writers and two separate databases, and you guarantee that they will be synchronized at some point in time, because you guarantee that the writing of the logs and the messaging to both locations >> Absolutely. >> happen in order, which is a big issue. You guys put a stamp on the stuff, and it actually writes to the different locations with order guaranteed, and that's not the way most replication software works. >> Yes, that's exactly right. That's very hard to do, and that's the only way for you to allow your clients in different data centers to write to the same data store, whether it's a database, a Hadoop folder, whether it's a bucket in a cloud object store, it doesn't matter. The core fact remains, the Paxos algorithm is the only way for you to do active-active replication, and ours is the only Paxos implementation that can work over the >> John: And that's patented by you guys? >> Yes, it's patented. >> And so for someone to replicate that, they'd have to essentially reverse engineer it and add a little twist to get around the patents. Are you licensing the technology, or are you guys hoarding it for yourselves? >> We have different ways of engaging with partners. We are very reasonable with that, and we work with several powerful partners >> So you partner with the technology. >> Yes. >> But the key thing, John, in answer to your question, is that it's unassailable. I mean, there's no argument: as companies move more towards a digital way of doing things, largely driven by what customers want, your data becomes more of an asset. As your data becomes more of an asset, you make money by using that data in more places, more applications and more times.
That is possible with data, but the problem is you end up with consistency issues, and for certain applications it's not an issue; if you're basically reading data, it's not an issue. But the minute that you're trying to write on behalf of a particular business event or a particular value proposition, then you have a challenge: you are limited in how you can do it unless you have this kind of a technology. And so this notion of continuous replication, in a world that's going to become increasingly dependent upon data, data that is increasingly distributed, data that you want to ensure has common governance and policy in place, technologies like the ones WANdisco provides are going to be increasingly important to the overall way that a business organizes itself, institutes its work, and makes sure it takes care of its data assets. >> Okay, so my next question then, thanks for the clarification, it's good input there, and thanks for summarizing it like that, 'cause I couldn't have done that. But when we last talked, I was always enamored by the fact that you guys have the data center replication thing down. I always saw that as a great thing for you guys. Okay, I get that, that's an on-premise situation, you have active-active, good for disaster recovery, lots of use cases, people should be beating down your door 'cause you have a better mousetrap, I get that. Now how does that translate to the cloud? So take me through why the cloud now fits nicely with that same paradigm. >> So, I mean, these are industry trends, right. What we've found is that the cloud object stores are very, very cost effective and efficient, so customers are moving towards that. They're using their Hadoop applications, but on cloud object stores. Now it's trivial for us to add plugins that enable us to replicate between a cloud object store on one side, and Hadoop on the other side. It could also be another cloud object store from a different cloud provider on the other side.
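The ordering guarantee described in this exchange, writers in different locations converging because every replica applies the same agreed sequence of writes, can be sketched in a few lines. This is my own toy model of the idea, not WANdisco's code:

```python
# Toy model of ordered replication: two writers submit through one agreed,
# totally ordered log; every replica applies that log in sequence, so all
# replicas converge to the same state.

ordered_log = []  # in the real system, consensus (Paxos) agrees on this order

def submit(op):
    ordered_log.append((len(ordered_log), op))  # sequence number + operation

def apply_log(replica):
    for _, (key, val) in ordered_log:           # apply strictly in order
        replica[key] = val
    return replica

# Writers in two different data centers both update the same key.
submit(("balance", 100))   # writer in region A
submit(("balance", 250))   # writer in region B
us_east = apply_log({})
eu_west = apply_log({})
print(us_east == eu_west)  # both replicas agree on the final state
```

Without the agreed order, the two replicas could apply the conflicting writes in different sequences and end up with different values; the total order is what makes active-active writing safe.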
Once you have that capability, now customers are freed from lock-in from either a cloud vendor or a Hadoop vendor, and they love that, they're looking at it as another way to leverage their data assets. And we enable them to do that without fear of lock-in from any of these vendors. >> So on the cloud side, the regions have always been a big thing. We've seen Amazon have a region go down and have to fix it. We saw VMware at VMworld push their solution to only one western region. What's the geo landscape look like in the cloud? Does that relate to anything in your tech? >> So yes, it does relate, and one of the things that people forget is that when you create an Amazon S3 bucket, for example, you specify a region. Well, but this is the cloud, isn't it worldwide? Turns out that object store actually resides in one region, and you can use some shaky technologies like cross-region replication to eventually get the data to the other region. >> Peter: Which just boosts the prices you pay. >> Yes, and not just boosts the price. >> Well, they're trying to save on price, but then they're exposed on reliability. >> Reliability, exactly. You don't know when the data's going to be there, there are no guarantees. What we offer is, take your cloud storage, but we'll guarantee that we can replicate it in a synchronous fashion to another region. Could be the same provider, could be another provider. That gives tremendous benefits to the customers. >> So you actually have a guarantee when you go to customers, say with an SLA guarantee? Do you back it up with, like, money back, what's the guarantee?
Because you can almost get kind of sucked into the complexities of it, and the nuances of cloud and everything as Peter laid out, it's pretty complex even as he simplified it. Who buys this? (laughs) I mean, who's the guy, is it the IT department, is it the ops guy, is it the facilities, who... >> So we sell to the IT departments, and they absolutely love the technology. But to go back to your initial statement, we have all these disasters happening, you know, hopefully people are all doing reasonably okay at the end of these horrible disasters, but if you're an enterprise of any size, it doesn't have to be a big enterprise, you cannot go back to your users or customers and say that because of a hurricane you cannot have access to your data. That's sometimes legally not allowed, and other times it's just suicide for a business >> And HPE in Houston, it's a huge plant down there. >> Jagane: Indeed. >> They got hit hard. >> Yep, in those sort of circumstances, you want to make sure that your data is available in multiple data centers spread throughout the world, and we give you that capability. >> Okay, what are some of the successes? Let's talk through now, obviously you've got the technology, I get that. Where's the stakes in the ground? Who's adopting it? I know you do a lot of biz dev deals. I don't know if they're actually OEM-type deals, or they're just licensing deals. Take us through to where your successes are with this technology. >> So, biz dev wise, we have a mix of OEM deals and licenses and co-selling agreements. The strong ones are all OEMs, of course. We have great partnerships with IBM, Amazon, Microsoft, just wonderful partnerships. The actual end customers, we started off selling mostly to the financial industry because they have a legal mandate, so they were the first to look into this sort of a thing. But now we've expanded into automobile companies. 
A lot of the auto companies are generating vast amounts of data from their cars, and you can't push all that data into a single data center, that's just not reasonable. You want to push that data into a single data store that's distributed across the world, ingested wherever the car is closest. We offer that capability that nobody else can, so we've got big auto manufacturers signed up, we've got big retailers signed up for exactly the same capability. You cannot imagine ingesting all that data into a single location. You want this replicated across, you want it available no matter what happens to any single region or data center. So we've got tremendous success in retail, banking, and a lot of this is through partnerships again. >> Well congratulations. I got to ask, you know, what's new with you guys? Obviously you have success with the active-active. We'll dig into the Hortonworks thing to check your comment around them not having it, so we'll certainly look at the Dataplane, which we like. We interviewed Rob Bearden. Love the announcement, but they don't have the active-active; we're going to document that and get that on the record. But you guys are doing well. What's new here, what's in New York, what are some of your wins, can you just give a quick update on what's going on at WANdisco? >> Okay, so quick recap: we love the Hortonworks Dataplane as well. We think that we can build value into that ecosystem by building a plugin for them. And we love the whole technology. I have wonderful friends there as well. As for our own company, we see all of our, a lot of our business coming from cloud and hybrid environments. It's just the reality of the situation.
You had, you know, 20 years ago, you had NFS, which was the great upender of all storage, but turned out to be very expensive, and then ten, seven years ago, you had HDFS come along, and that upended the cost model of NFS and SANs, which those industries were still working their way through. And now we have cloud object stores, which have upended the HDFS model; it's much more cost-efficient to operate using cloud object stores. So we will be there, we have replication products for that. >> John: And you're in the major clouds, you in Azure? >> Yes, we are in Azure. >> Google? >> Jagane: Yes, absolutely. >> AWS? >> AWS, of course. >> Oracle? >> Oracle, of course. >> So you got all the top four companies. >> We're in all of them. >> All right, so here's the next question... >> And you're also in IBM stuff too. >> Yes, we're built tightly into IBM. >> So you've got a pretty strong legacy >> And a monopoly. >> On the mainframe. >> Like the fiber channel of replication. (John and Jagane laugh) That was a bad analogy. I mean it's like... Well, I mean, fiber channel has only limited suppliers 'cause they have unique technology, it was highly important. >> But the basic proposition is, look, any customer that wants to ensure that a particular data source is going to be available in a distributed way, and you're going to have some degree of consistency, is going to look at this as an option. >> Yes. >> Well, you guys certainly have a great team under your leadership, and it's got great tech. The final question I have for you here is, you know, we've had many conversations about the industry, we like to pontificate, I certainly like to speculate, but now we have eight years of history in the big data world. We look back, you know, we're doing our own event in New York City, you know, thanks to great support from you guys and other great friends in the community. Appreciate everyone out there supporting theCUBE, that's awesome. But the world's changed.
So I got to ask you, you're a student of the industry, I know that from knowing you personally. What's been the success formula that keeps the winners around today, and what do people need to do going forward? 'Cause we've seen the train wreck, we've seen the dead bodies in the industry, we've kind of seen what's happened, there've been some survivors. Why did the current list of characters and companies survive, and what's the winning formula, in your opinion, to stay relevant as big data grows in a huge way, from IoT to AI, cloud, and everything in between? >> I'll quote Stephen Hawking on this. Intelligence is the ability to adapt to change. That's what keeps industries, that's what keeps companies, that's what keeps executives around. If you can adapt to change, if you can see things coming, and adapt your core values, your core technology to that, you can offer customers a value proposition that's going to last a long time. >> And in the big data space, what is that adaptive key focus, what should they be focused on? >> I think at this point, it's extracting information from this volume of data, whether you use machine learning in the modern day, or whether it was simple Hive queries, that's the value proposition, and making sure the data's available everywhere so you can do that processing on it, that remains the strength. >> So the whole concept of digital business suggests that increasingly we're going to see our assets rendered in some form as data. >> Yes. >> And we want to be able to ensure that that data is able to be where it needs to be when it needs to be there, for any number of reasons. It's a very, very interesting world we're entering into. >> Peter, I think you have a good grasp on this, and I love the narrative of programming the world in real time. What's the phrase you use? It's real time but it's programming the world... Programming the real world.
>> That's a huge, that means something completely, it's not a tech, it's a not a speed or feed. >> Well the way we think about it, is that we look at IoT as a big information transducer, where information's in one form, and then you turn it into another form to do different kinds of work. And that big data's a crucial feature in how you take data from one form and turn it into another form so that it can perform work. But then you have to be able to turn that around and have it perform work back in the real world. There's a lot of new development, a lot of new technology that's coming on to help us do that. But any way you look at it, we're going to have to move data with some degree of consistency, we're still going to have to worry about making sure that if our policy says that that action needs to take place there, and that action needs to take place there, that it actually happens the way we want it to, and that's going to require a whole raft of new technologies. We're just at the very beginning of this. >> And active-active, things like active-active in what you're talking about really is about value creation. >> Well the thing that makes active-active interesting is, again, borrowing from your terms, it's a new term to both of us, I think, today. I like it actually. But the thing that makes it interesting is the idea that you can have a source here that is writing things, and you can have a source over there that are writing things, and as a consequence, you can nonetheless look at a distributed database and keep it consistent. >> Consistent, yeah. >> And that is a major, major challenge that's going to become increasingly a fundamental feature of our digital business as well. >> It's an enabling technology for the value creation and you call it work. >> Yeah, that's right. >> Transformation of work. Jagane, congratulations on the active-active, and WANdiscos's technology and all your deals you're doing, got all the cloud locked up. What's next? 
Well you going to lock up the edge? You're going to lock up the edge too, the cloud. >> We do like this notion of the edge cloud and all the intermediate steps. We think that replicating data between those systems or running consistent compute across those systems is an interesting problem for us to solve. We've got all the ingredients to solve that problem. We will be on that. >> Jagane Sundar, CTO of WANdisco, back on theCUBE, bringing it down. New tech, whole new generation of modern apps and infrastructure happening in distributed and decentralized networks. Of course theCUBE's got it covered for you, and more live coverage here in New York City for BigData NYC, our annual event, theCUBE and Wikibon here in Hell's Kitchen in Manhattan, more live coverage after this short break.
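For readers curious about the Paxos algorithm credited throughout this conversation, here is a minimal single-decree sketch. This is my own toy illustration; the class and function names are assumptions, and WANdisco's patented implementation is far more involved than this:

```python
# Minimal single-decree Paxos sketch (illustrative only). Three acceptors
# form a quorum system; competing proposers converge on one chosen value.

class Acceptor:
    def __init__(self):
        self.promised = 0      # highest ballot number promised
        self.accepted = None   # (ballot, value) last accepted, if any

    def prepare(self, ballot):
        # Phase 1b: promise not to accept anything below this ballot.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        # Phase 2b: accept unless a higher ballot has been promised.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

def propose(acceptors, ballot, value):
    # Phase 1: gather promises from a majority.
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [acc for ok, acc in promises if ok]
    if len(granted) <= len(acceptors) // 2:
        return None  # no majority; a real proposer retries with a higher ballot
    # If any acceptor already accepted a value, we must adopt it.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]  # highest-ballot accepted value wins
    # Phase 2: ask the acceptors to accept the (possibly adopted) value.
    votes = sum(a.accept(ballot, value) for a in acceptors)
    return value if votes > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(3)]
chosen = propose(acceptors, ballot=1, value="write-A")
# A competing proposer with a higher ballot is forced onto the same value.
chosen2 = propose(acceptors, ballot=2, value="write-B")
print(chosen, chosen2)  # both are "write-A": once chosen, the value sticks
```

That adoption rule in Phase 1 is the heart of the algorithm, and it is what lets two active writers resolve conflicts deterministically instead of diverging.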