Partha Seetala, Robin Systems | DataWorks Summit 2018
>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back everyone, you are watching day two of theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight. I'm coming at you with my cohost James Kobielus. We're joined by Partha Seetala, he is the Chief Technology Officer at Robin Systems, thanks so much for coming on theCUBE. >> Pleasure to be here. >> You're a first timer, so we promise we don't bite. >> Actually I'm not, I was on theCUBE- >> Oh! >> At DockerCon in 2016. >> Oh well excellent, okay, so now you're a veteran, right. >> Yes, ma'am. >> So Robin Systems, as before the cameras were rolling, we were talking about it, it's about four years old, based here in San Jose, venture backed company. Tell us a little bit more about the company and what you do. >> Absolutely. First of all, thanks for hosting me here. Like you said, Robin is a Silicon Valley based company. Our focus is on allowing applications, such as big data, databases, NoSQL and AI/ML, to run within the Kubernetes platform. What we have built is a product that converges storage, complex storage, networking, application workflow management, along with Kubernetes to create a one click experience where users can get a managed services kind of feel when they're deploying these applications. They can also do one click life cycle management on these apps. Our thesis has initially been, instead of looking at this problem from the infrastructure up to the application, to actually look at it from the applications down and then say, "Let the applications drive the underlying infrastructure to meet the user's requirements." >> Is that your differentiating factor, would you say?
>> Yeah, I think it is because most of the folks out there today are looking at it as if it's a component-based play, it's like they want to bring storage to Kubernetes or networking to Kubernetes but the challenges are not really around storage and networking. If you talk to the operations folk they say that, "You know what? Those are underlying problems but my challenge is more along the lines of, okay, my CIO says the initiative is to make my applications mobile. They want to go across to different Clouds. That's my challenge." The line of business user says, "I want to get a managed service experience." Yes, storage is the thing that you want to manage underneath, but I want to go and click and create my, let's say, an Oracle database or distributions log. >> In terms of the developer experience here, from the application down, give us a sense for how Robin Systems' tooling, your product, enables that degree of specification of the application logic that will then get containerized within? >> Absolutely, like I said, we want applications to drive the infrastructure. What it means is that we, Robin is a software platform. We layer ourselves on top of the machines that we sit on whether it is bare metal machines on premises, or VMs, or even on Azure, Google Cloud as well as AWS. Then we make the underlying compute, storage, network resources almost invisible. We treat it as a pool of resources. Now once you have this pool of resources, they can be attached to the applications that are being deployed inside containers. I mean, it's a software play, installed on machines. Once it's installed, the experience now moves away from infrastructure into applications. You log in, you can see a portal, you have a lot of applications in that portal. We ship support for about 25 applications or some such. >> So these are templates? >> Yes. >> That the developer can then customize to their specific requirements? Or no?
>> Absolutely, we ship reference templates for pretty much a wide variety of the most popular big data, NoSQL, database, AI/ML applications today. But again, as I said, it's a reference implementation. Typically customers take the reference recommendation and they enhance it or they use that to onboard their custom apps, for example, or the apps that we don't ship out of the box. So it's a very open, extensible platform but the goal being that whatever the application might be, in fact we keep saying that, if it runs somewhere else, it runs on Robin, right? So the idea here is that you can bring anything, and with just the flip of a switch, you can make it a one click deploy, one click manage, one click mobile across Clouds. >> You keep mentioning this one click and this idea of it being so easy, so convenient, so seamless, is that what you say is the biggest concern of your customers? Is this ease and speed? Or what are some other things that are on their minds that you want to deliver? >> Right, so one click of course is a user experience part but what is the real challenge? The real challenge is, there are a wide variety of tools being used by enterprises today. Even the data analytic pipeline, there's a lot across the data store, processor pipeline. Users don't want to deal with setting it up and keeping it up and running. They don't want that, they want to get the job done, right? Now when you only want to get the job done, you really want to hide the underlying details of those platforms and the best way to convey that, the best way to give that experience is to make it a single click experience from the UI. So I keep calling it all one click because that is the experience that you get to hide the underlying complexity for these apps. >> Does your environment actually compile executable code based on that one click experience? Or where does the compilation and containerization actually happen in your distributed architecture?
>> Alright, so, I think the simplest- >> You're an on-prem based offering, right? You're not in the Cloud yourself? >> No, we are. We work on all the three big public clouds. >> Oh, okay. >> Whether it is Azure, AWS or Google. >> So your entire application is containerized itself for deployment into these Clouds? >> Yes, it is. >> Okay. >> So the idea here is let's simplify it significantly, right? You have Kubernetes today, it can run anywhere, on premises, in the public Cloud and so on. Kubernetes is a great platform for orchestrating containers but it is largely inaccessible to a certain class of data centric applications. >> Yeah. >> We make that possible. But our take is, just onboarding those applications on Kubernetes does not solve your CXO or your line of business user's problems. You ought to make the management, from an application point of view, not from a container management point of view, from an application point of view, a lot easier and that is where we kind of create this experience that I'm talking about, one click experience. >> Give us a sense for how, we're here at DataWorks and it's the Hortonworks show. Discuss with us your partnership with Hortonworks and you know, we've heard the announcement of HDP 3.0 and containerization support, just give us a rough sense for how you align or partner with Hortonworks in this area. >> Absolutely. It's kind of interesting because Hortonworks is a data management platform, if you think about it from that point of view and when we engaged with them first- So some of our customers have been using the product, Hortonworks, on top of Robin, so orchestrating Hortonworks, making it a lot easier to use. >> Right. >> One of the requirements was, "Are you certified with Hortonworks?" And the challenge that Hortonworks also had is they had never certified a container based deployment of Hortonworks before. They actually were very skeptical, you know, "You guys are saying all these things.
Can you actually containerize and run Hortonworks?" So we worked with Hortonworks and we are, I mean if you go to the Hortonworks website, you'll see that we are the first in the entire industry who have been certified as a container based play that can actually deploy and manage Hortonworks. They have certified us by running a wide variety of tests, which they call the QATS test suite, and when we got certified the only other players in the market that got that stamp of approval were Microsoft in Azure and EMC with Isilon. >> So you're in good company? >> I think we are in great company. >> You're certified to work with HDP 3.0 or the prior version or both? >> When we got certified we were still in the 2.X version of Hortonworks, HDP 3.0 is a relatively newer version. But our plan is that we want to continue working with Hortonworks to get certified as they release the program and also help them because HDP 3.0 also has some container based orchestration and deployment so we want to help them provide the underlying infrastructure so that it becomes easier for beyond to spin up more containers. >> The higher level security and governance and all these things you're describing, they have to be over the Kubernetes layer. Hortonworks supports it in their DataPlane Services portfolio. Does Robin Systems' solutions portfolio tap into any of that, or do you provide your own layer of sort of security and metadata management and so forth? >> Yeah, so we don't want- >> In context of what you offer? >> Right, so we don't want to take away the security model that the application itself provides because they might have set it up so that they are doing governance, it's not just logging in and access control and things like this. Some governance is built in. We don't want to change that. We want to keep the same experience and the same workflow that customers have so we just integrate with whatever security the application has.
We, of course, provide security in terms of isolating these different apps that are running on the Robin platform, whereas the security or the access into the application itself is left to the apps themselves. When I say apps, I'm talking about Hortonworks. >> Yeah, sure. >> Or any other databases. >> Moving forward, as you think about ways you're going to augment and enhance and alter the Robin platform, what are some of the biggest trends that are driving your decision making around that in the sense of, as we know that companies are living with this deluge of data, how are you helping them manage it better? >> Sure. I think there are a few trends that we are closely watching. One is around Cloud mobility. CIOs want their applications along with their data to be available where their end users are. It's almost like a follow-the-sun model, where you might have generated the data in one Cloud and at a different time, different time zone, you'll basically want to keep the app as well as data, moving. So we are following that very closely. How we can enable the mobility of data and apps a lot easier in that world. The other one is around the general AI/ML workflow. One of the challenges there, of course, you have great apps like TensorFlow or Theano or Caffe, these are very good AI/ML toolkits but one of the challenges that people face, is they are buying this very expensive, let's say NVIDIA DGX Box, these boxes cost about $150,000 each, how do you keep these boxes busy so that you're getting a good return on investment? It will require you to better manage the resources offered with these boxes. We are also monitoring that space and we're seeing how we can take the Robin platform and enable better utilization of GPUs or the sharing of GPUs for running your AI/ML kind of workload. >> Great. >> Those are, I think, two key trends that we are closely watching.
>> We'll be discussing those at the next DataWorks Summit, I'm sure, at some other time in the future. >> Absolutely. >> Thank you so much for coming on theCUBE, Partha. >> Thank you. >> Thank you, my pleasure. Thanks. >> I'm Rebecca Knight for James Kobielus. We will have more from DataWorks coming up in just a little bit. (techno beat music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Rebecca Knight | PERSON | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Jame Kobielus | PERSON | 0.99+ |
San Jose | LOCATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
James Kobielus | PERSON | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Robin Systems | ORGANIZATION | 0.99+ |
Partha Seetala | PERSON | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
San Jose, California | LOCATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
one click | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
one | QUANTITY | 0.99+ |
2016 | DATE | 0.99+ |
both | QUANTITY | 0.99+ |
HTP 3.0 | TITLE | 0.99+ |
NVIDIA | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
DataWorks | ORGANIZATION | 0.99+ |
Robin | ORGANIZATION | 0.98+ |
Kubernetes | TITLE | 0.98+ |
One | QUANTITY | 0.98+ |
TensorFlow | TITLE | 0.98+ |
about $150,000 each | QUANTITY | 0.98+ |
about 25 applications | QUANTITY | 0.98+ |
one click | QUANTITY | 0.98+ |
Partha | PERSON | 0.98+ |
Isilon | ORGANIZATION | 0.97+ |
DGX Box | COMMERCIAL_ITEM | 0.97+ |
today | DATE | 0.96+ |
First | QUANTITY | 0.96+ |
DockerCon | EVENT | 0.96+ |
Azure | ORGANIZATION | 0.96+ |
Theano | TITLE | 0.96+ |
DataWorks Summit 2018 | EVENT | 0.95+ |
theCUBE | ORGANIZATION | 0.94+ |
Caffe | TITLE | 0.91+ |
Azure | TITLE | 0.91+ |
Robin | PERSON | 0.91+ |
Robin | TITLE | 0.9+ |
two key trends | QUANTITY | 0.89+ |
HDP 3.0 | TITLE | 0.87+ |
EMC | ORGANIZATION | 0.86+ |
single click | QUANTITY | 0.86+ |
day two | QUANTITY | 0.84+ |
DataWorks Summit | EVENT | 0.83+ |
three big public clouds | QUANTITY | 0.82+ |
DataWorks | EVENT | 0.81+ |
Tendü Yogurtçu, Syncsort | DataWorks Summit 2018
>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California, I'm your host, along with my cohost, James Kobielus. We're joined by Tendu Yogurtcu, she is the CTO of Syncsort. Thanks so much for coming on theCUBE, for returning to theCUBE I should say. >> Thank you Rebecca and James. It's always a pleasure to be here. >> So you've been on theCUBE before and the last time you were talking about Syncsort's growth. So can you give our viewers a company update? Where you are now? >> Absolutely, Syncsort has seen extraordinary growth within the last three years. We tripled our revenue, doubled our employees and expanded the product portfolio significantly. Because of this phenomenal growth that we have seen, we also embarked on a new initiative with refreshing our brand. We rebranded and this was necessitated by the fact that we have such a broad portfolio of products and we are actually showing our new brand here, articulating the value our products bring with optimizing existing infrastructure, assuring data security and availability and advancing the data by integrating into next generation analytics platforms. So it's very exciting times in terms of Syncsort's growth. >> So the last time you were on the show it was pre-GDPR but we were talking before the cameras were rolling and you were explaining the kinds of adoption you're seeing and what, in this new era, you're seeing from customers and hearing from customers. Can you tell our viewers a little bit about it? >> When we were discussing last time, I talked about four mega trends we are seeing and those mega trends were primarily driven by the advanced business and operation analytics. Data governance, cloud, streaming and data science, artificial intelligence.
And we talked, we really made a lot of announcements and focused on the use cases around data governance. Primarily helping our customers with the GDPR, General Data Protection Regulation, initiatives and how we can create that visibility in the enterprise through data security and lineage and delivering trusted data sets. Now we are talking about cloud primarily and the keynotes, this event and our focus is around cloud, primarily driven by again the use cases, right? How the businesses are adapting to the new era. One of the challenges that we see with our enterprise customers, over 7000 customers by the way, is the ability to future-proof their applications. Because this is a very rapidly changing stack. We have seen the keynotes talking about the importance of how do you connect your existing infrastructure with the future modern, next generation platforms. How do you future-proof the platform, make it agnostic about whether it's Amazon, Microsoft or Google Cloud. Whether it's on-premise in legacy platforms today that the data has to be available in the next generation platforms. So the challenge we are seeing is how do we keep the data fresh? How do we create that abstraction that applications are future-proofed? Because organizations, even financial services customers, banking, insurance, they now have at least one cluster running in the public cloud. And there's private implementations, hybrid becomes the new standard. So our focus and most recent announcements have been around really helping our customers with real-time, resilient change data capture, keeping the data fresh, feeding into the downstream applications with the streaming and messaging frameworks, for example Kafka, Amazon Kinesis, as well as keeping the persistent stores and the Hadoop data lake, on-premise or in the cloud, fresh.
>> Puts you into great alignment with your partner Hortonworks so, Tendu I wonder if we are here at DataWorks, it's Hortonworks' show, if you can break out for our viewers, what is the nature, the levels of your relationship, your partnership with Hortonworks and how the Syncsort portfolio plays with HDP 3.0, with Hortonworks DataFlow and the DataPlane Services at a high level. >> Absolutely, so we have been a longtime partner with Hortonworks and a couple of years back, we strengthened our partnership. Hortonworks is reselling Syncsort and we have actually a prescriptive solution for Hadoop and ETL onboarding in Hadoop jointly. And it's very complementary, our strategy is very complementary because what Hortonworks is trying and achieving, is creating that abstraction and future-proofing and interaction consistency, as referred to this morning, across the platform, whether it's on-premise or in the cloud or across multiple clouds. We are providing the data application layer consistency and future-proofing on top of the platform. Leveraging the tools in the platform for orchestration, integrating with HDP, certifying with Ranger, HDP, all of the tools, DataFlow, and Atlas of course for lineage. >> The theme of this conference is ideas, insights and innovation and as a partner of Hortonworks, can you describe what it means for you to be at this conference? What kinds of community and deepening existing relationships, forming new ones. Can you talk about what happens here? >> This is one of the major events around data and it's DataWorks as opposed to being more specific to the Hadoop itself, right? Because stack is evolving and data challenges are evolving. For us, it means really the interactions with the customers, the organizations and the partners here. Because the dynamics of the use cases is also evolving. For example Data Lake implementations started in the U.S.
And we started seeing more European organizations moving to streaming, data streaming applications faster than the U.S. >> Why is that? >> Yeah. >> Why are Europeans moving faster to streaming than we are in North America? >> I think a couple of different things might participate. Open source is really enabling organizations to move fast. When the Data Lake initiative started, we have seen a little bit slow start in Europe but more experimentation with the open source stack. And by that the more transformative use cases started really evolving. Like how do I manage interactions of the users with the remote controls as they are watching live TV, type of transformative use cases became important. And as we move to the transformative use cases, streaming is also very critical because lots of data is available and being able to keep the cloud data stores as well as on-premise data stores and downstream applications with fresh data becomes important. We in fact in early June announced that Syncsort is now a part of Microsoft One Commercial Partner Program. With that our Integrate solutions with data integration and data quality are Azure gold certified and Azure ready. We are in a co-sell agreement and we are helping jointly a lot of customers, moving data and workloads to Azure and keeping those data stores close to platforms in sync. >> Right. >> So lots of exciting things, I mean there's a lot happening with the application space. There's also lots still happening connected to the governance cases that we have seen. Feeding security and IT operations data into again modern day, next generation analytics platforms is key. Whether it's Splunk, whether it's Elastic, as part of the Hadoop Stack. So we are still focused on governance as part of this multi-cloud and on-premise and cloud implementations as well.
We in fact launched our Ironstream for IBM i product to help customers, not just making this data available from mainframes but also from IBM i into Splunk, Elastic and other security information and event management platforms. And today we announced workflow optimization across on-premise and multi-cloud and cloud platforms. So lots of focus across the Optimize, Assure and Integrate portfolio of products helping customers with the business use cases. That's really our focus as we innovate organically and also acquire technologies and solutions. What are the problems we are solving and how we can help our customers with the business and operation analytics, targeting those mega trends around data governance, cloud, streaming and also data science. >> What is the biggest trend do you think that is sort of driving all of these changes? As you said, the data is evolving. The use cases are evolving. What is it that is keeping your customers up at night? >> Right now it's still governance, keeping them up at night, because this evolving architecture is also making governance more complex, right? If we are looking at financial services, banking, insurance, healthcare, there are lots of existing infrastructures, mission critical data stores on mainframe and IBM i in addition to this gravity of data changing and lots of data with the online businesses generated in the cloud. So how to govern that also while optimizing and making those data stores available for next generation analytics, makes the governance quite complex. So that really keeps and creates a lot of opportunity for the community, right? All of us here to address those challenges. >> Because it sounds to me, I'm hearing Splunk, advanced machine data, I think of the internet of things and sensor grids. I'm hearing IBM mainframes, that's transactional data, that's your customer data and so forth.
It seems like much of this data that you're describing that customers are trying to cleanse and consolidate and provide strict governance on, is absolutely essential for them to drive more artificial intelligence into end applications and mobile devices that are being used to drive the customer experience. Do you see more of your customers using your tools to massage the data sets as it were that data scientists then use to build and train their models for deployment into edge applications? Is that an emerging area where your customers are deploying Syncsort? >> Thank you for asking that question. >> It's a complex question. (laughing) But thanks for unpacking it... >> It is a complex question but it's a very important question. Yes and in the previous discussions, we have seen, and this morning also, Rob Thomas from IBM mentioned it as well, that machine learning and artificial intelligence, data science, really relies on high-quality data, right? As an anonymous 1950s computer scientist said: garbage in, garbage out. >> Yeah. >> When we are using artificial intelligence and machine learning, the implications, the impact of bad data multiplies. Multiplies with the training of historical data. Multiplies with the insights that we are getting out of that. So data scientists today are still spending significant time on preparing the data for the AI pipeline, and the data science pipeline, that's where we shine. Because our Integrate portfolio accesses the data from all enterprise data stores and cleanses and matches and prepares that in a trusted manner for use for advanced analytics with machine learning, artificial intelligence. >> Yeah 'cause the magic of machine learning for predictive analytics is that you build a statistical model based on the most valid data set for the domain of interest. If the data is junk, then you're going to be building a junk model that will not be able to do its job. So, for want of a nail, the kingdom was lost.
For want of a Syncsort (laughing) data cleansing and, you know, governance tool, the whole AI superstructure will fall down. >> Yes, yes absolutely. >> Yeah, good. >> Well thank you so much Tendu for coming on theCUBE and for giving us a lot of background and information. >> Thank you for having me, thank you. >> Good to have you. >> Always a pleasure. >> I'm Rebecca Knight for James Kobielus. We will have more from theCUBE's live coverage of DataWorks 2018 just after this. (upbeat music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Rebecca | PERSON | 0.99+ |
James Kobielus | PERSON | 0.99+ |
James | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Rebecca Knight | PERSON | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Tendu Yogurtcu | PERSON | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Europe | LOCATION | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
San Jose | LOCATION | 0.99+ |
U.S. | LOCATION | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Syncsort | ORGANIZATION | 0.99+ |
1950s | DATE | 0.99+ |
San Jose, California | LOCATION | 0.99+ |
Hortonworks' | ORGANIZATION | 0.99+ |
North America | LOCATION | 0.99+ |
early June | DATE | 0.99+ |
DataWorks | ORGANIZATION | 0.99+ |
over 7000 customers | QUANTITY | 0.99+ |
One | QUANTITY | 0.98+ |
theCUBE | ORGANIZATION | 0.98+ |
DataWorks Summit 2018 | EVENT | 0.97+ |
Elastic | TITLE | 0.97+ |
one | QUANTITY | 0.96+ |
today | DATE | 0.96+ |
IBMI | TITLE | 0.96+ |
four | QUANTITY | 0.95+ |
Splunk | TITLE | 0.95+ |
Tendü Yogurtçu | PERSON | 0.95+ |
Kafka | TITLE | 0.94+ |
this morning | DATE | 0.94+ |
Data Lake | ORGANIZATION | 0.93+ |
DataWorks | TITLE | 0.92+ |
iPipeline | COMMERCIAL_ITEM | 0.91+ |
DataWorks 2018 | EVENT | 0.91+ |
Splunk | PERSON | 0.9+ |
ETL | ORGANIZATION | 0.87+ |
Azure | TITLE | 0.85+ |
Google Cloud | ORGANIZATION | 0.83+ |
Hadoop | TITLE | 0.82+ |
last three year | DATE | 0.82+ |
couple of years back | DATE | 0.81+ |
Syncsort | PERSON | 0.8+ |
HTP | TITLE | 0.78+ |
European | OTHER | 0.77+ |
Tendu | PERSON | 0.74+ |
Europeans | PERSON | 0.72+ |
Data Protection Regulation | TITLE | 0.71+ |
Kinesis | TITLE | 0.7+ |
least one cluster | QUANTITY | 0.7+ |
Ironstream | COMMERCIAL_ITEM | 0.66+ |
Program | TITLE | 0.61+ |
Azure | ORGANIZATION | 0.54+ |
Commercial Partner | OTHER | 0.54+ |
DataFlow | TITLE | 0.54+ |
One | TITLE | 0.54+ |
CTO | PERSON | 0.53+ |
3.0 | TITLE | 0.53+ |
Trange | TITLE | 0.53+ |
Stack | TITLE | 0.51+ |
Arun Murthy, Hortonworks - Spark Summit East 2017 - #SparkSummit - #theCUBE
>> [Announcer] Live, from Boston, Massachusetts, it's theCUBE, covering Spark Summit East 2017, brought to you by Databricks. Now, your hosts, Dave Vellante and George Gilbert. >> Welcome back to snowy Boston everybody, this is theCUBE, the leader in live tech coverage. Arun Murthy is here, he's the founder and vice president of engineering at Hortonworks, father of YARN, can I call you that, godfather of YARN, is that fair, or? (laughs) Anyway. He's so, so modest. Welcome back to theCUBE, it's great to see you. >> Pleasure to have you. >> Coming off the big keynote, (laughs) you ended the session this morning, so that was great. Glad you made it in to Boston, and uh, lot of talk about security and governance, you know we've been talking about that for years, it feels like it's truly starting to come into the mainstream Arun, so. >> Well I think it's just a reflection of what customers are doing with the tech now. Now, three, four years ago, a lot of it was pilots, a lot of it was, you know, people playing with the tech. But increasingly, it's about, you know, people actually applying stuff in production, having data, system of record, running workloads both on prem and on the cloud, cloud is sort of becoming more and more real at mainstream enterprises. So a lot of it means, as you take any of the examples today any interesting app will have some sort of real time data feed, it's probably coming out from a cell phone or sensor which means that data is actually not, in most cases not coming on prem, it's actually getting collected in a local cloud somewhere, it's just more cost effective, why would we put up 25 data centers if you don't have to, right? So then you got to connect that data, production data you have or customer data you have or data you might have purchased and then join them up, run some interesting analytics, do geobased real time threat detection, cyber security.
A lot of it means that you need a common way to secure data, govern it, and that's where we see the action, I think it's a really good sign for the market and for the community that people are pushing on these dimensions of the broader, because, getting pushed in this dimension because it means that people are actually using it for real production workloads. >> Well in the early days of Hadoop you really didn't talk that much about cloud. >> Yeah. >> You know, and now, >> Absolutely. >> It's like, you know, duh, cloud. >> Yeah. >> It's everywhere, and of course the whole hybrid cloud thing comes into play, what are you seeing there, what are things you can do in a hybrid, you know, or on prem that you can't do in a public cloud and what's the dynamic look like? >> Well, it's definitely not an either or, right? So what we're seeing is increasingly interesting apps need data which are born in the cloud and they'll stay in the cloud, but they also need transactional data which stays on prem, you might have an EDW for example, right? >> Right. >> There's not a lot of, you know, people want to solve business problems and not just move data from one place to another, right? Or back from one place to another, so it's not interesting to move an EDW to the cloud, and similarly it's not interesting to bring your IoT data or sensor data back on-prem, right? Just makes sense. So naturally what happens is, you know, at Hortonworks we talk about a kind of modern app or a modern data app, which means a modern data app has to span, has to sort of, you know, it can span both on-prem data and cloud data. >> Yeah, you talked about that in your keynote years ago. Furrier said that the data is the new development kit. And now you're seeing the apps are just so dang rich, >> Exactly, exactly. >> And they have to span >> Absolutely. >> physical locations, >> Yeah.
>> But then this whole thing of IoT comes up, we've been having a conversation on The Cube, last several Cubes of, okay, how much stays out, how much stays in, there's a lot of debates about that, there's reasons not to bring it in, but you talked today about some of the important stuff will come back. >> Yeah. >> So the way this is, this all is going to be, you know, there's a lot of data that should be born in the cloud and stay there, the IoT data, but then what will happen increasingly is, key summaries of the data will move back and forth, so key summaries of your EDW will move to the cloud, sometimes key summaries of your IoT data, you know, you want to do some sort of historical training in analytics, that will come back on-prem, so I think there's a bi-directional data movement, but it just won't be all the data, right? It'll be key interesting summaries of the data but not all of it. >> And a lot of times, people say well it doesn't matter where it lives, cloud should be an operating model, not a place where you put data or applications, and while that's true and we would agree with that, from a customer standpoint it matters in terms of performance and latency issues and cost and regulation, >> And security and governance. >> Yeah. >> Absolutely. >> You need to think those things through. >> Exactly, so I mean, so that's what we're focused on, to make sure that you have a common security and governance model regardless of where data is, so you can think of it as, infrastructure you own and infrastructure you lease. >> Right. >> Right? Now, the details matter of course, when you go to the cloud you use S3 for example or ADLS from Microsoft, but you got to make sure that there's a common sort of security and governance front on top of it, in front of it, as an example one of the things that, you know, in the open source community, Ranger's a really sort of key project right now from a security authorization and authentication standpoint. 
We've done a lot of work with our friends at Microsoft to make sure, you can actually now manage data in WASB which is their object store, data stream, natively with Ranger, so you can set a policy that says only Dave can access these files, you know, George can access these columns, that sort of stuff is natively done on the Microsoft platform thanks to the relationship we have with them. >> Right. >> So that's actually really interesting for the open source communities. So you've talked about sort of commodity storage at the bottom layer and even if they're different sort of interfaces and implementations, it's still commodity storage, and now what's really helpful to customers is that they have a common security model, >> Exactly. >> Authorization, authentication, >> Authentication, lineage, provenance, >> Oh okay. >> You want to make sure all of these are common services across. >> But you've mentioned off of the different data patterns, like the stuff that might be streaming in on the cloud, what, assuming you're not putting it into just a file system or an object store, and you want to sort of merge it with >> Yeah. >> Historical data, so what are some of the data stores other than the file system, in other words, newfangled databases to manage this sort of interaction? >> So I think what you're saying is, we certainly have the raw data, the raw data is going to land up in whatever cloud native storage, >> Yeah. >> It's going to be Amazon, WASB, ADLS, Google Storage. But then increasingly you want, so now the patterns change so you have raw data, you have some sort of an ETL process, what's interesting in the cloud is that even the processed data or, if you take the unstructured raw data and structure it, that structured data also needs to live on the cloud platform, right? The reason that's important is because A, it's cheaper to use the native platform rather than set up your own database on top of it. 
The other one is you also want to take advantage of all the native services that the cloud storage provides, so for example, geo-replication. So automatically data in WASB, you know, if you can set up a policy and easily say this structured data table that I have, which is a summary of all the IoT activity in the last 24 hours, you can, using the cloud provider's technologies you can actually make it show up easily in Europe, like you don't have to do any work, right? So increasingly what we at Hortonworks focused a lot on is to make sure that we, all of the compute engines, whether it's Spark or Hive or, you know, or MapReduce, it doesn't really matter, they're all natively working on the cloud provider's storage platform. >> [George] Okay. >> Right, so, >> Okay. >> That's a really key consideration for us. >> And the follow up to that, you know, there's a bit of a misconception that Spark replaces Hadoop, but it actually can be a processing, a compute engine for, >> Yeah. >> That can complement or replace some of the compute engines in Hadoop, help us frame, how you talk about it with your customers. >> For us it's really simple, like in the past, the only option you had on Hadoop to do any computation was MapReduce, that was, I started working in MapReduce 11 years ago, so as you can imagine, it's a pretty good run for any technology, right? Spark is definitely the interesting sort of engine for sort of the, anything from machine learning to ETL for data on top of Hadoop. But again, what we focus a lot on is to make sure that every time we bring in, so right now, when we started on HDP, the first HDP had about nine open source projects, literally just nine. Today, the last one we shipped was 2.5, HDP 2.5 had about 27 I think, like it's a huge sort of explosion, right? But the problem with that is not just that we have 27 projects, the problem is that you've got to make sure each of the 27 work with all the 26 others. >> It's a QA nightmare. 
>> Exactly. So that integration is really key, so same thing with Spark, we want to make sure you have security and YARN (mumbles), like you saw in the demo today, you can now run Spark SQL but also make sure you get row-level (mumbles) masking, all of the enterprise capabilities that you need, and I was at a financial services three or four weeks ago in Chicago. Today, to do equivalent of what I showed today on demo, they need literally, they have a classic EDW, and they have to maintain anywhere between 1500 to 2500 views of the same database, that's a nightmare as you can imagine. Now the fact that you can do this on the raw data using whether it's Hive or Spark or Pig or MapReduce, it doesn't really matter, it's really key, and that's the thing we push to make sure things like YARN security work across all the stacks, all the open source techs. >> So that makes life better, a simplification use case if you will, >> Yeah. >> What are some of the other use cases that you're seeing things like Spark enable? >> Machine learning is a really big one. Increasingly, every product is going to have some, people call it, machine learning and AI and deep learning, there's a lot of techniques out there, but the key part is you want to build a predictive model, in the past (mumbles) everybody wanted to build a model and score what's happening in the real world against the model, but equally important make sure the model gets updated as more data comes in on and actually as the model scores does get smaller over time. So that's something we see all over, so for example, even within our own product, it's not just us enabling this for the customer, for example at Hortonworks we have a product called SmartSense which allows you to optimize how people use Hadoop. Where the, what are the opportunities for you to explore efficiencies within your own Hadoop system, whether it's Spark or Hive, right? So we now put machine learning into SmartSense. 
And show you that customers who are running queries like you are running, Mr. Customer X, other customers like you are tuning Hadoop this way, they're running this sort of config, they're using these sort of features in Hadoop. That allows us to actually make the product itself better all the way down the pipe. >> So you're improving the scoring algorithm or you're sort of replacing it with something better? >> What we're doing there is just helping them optimize their Hadoop deploys. >> Yep. >> Right? You know, configuration and tuning and kernel settings and network settings, we do that automatically with SmartSense. >> But the customer, you talked about scoring and trying to, >> Yeah. >> They're tuning that, improving that and increasing the probability of its accuracy, or is it? >> It's both. >> Okay. >> So the thing is what they do is, you initially come with a hypothesis, you have some amount of data, right? I'm a big believer that over time, more data, you're better off spending more, getting more data into the system than to tune that algorithm finely, right? >> Interesting, okay. >> Right, so you know, for example, you know, talk to any of the big guys on Facebook because they'll do the same, what they'll say is it's much better to get, to spend your time getting 10x data to the system and improving the model rather than spending 10x the time and improving the model itself on day one. >> Yeah, but that's a key choice, because you got to >> Exactly. >> Spend money on doing either, >> One of them. >> And you're saying go for the data. >> Go for the data. >> At least now. >> Yeah, go for data, what happens is the good part of that is it's not just the model, it's the, what you got to really get through is the entire end to end flow. >> Yeah. 
>> All the way from data aggregation to ingestion to collection to scoring, all that aspect, you're better off sort of walking through the paces like building the entire end to end product rather than spending time in a silo trying to make a lot of change. >> We've talked to a lot of machine learning tool vendors, application vendors, and it seems like we got to the point with Big Data where we put it in a repository then we started doing better at curating it and understanding it then starting to do a little bit exploration with business intelligence, but with machine learning, we don't have something that does this end to end, you know, from acquiring the data, building the model to operationalizing it, where are we on that, who should we look to for that? >> It's definitely very early, I mean if you look at, even the EDW space, for example, what is EDW? EDW is ingestion, ETL, and then sort of fast query layer, OLAP, BI, on and on and on, right? So that's the full EDW flow, I don't think as a market, I mean, it's really early in this space, not only as an overall industry, we have that end to end sort of industrialized design concept, it's going to take time, but a lot of people are ahead, you know, the Googles of the world are ahead, over time a lot of people will catch up. >> We got to go, I wish we had more time, I had so many other questions for you but I know time is tight in our schedule, so thanks so much Arun, >> Appreciate it. >> For coming on, appreciate it, alright, keep right there everybody, we'll be back with our next guest, it's The Cube, we're live from Spark Summit East in Boston, right back. (upbeat music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave | PERSON | 0.99+ |
George Gilbert | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Arun Murthy | PERSON | 0.99+ |
Europe | LOCATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
10x | QUANTITY | 0.99+ |
Boston | LOCATION | 0.99+ |
Chicago | LOCATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
George | PERSON | 0.99+ |
Arun | PERSON | 0.99+ |
Wasabi | ORGANIZATION | 0.99+ |
25 data centers | QUANTITY | 0.99+ |
Today | DATE | 0.99+ |
Hadoop | TITLE | 0.99+ |
Wasabi | LOCATION | 0.99+ |
YARN | ORGANIZATION | 0.99+ |
ADLS | ORGANIZATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Horton Works | ORGANIZATION | 0.99+ |
today | DATE | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
1500 | QUANTITY | 0.98+ |
SmartSense | TITLE | 0.98+ |
S3 | TITLE | 0.98+ |
Boston, Massachusetts | LOCATION | 0.98+ |
One | QUANTITY | 0.98+ |
27 projects | QUANTITY | 0.98+ |
three | DATE | 0.98+ |
Furio | PERSON | 0.98+ |
Spark | TITLE | 0.98+ |
2500 views | QUANTITY | 0.98+ |
first | QUANTITY | 0.97+ |
Spark Summit East | LOCATION | 0.97+ |
both | QUANTITY | 0.97+ |
Spark SQL | TITLE | 0.97+ |
Google Storage | ORGANIZATION | 0.97+ |
26 | QUANTITY | 0.96+ |
Ranger | ORGANIZATION | 0.96+ |
four weeks ago | DATE | 0.95+ |
one | QUANTITY | 0.94+ |
each | QUANTITY | 0.94+ |
four years ago | DATE | 0.94+ |
11 years ago | DATE | 0.93+ |
27 work | QUANTITY | 0.9+ |
MapReduce | TITLE | 0.89+ |
Hive | TITLE | 0.89+ |
this morning | DATE | 0.88+ |
EDW | TITLE | 0.88+ |
about nine open source | QUANTITY | 0.88+ |
day one | QUANTITY | 0.87+ |
nine | QUANTITY | 0.86+ |
years | DATE | 0.84+ |
Olap | TITLE | 0.83+ |
Cube | ORGANIZATION | 0.81+ |
a lot of data | QUANTITY | 0.8+ |