

Adam Wilson & Joe Hellerstein, Trifacta - Big Data SV 17 - #BigDataSV - #theCUBE


 

>> Commentator: Live from San Jose, California. It's theCUBE covering Big Data Silicon Valley 2017. >> Okay, welcome back everyone. We are here live in Silicon Valley for Big Data SV (mumbles) event in conjunction with Strata + Hadoop. Our companion event, the Big Data NYC, and we're here breaking down the Big Data world as it evolves and goes to the next level up on the step function, AI, machine learning, IoT really forcing people to really focus on a clear line of sight on the data. I'm John Furrier with our analyst from Wikibon, George Gilbert, and our next guest, our two executives from Trifacta. The founder and Chief Strategy Officer, Joe Hellerstein and Adam Wilson, the CEO. Guys, welcome to theCUBE. Welcome back. >> Great to be here. >> Good to be here. >> Founder, co-founder? >> Co-founder. >> Co-founder. He's a multiple-time co-founder. I remember it 'cause you guys were one of the first sites that have the (mumbles) in the about section on all the management team. Just to show you how technical you guys are. Welcome back. >> And if you're Trifacta, you have to have three founders, right? So that's part of the tri, right? >> The triple threat, so to speak. Okay, so a big year for you guys. Give us the update. I mean, also we had Alation announce this partnering going on and some product movement. >> Yup. >> But there's a turbulent time right now. You have a lot of things happening in multiple theaters, from technical theater to business theater. And also within the customer base. It's a land grab, it seems, on the metadata and who's going to control what. What's happening? What's going on in the marketplace and what's the update from you guys? >> Yeah, yeah. Last year was an absolutely spectacular year for Trifacta. It was four times growth in bookings, three times growth in customers. 
You know, it's been really exciting for us to see the technology get in the hands of some of the largest companies on the planet and to see what they're able to do with it. From the very beginning, we really believed in this idea of self service and democratization. We recognize that the wrangling of the data is often where a lot of the time and the effort goes. In fact, up to 80% of the time and effort goes in a lot of these analytic projects, and to the extent that we can help take the data from (mumbles) in a more productive way and to allow more people in an organization to do that, that's going to create information agility that we feel really good about, and our customers are telling us it's having an impact on their use of Big Data and Hadoop. And I think you're seeing that transition where, you know, in the very beginning there was a lot of offloading, a lot of like, hey we're going to grab some cost savings, but then at some point, people scratched their heads and said, well, wait a minute. What about the strategic asset that we were building? That was going to change the way people work with the data. Where is that piece of it? And I think as people started figuring out in order to get our (mumbles), we got to have users and use cases on these clusters, and the data lake itself is not a use case. Tools like Trifacta have been absolutely instrumental in really fueling that maturity in the market, and we feel great about what's happening there. >> I want to drill down some more before we get to some of these questions for Joe too because I think you mentioned, you got some quotes. I just want to double-click on that. It always comes up in the business model question for people. What's your business model? >> Sure. >> And doing democratization is really hard. Sometimes democratization doesn't appear until years later so it's one of those elusive things. You see it and you believe it but then making it happen are two different things. 
>> Yeah, sure. >> So. And appreciate that the vision they-- (mumbles) But ultimately, at the end of the day, that business model comes down to how you organized. Prove points. >> Yup. >> Customers, partnerships. >> Yeah. >> We had Alation on, Stephanie (mumbles). Can you share just and connect the dots on the business model? >> Sure. >> With respect to the product, customers, partners. How was that specifically evolving? >> Adam: Sure. >> Give some examples. >> Sure, yeah. And I would say kind of-- we felt from the beginning that, you know, we wanted to turn what was traditionally a very complex, messy problem dealing with data into a user experience problem that was powered by machine learning, and so, a lot of it was down to, you know, how we were going to build and architect the technology needed (mumbles) for really getting the power in the hands of the people who know the data best. But it's important, and I think this is often lost in Silicon Valley where the focus on innovation is all around technology, to recognize that the business model also has to support democratization, so one of the first things we did coming in was to release a free version of the product. So Trifacta Wrangler, that is now being used by over 4500 companies, tens of thousands of users, and the power of that in terms of getting people something of value that they could start using right away on spreadsheets and files and small data and allowing them to get value, but then also for us, the exchange is that we're actually getting a chance to curate at scale usage data across all of these-- >> Is this a (mumbles) product? >> It's a hybrid product. >> Okay. >> So the data stays local. It never leaves their local laptop. The metadata is hashed and put into the cloud and now we're-- >> (mumbles) to that. >> Absolutely. And so now we can use that as training data, so that as more people wrangle, the product itself gets smarter based on that. >> That's good. 
>> So that's creating real tangible value for customers, and for us is a source of very strategic advantage, and so we think that combination of the technology innovation but also making sure that we can get this in the hands of users and they can get going, and as their problem grows up to be bigger and more complicated, not just spreadsheets and files on the desktop but something more complicated, then we're right there along with them for products that would have been modified. >> How about partnerships with Alation? How they (mumbles)? What are all the deals you got going on there? >> So Alation has been a great partner for us for a while and we've really deepened the integration with the announcements today. We think that cataloging and data wrangling are very complementary and they're a natural fit. We've got customers like Munich Re, like eBay, as well as MarketShare that are using both solutions in concert with one another, and so, we really felt that it was natural to tighten that coupling and to help people go from inventorying what's going on in their data lakes and their clusters to then cleansing, standardizing. Essentially making it fit for purpose and then ensuring that metadata can round-trip back into the catalog. And so that's really been an extension of what we're doing also at the technical level with technologies like Cloudera Navigator, with Atlas, and with the project that Joe's involved with at Berkeley called Ground. So I don't know if you want to talk-- >> Yeah, tell him about Ground. >> Sure. So part of our outlook on this, and this speaks to the way that the landscape in the industry is shaping out, is that we're not going to see customers buying into lock-in on the key components of the area for (mumbles). So for example, storage, HD (mumbles). This is open, and that's key, I think, for all the players in this space, that HDFS is not a product from a storage vendor. 
It's an open platform, and you can change vendors along the way and you could roll your own and so on. So metadata, to my mind, is going to move in the same direction. That the storage of metadata, the basic componentry that keeps the metadata, that's got to be open to give people the confidence that they're going to pour the basic descriptions of what's in their business and what their people are doing into a place that they know they can count on and it will be vendor neutral. So the catalog vendors are, in my mind, providing a functionality above that basic storage that relates to how do you search the catalog, what does the catalog do for you to suggest things, to suggest data sets that you should be looking at. So that's a value add on top, but below that what we're seeing is, we're seeing Hortonworks and Cloudera coming out with either products or open source in sort of the metadata space, and what would be a shame is if the two vendors ended up kind of pointing guns inward and kind of killing the metadata storage. So one of the things that I got interested in, in my dual role as a professor at Berkeley and also as a founder of a company in this space, was we want to ensure that there's a free, open, vendor-neutral metadata solution. So we began building out a project called Ground, which is both a platform for metadata storage that can be sitting underneath catalog vendors and other metadata value adds, and it's also a platform for research, much as we did with Spark previously at Berkeley. So Ground is a project in our new lab at Berkeley, the RISELab, which is the successor to the AMPLab that gave us Spark. And Ground has now got, you know, collaborators from Cloudera, from LinkedIn. Capital One has significantly invested in Ground and is putting engineers behind it, and contributors are coming also from some startups to build out an open-source platform for metadata. >> How long has Ground been around? >> Joe: Ground's been around for about 12 months. 
It's very-- >> So it's brand new. How do people get involved? >> Brand new. >> Just standard, similar to the way the AMPLab was? Just jump in and-- >> Yeah, you know-- >> Go away and-- >> It comes up on GitHub. There's (mumbles) to go download and play with. It's in alpha. And you know, we hope we (mumbles) in the usual open-source style. >> This is interesting. I like this idea because one thing you've been riffing on, on theCUBE over time, is how do you make data addressable? Because ultimately, you know, in real time you need to have access to data at really, really low (mumbles) to get the insight to make it work. Hence the data swamp problem, right? So, how do you guys see that? 'Cause now I can just pop in. I can hear the objections. Oh, security! You know. How do you guys see the protections? I'd love to help get my data in there and get something back in return in a community model. Security? Is it the hashing? What's the-- How do you get any security (mumbles)? Or what are the issues? >> Yeah, so I mean the straightforward issues are the traditional issues of authorization and encryption, and those are issues that are reasonably well-plumbed out in the industry, and you can go out and you can take the solutions from people like Cloudera or from Hortonworks, and those solutions plug in quite nicely actually to a variety of platforms. And I feel like that level of enterprise security is understood. It's work for vendors to work with that technology, so when we went out, we made sure we were Kerberized in all the right ways at Trifacta to work with these vendors, and that we integrated well with Navigator, we integrated with Atlas. That was, you know, there was some labor there, but it's understood. There's also-- >> It's solvable basically. >> It's solvable basically and pluggable. There are research questions there which, you know, on another day we could talk about, but for instance if you don't trust your cloud hosting service, what do you do? 
And that's like an open area that we're working on at Berkeley. Intel SGX is a really interesting technology, but that's probably a topic for another day. >> But you know, I think it's important-- >> The sooner we get you out to the studio in Palo Alto, we'd love to drill down on that. >> I think it's important though that, you know, when we talk about self service, the first question that comes up is I'm only going to let you self service as far as I can govern what's going on, right? And so I think those things-- >> Restrictions, guard rails-- >> Really go hand in hand here. >> About handcuffs. >> Yeah so, right. Because that's always a first thing that kind of comes out where people say, okay wait a minute now, is this-- if I've now got, you know-- you've got an increasing number of knowledge workers who think that is their-- and believe that it is their unalienable right to have access to data. >> Well that's the (mumbles) democratization. That's the top down, you know, governance control point. >> So how do you balance that? And I think you can't solve for one side of that equation without the other, right? And that's really, really critical. >> Democratization is anarchization, right? >> Right, exactly. >> Yes, exactly. But it's hard though. I mean, and you look at all the big trends where there was, you know, web 1.0, web (mumbles), all had those democratization trends, but they took six years to play out, and I think there might be more acceleration with cloud, to your point about this new stuff. Okay George, go ahead. You might get in there. >> I wanted to ask you about, you know, what we were talking about earlier and what customers are faced with, which is, you know, a lot of choice and specialization, because building something end to end and having it fully functional is really difficult. So... 
What are the functional points where you start driving the guard rails in that IT cares about, and then what are the user experience points where you have critical mass so that the end users then draw other compliant tools in. You with me? On sort of the IT side and the user side, and then which tools start pulling those standards? >> Well, I would say at the highest level, to me what's been very interesting, especially with what's happened in open source, is that people have now gotten accustomed to the idea that, like, I don't have to go buy big monolithic stacks where the innovation moves only as fast as the slowest product in the stack or the portfolio. I can grab onto things and I can download them today and be using them tomorrow. And that has, I think, changed the entire approach that companies like Trifacta are taking to how we build and release product to market, how we interoperate with partners like Alation and Waterline, and how we integrate with the platform vendors like Cloudera, MapR, and Hortonworks, because we recognize that we are going to have to be maniacally focused on one piece of this puzzle and to go very, very deep, but then play incredibly well, both, you know, with all the rest of the ecosystem, and so I think that has really colored our entire product strategy and how we go to market, and I think customers, you know, they want the flexibility to change their minds, and the subscription model is all about that, right? You got to earn it every single year. >> So what's the future of (mumbles)? 'Cause that brings up a good point, we were kind of critical of Google and you mentioned you guys had-- I saw in some news that you guys were involved with Google. >> Yup. >> Being enterprise ready is not just, hey we have the great tech and you buy from us, damn it we're Google. >> Right. >> I mean, you have to have sales people. You have to have automation mechanisms to create great product. 
Will the future of wrangling and data prep go into-- where does it end up? Because enterprises want, they want certain things. They're finicky about things. >> Right, right. >> As you guys know. So how does the future of data prep deal with the, I won't say the slowness of the enterprise, but they're more conservative, more SLA driven than they are price-performance driven. >> But they're also more fragmented than ever before, and you know, while that may not be a great thing for the customers, for a company that's all about harmonizing data that's actually a phenomenal opportunity, right? Because we want to be the decision that customers make that guarantees that all their other decisions are changeable, right? And I go and-- >> Well they have legacy systems of record. This is the challenge, right? So I got the old Oracle monolithic-- >> That's fine. And that's good-- >> So how do you-- >> The more the merrier, right? >> Does that impact you guys at all? How did you guys handle that situation? >> To me, to us that is more fragmentation, which creates more need for wrangling because that introduces more complexity, right? >> You guys do well in that environment. >> Absolutely. And that, you know, is only getting bigger, worse, and more complicated. And especially as people go from (mumbles) to cloud, as people start thinking about moving from just looking at transactions to interactions to now looking at behavior data and the IoT-- >> You're welcome in that environment. >> So we welcome that. In fact, that's where-- we went to solve this problem for Hadoop and Big Data first because we wanted to solve the problems at scale that were the most complicated, and over time we can always move downstream to sort of more structured and smaller data, and that's kind of what's happened with our business. 
>> I guess I want to circle back to this issue of which part of this value chain of refining data is-- if I'm understanding you right, the data wrangling is the anchor, and once a company has made that choice then all the other tool choices have to revolve around it? Is that a-- >> Well think about it this way, I mean, the bulk of the time when you talk to the analysts, and also the bulk of the labor cost in these things, is in getting the data from its raw form into usage. That whole process of wrangling, which is not really just data prep. It's all the things you do all day long to kind of massage these data sets and get 'em from here to there and make 'em work. That space is where the labor cost is. That also means that space is where the value add is, because that's where your people power, or your business context, is really getting poured in to understand what do I have, what am I doing with it, and what do I want to get out of it. As we move from bottom line IT to top line value generation with data, it becomes all the more so, right? Because now it's not just a matter of getting the reports out every month. It's also what did that brilliant analyst in sales do to that dataset to get that much lift? I need to learn from her and do a similar thing. Alright? So, that whole space is where the value is. What that means is that, you know, you don't want that space to be tied to a particular BI tool or a particular execution engine. So when we say that we want to make a decision in the middle of that that enables all the other decisions, what you really want to make sure is that that work process in there is not tightly bound to the rest of the stack. Okay? And so you want to particularly pick technologies in that space that will play nicely with different storage, that play nicely with different execution environments. Today it's Hadoop, tomorrow it's Amazon, the next day it's Google, and they have different engines back there potentially. 
And you certainly want it to play nicely with all the analytics and visualizations-- >> So decouple from all that? >> You want to decouple that and you want to not lock yourself in, 'cause that's where the creativity's happening on the consumption side, and that's where the mess that you talked about is just growing on the production side, so data production is just getting more complicated. Data consumption's getting more interesting. >> That's actually a really, really cool good point. >> Elaborating on that, does that mean that you have to open up interfaces at either the UI layer or at the sort of data definition layer? Or does that just mean other companies have to do the work to tie in to the styles? The styles and structures that you have already written? >> In fact it's sort of the opposite. We do the work to tie in to a lot of this, these other decisions in this infrastructure, you know. We don't pretend for a minute that people are going to sort of pick a solution like Trifacta and then build their organization around it. To your point, there's tons of legacy technology out there. There is all kinds of things moving. Absolutely. So a big part of being the decoder ring for data, for Trifacta, is saying, listen, we are going to interoperate with your existing investments and we're going to make sure that you can always get at your data, you can always take it from whatever state it's in to whatever state you need it to be in, you can change your mind along the way. And that puts a lot of onus on us, and that's the reason why we have to be so focused on this space and not jump into visualization and analytics, and not jump into storage and processing, and not try to do the other things to the right or left. Right? >> So final question. I'd like you guys both to take a stab at it. You know, just going to pivot off of what Joe was saying. 
Some of the most interesting things are happening in the data exploration kind of discovery area, from creativity to insights to game-changing stuff. >> Yup. >> Ventures potentially. >> Joe: Yup. >> The problem of the complexity, that's the conflict. >> Yeah. >> So how do we resolve this? I mean, besides the Trifacta solution, which you guys are taming, creating a platform for that, how do people in industry work together to solve that problem? What's the approach? >> So I think actually there's a couple sort of heartening trends on this front that make me pretty optimistic. One of these is that the incentive structures in the enterprises we work with are becoming quite aligned between IT and the line of business. It's no longer the case that the line of business are these annoying people that are distracting IT from their bottom line function. IT's bottom line function is being translated into a what's-your-value-for-the-business question. And the answer for a savvy IT management person is, I will try to empower the people around me to be rabid fans, and I will also try to make sure that they do their own work so I don't have to learn how to do it for them. Right? And so, that I think is happening-- >> Guys to this (mumbles) business guys, a bunch of annoying guys who don't get what I need, right? So it works both ways, right? >> It does, it does. And I see that that's improving sort of in the industry as the corporate missions around data change, right? So it's no longer that the IT guys really only need to take care of executives and everyone else doesn't matter. Their function really is to serve the business, and I see that alignment. The other thing that I think is a huge opportunity, and part of why we're excited to be so tightly coupled with Google and also have our stuff running in Amazon and at Microsoft, is as people re-platform to the cloud, a lot of legacy gets shed or at least becomes deprecated. 
And so there is a real-- >> Or containerized or some sort of microservice. >> Yeah. >> Right, right. >> And so, people are peeling off business functions, and as part of that cost savings to migrate it to the cloud, they're also simplifying. And you know, things will get complicated again. >> What's (mumbles) for the solution architects out there who can kind of reboot their careers, because the old way was, hey I got networks, I got apps and stacks, and so that gives the guys coming in a chance to be the new heroes. >> Right. >> And thinking differently about enabling that creativity. >> In the midst of all that, everything you said is true. IT is a massive place and it always will be. And tools that can come in and help are absolutely going to be (mumbles). >> This is obvious now. The tension's obviously eased a bit in the sense that there's a clear line of sight that top line and bottom line are working together now. You mentioned that earlier. Okay. Adam, take a stab at it. (mumbling) >> I was just going to-- hey, I know it's great. I was just going to give an example, I think, that illustrates that point, so you know, one of our customers is Pepsi. And Pepsi came to us and they said, listen, we work with retailers all over the world, and their reality is that, when they place orders with us, they often get it wrong. And sometimes they order too much and then they return it, it spoils, and that's bad for us. Or they order too little and they stock out and we miss revenue opportunities. So they said, we actually have to be better at demand planning and forecasting than the orders that are literally coming in the door. So how do we do that? Well, we're getting all of the customers to give us their point-of-sale data. We're combining that with geospatial data, with weather data. 
We're like looking at historical data and industry averages, but as you can see, they were like-- we're stitching together data across a whole variety of sources, and they said the best people to do this are actually the category managers and the people responsible for the brands, 'cause they literally live inside those businesses and they understand it. And so what happened was they-- the IT organization was saying, look, listen, we don't want to be the people doing the janitorial work on the data. We're going to give that work over to people who understand it, and they're going to be more productive and get to better outcomes with that information, and that frees us up to go find new and interesting sources. And I think that collaborative model that you're starting to see emerge, where they can now be the data heroes in a different way, by not being the bottleneck on provisioning but rather going out and figuring out how do we share the best stuff across the organization? How do we find new sources of information to bring in that people can leverage to make better decisions? That's an incredibly powerful place to be, and you know, I think that that model is really what's going to be driving a lot of the thinking at Trifacta and in the industry over the next couple of years. >> Great. Adam Wilson, CEO of Trifacta. Joe Hellerstein, CTO-- Chief Strategy Officer of Trifacta and also a professor at Berkeley. Great story. Getting the (mumbles) right is hard, but under the hood stuff's complicated, and again, congratulations on sharing the Ground project. Ground open source. Open source lab kind of thing at-- in Berkeley. Exciting new stuff. Thanks so much for coming on theCUBE. I appreciate it, great conversation. I'm John Furrier, George Gilbert. You're watching theCUBE here at Big Data SV in conjunction with Strata and Hadoop. Thanks for watching. >> Great. >> Thanks guys.

Published Date : Mar 16 2017



Josh Rogers, Syncsort - Big Data SV 17 - #BigDataSV - #theCUBE


 

>> Announcer: Live from San Jose, California, it's The Cube covering Big Data Silicon Valley 2017. (innovative music) >> Welcome back, everyone. Live in Silicon Valley is The Cube's coverage of Big Data SV, our event in Silicon Valley in conjunction with our Big Data NYC event for New York City. Every year, twice a year, we get our event going around Strata Hadoop in conjunction with those guys. I'm John Furrier with SiliconANGLE, with George Gilbert, our Wikibon (mumbles). Our next guest is Josh Rogers, the CEO of Syncsort, been on many times, a Cube alumni, the firm that acquired Trillium, which we talked about yesterday. Welcome back to The Cube, good to see you. >> Good to see you, how are ya? >> So Syncsort is just one of those companies that's really interesting. We were talking about this. I want to get your thoughts on this because I'm not sure if it was in the plan or not, or really ingenious moves by you guys on the management side, but Legacy Business, lockdown legacy environments, like the mainframe, and then transform into a modern data company. Was that part of the plan or kind of on purpose by accident? Or what's-- >> Part of the plan. You think about what we've been doing for the last 40 years. We had specific capabilities around managing data at scale and around helping customers who process that data to get more value out of it through analytics, and we've just continually moved through the various kind of generations of technology to apply that same discipline in new environments, and big data has frankly been a terrific opportunity for us to apply that same technical and talented DNA in that new environment. It's kind of been running the same game plan. (talking over each other) >> You guys have good execution, but I think one of the things we were pointing out, and this is one of those things where, certainly, I live in Palo Alto in Silicon Valley. We love innovation. 
We love all the shiny, new toys, but you get tempted to go after something really compelling, cool, and relevant, and then go, "Whoa, I forgot about locking down some of the legacy data stuff," and then you're kind of working down, and you guys took a different approach. You're going into the trends from a solid foundation. That's a different execution approach and, like you said, by design, so that's working. >> Yeah, it's definitely working, and I think it's also kind of focused on an element that maybe is under-reported, which is a lot of these legacy systems aren't going away, and so one of the big challenges-- >> And this is for the record, by the way. >> Right (talking over each other). How do I integrate those legacy environments with these next-generation environments? To do that you have to have expertise on both sides, and so one of the things I think we've done a good job of is developing that big data expertise and then turning around and saying we can solve that challenge for you, and obviously, the big iron, the big data solutions we bring to market are a perfect example of that, but there's additional solutions that we can provide customers, and we'll talk more about those in a few-- >> Talk about the Trillium acquisition. I want you to just take a minute to describe that you bought a company called Trillium. What is it? Just take a minute to explain what it is and why it's relevant. >> Trillium is a really special company. They are the independent leader in data quality and have been for many years. They've been in the top-right of the Gartner Magic Quadrant for more than a decade, and really, when you look at large, complex, global enterprises, they are the kind of gold standard in data quality, and when I say data quality, what I mean is an ability to take a dataset, understand the issues with that dataset, and then establish business rules to improve the quality of that data so you can actually trust that data.
Obviously that's relevant in a near-adjacency to the data movement and transformation that Syncsort's been known for for so long. What's interesting about it is, you think about the development and the maturity of big data environments, specifically Hadoop: people have a desire to do analytics on that data, and implicit in that is the ability to trust that data, and the way you get there is being able to apply profiling and quality rules in that environment, and that's an underserved market today. When we thought about the Trillium acquisition, it was partly, "Hey, this is a great firm that has so much respect in the space, so much talented capability, a powerful capability and market-leading data quality talent, but also, we have an ability to apply it in this next generation environment, much like we did in the ETL and data movement space." And I think that the industry is at a point where enterprises are realizing, "I'm going to need to apply the same data management disciplines to make use of my data in my next generation analytics environment that I did in my data warehouse environment." Obviously, there's different technologies involved. There's different types of data involved. But those disciplines don't go away, and being able to improve the quality and kind of build integrity in your datasets is critical, and Trillium has best-in-market capabilities in that respect. >> Josh, you were telling us earlier about sort of the strategy of knocking down the pins one by one as, you know, it's become clear that we sort of took, first, the archive from the data warehouse, and then ETL off-loaded, now progressively more of the business intelligence. What are some of the, besides data quality, what are some of the other functions you have to-- >> There's the whole notion of metadata management, right? And that's incredibly important to support a number of key business initiatives that people want to leverage.
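The data quality workflow Josh describes, profile a dataset to understand its issues, then establish business rules to improve it, can be sketched in a few lines of Python. This is an illustrative toy with invented field names and rules; it is not Trillium's actual API.

```python
# Toy sketch of profile-then-apply-rules data quality (illustrative only;
# field names and rules are invented, not Trillium's actual API).

def profile(records, field):
    """Profile a field: count how many values are missing or blank."""
    values = [r.get(field) for r in records]
    missing = sum(1 for v in values if v in (None, ""))
    return {"count": len(values), "missing": missing}

def apply_rules(records, rules):
    """Apply business rules per field; return cleaned copies and a fix count."""
    fixes, cleaned = 0, []
    for r in records:
        r = dict(r)  # don't mutate the source record
        for field, fix in rules.items():
            fixed = fix(r.get(field))
            if fixed != r.get(field):
                r[field] = fixed
                fixes += 1
        cleaned.append(r)
    return cleaned, fixes

records = [
    {"name": "acme corp", "country": "us"},
    {"name": "", "country": "USA"},
]
rules = {
    "country": lambda v: {"us": "US", "usa": "US"}.get((v or "").lower(), v),
    "name": lambda v: (v or "").title() or "UNKNOWN",
}
print(profile(records, "name"))   # {'count': 2, 'missing': 1}
cleaned, fixes = apply_rules(records, rules)
print(cleaned[1], fixes)          # {'name': 'UNKNOWN', 'country': 'US'} 4
```

The point of the sketch is the split Josh draws: profiling tells you what is wrong, while the rules encode how the business wants it fixed, so the cleaned output can actually be trusted downstream.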
There's different styles of movement of data, so a thing you'll hear a lot about is change data capture, right? If I'm moving datasets from source systems into my Hadoop environment, I can move the whole set, but how do I move the incremental changes on an ongoing basis at the speed of business? There's notions of master data management, right? How do I make sure that I understand and have a gold kind of standard of reference data that I can use to drive my own analytic capabilities? And then of course, there's all the analytics that people want to do, both in terms of visualization and predictive analytics, but you can think about all these as various engines that I need to apply to the data to get maximum value. And it's not so much that these engines aren't important anymore. It's that I can now apply them in a different environment that gives me a lot more flexibility, a lot more scale, a better cost structure, and an ability to kind of harness broader datasets. And so that's really our strategy: bring those engines to this new environment. There's two ways to do that. One is build it from scratch, which is kind of a long process to get right when you're thinking about complex, global, large enterprise requirements. The other is to take existing, tested, proven, best-in-market engines and integrate them deeply in this environment, and that's the strategy we've taken. We think that offers a much faster time to value for customers to be able to maximize their investments in this next generation analytics infrastructure. >> So who shares that vision and sort of where are we in the race? >> I think we're fairly unique in our approach. There's certainly other large platform players. They have a broad (mumbles) ability, and I think they're working on, "How do I kind of take that architecture and make it relevant?" It ends up creating a code-generation approach.
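The change-data-capture style of movement described above, moving only the incremental changes rather than the whole set, is often implemented with a high watermark. A minimal sketch, assuming a hypothetical `modified_at` field on each source row; real CDC tools typically read database logs instead:

```python
# High-watermark incremental sync (minimal sketch; the modified_at field and
# in-memory "source" are hypothetical stand-ins for a real source system).

def incremental_sync(source_rows, last_watermark):
    """Return only rows changed since last_watermark, plus the new watermark."""
    changed = [r for r in source_rows if r["modified_at"] > last_watermark]
    new_watermark = max((r["modified_at"] for r in changed),
                        default=last_watermark)
    return changed, new_watermark

rows = [
    {"id": 1, "modified_at": 100},
    {"id": 2, "modified_at": 205},
    {"id": 3, "modified_at": 310},
]
changed, wm = incremental_sync(rows, last_watermark=200)
print([r["id"] for r in changed], wm)   # [2, 3] 310

# Running again with the saved watermark moves nothing: the sync is incremental.
again, wm2 = incremental_sync(rows, wm)
print(again, wm2)                       # [] 310
```

Persisting the watermark between runs is what turns a full copy into the "move the incremental changes at the speed of business" pattern Josh is describing.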
I think that approach has limitations, and I think if you take the core engine and integrate it deeply within the Hadoop ecosystem and Hadoop capabilities, you get a faster time to market and a more manageable solution going forward, and also one that kind of future-proofs you from underlying changes that we'll continue to see in the Hadoop components, sort of the big data components, I guess, is a better articulation. >> Josh, what's the take on the show this year and the trends, (mumbles) machine learning, and I've seen that. You guys look at your execution plan. What's the landscape happening out there in the show this year? I mean, we're starting to see more business outcome conversations about machine learning and AI. It's really putting pressure on the companies, and certainly IOT and cloud growth as a forcing function. Do you see the same thing? What's your thoughts? >> So machine learning's a really powerful capability, and I think as it relates to the data integration kind of space, there's a lot of benefit to be had. Think about quality. If I have to establish a set of business rules to improve the quality of my data, wouldn't it be great if those rules could learn as they actually process datasets and see how they change over time? So there's really interesting opportunities there. We're seeing a lot of adoption of cloud. More and more customers are looking at "How do I live in a world where I've got a piece of my operations on premise, I've got a piece of operations in cloud, manage those together, and gradually probably shift more into cloud over time?" So I'm doing a lot of work in that space. There's some basic fundamental recognitions that have happened, which is: if I stand up a Hadoop cluster, I am going to have to buy a series of tools to get value out of the data in that cluster.
That's a good step forward, from my perspective, because this notion of "I'm going to stand up a team off-shore and they're just going to build all these things"-- >> Cost of ownership goes through the roof. >> Yeah, so I think the industry's moved past this concept of "I make an investment in Hadoop; I don't need additional solutions." >> It highlights something that we were talking about at Google Next last week about enterprise-ready, and I want to get your thoughts 'cause you guys have a lot of experience; something that's right in your wheelhouse. How you guys have attacked the market's been pretty impressive and not obvious, and on paper, it looks pretty boring, but you're doing great! I mean, you've done the right strategy, it works. Mainframe, locking in the mainframe, system of record. We've talked this on The Cube. Lots of videos going back three years, but enterprise-ready is a term now that's forcing people, even the best at Google, to, like, look in the mirror and say, "Wait a minute. We have a blind spot." Best tech doesn't always win. You've got table stakes; you've got SLAs; you've got mission-critical data quality. One piece of bad data that should be clean could really screw up something. So what's your thoughts on enterprise-ready right now? >> I think that people are recognizing that to get a payoff on a lot of these investments in next generation analytic infrastructure, they're going to need to build and run mission-critical workloads there and take on mission-critical kind of business initiatives and prove out the value.
To do that you have to be able to manage the environment, achieve the up-times, and have the reliability and resiliency that, quite frankly, we've been delivering for 40 years, and so I think that's another kind of point in our value proposition that frankly seems to be so unique, which is: hey, we've been doing this for thousands of customers, the most sophisticated-- >> What are some of the fatal flaws that will hit people if they don't pay attention? >> Well, security is huge. I think the manageability, right. So look, if I have to upgrade 25 components in my Hadoop cluster to get to the next version and I need to upgrade all the tools, I've got to have a way to do that that allows me to not only get to the next level of capability that the vendors are providing, but also to do it in a way that doesn't bring down all these mission-critical workloads that have to be 24 by seven. Those pieces are really important, and having both the experience and understanding of what that means, and also being able to invest the engineering resources to be able to-- >> And don't forget the sales force. You've got the DNA and the people on the streets. Josh, thanks for coming to The Cube, really appreciate it, great insight. You guys have, just to give you a compliment, great strategy, and again, good execution on your side, and as you guys go, you're in new territory. Every time we talk to you, you're entering into something new, so great to see you. Syncsort here inside The Cube. Always back sharing commentary on what's going on in the marketplace: AI, machine learning, with the table stakes in the enterprise, security and whatnot, still critical for execution, and again, IOT is really forcing the function of (mumbles). You've got to focus on the data. Thanks so much. I'm (mumbles). We'll be back with more live coverage after this break. (upbeat innovative music)

Published Date : Mar 16 2017

Donna Prlich, Pentaho, Informatica - Big Data SV 17 - #BigDataSV - #theCUBE


 

>> Announcer: Live from San Jose, California, it's theCUBE. Covering Big Data Silicon Valley 2017. >> Okay, welcome back everyone. Here live in Silicon Valley, this is theCUBE. I'm John Furrier, covering our Big Data SV event, #BigDataSV. Our companion event to Big Data NYC, all in conjunction with Strata Hadoop; the Big Data world comes together, and great to have guests come by. Donna Prlich, who's the senior VP of products and solutions at Pentaho, a Hitachi company we'd been following before Hitachi acquired you guys. But you guys are unique in the sense that you're a company within Hitachi, left alone after the acquisition. You're now running all the products. Congratulations, welcome back, great to see you. >> Yeah, thank you, good to be back. It's been a little while, but I think you've had some of our other friends on here, as well. >> Yep, and we'll be at Pentaho World, you have Orlando, I think, in October. >> Yeah, October, so I'm excited about that, too, so. >> I'm sure the agenda is not yet baked for that because it's early in the year. But what's going on with Hitachi? Give us the update, because you now have purview into the product roadmap. The Big Data world, you guys have been very, very successful taking this approach to big data. It's been different and unique compared to others. >> Donna: Yep. >> What's the update? >> Yeah, so, very exciting, actually. So, we've seen, especially at the show, that the Big Data world, we all know that it's here. It's monetizable, it's where we, actually, where we shifted five years ago, and it's been a lot of what Pentaho's success has been based on. We're excited because the Hitachi acquisition, as you mentioned, sets us up for the next big thing, which is IOT. And I've been hearing non-stop about machine learning, but that's the other component of it that's exciting for us. So, yeah, Hitachi, we're-- >> You guys doing a lot of machine learning, a lot of machine learning?
So we announced our own kind of orchestration capabilities that really target, it's less about building models, and more about how do you enable the data scientists and data preparers to leverage the actual intellectual property that companies have in those models they've built to transform their business. So we have our own, and then the other exciting piece on the Hitachi side is, on the products, we're now at the point where we're running as Pentaho, but we have access to these amazing labs, of which there's about 25 to 50 depending on where you are, whether you're here or in Japan. And those data scientists are working on really interesting things on the R & D side; when you apply those to the kind of use cases we're solving for, that's just like a kid in a candy store with technology, so that's a great-- >> Yeah, you had a built-in customer there. But before I get into Pentaho, focusing on what's unique, really happening within you guys with the product, especially with machine learning and AI, as it starts to really get some great momentum. But I want to get your take on what you see happening in the marketplace. Because you've seen the early days, and it's now hitting a whole other step function as we approach machine learning and AI. Autonomous vehicles, sensors, everything's coming. How are enterprises in these new businesses, whether they're people supporting smart cities or a smart home or automotive, autonomous vehicles. What are the trends you're seeing that are really hitting the pavement here? >> Yeah, I think what we're seeing is, and it's been kind of Pentaho's focus for a long time now, which is: it's always about the data. You know, what's the data challenge? Some of the amounts of data which everybody talks about from IOT, and then what's interesting is, it's not about kind of the concepts around AI that have been around forever, but when you start to apply some of those AI concepts to a data pipeline, for instance.
We always talk about that data pipeline. The reason it's important is because you're really bringing together the data and the analytics. You can't separate those two things, and that's been kind of not only a Pentaho-specific sort of bent that I've had for years, but a personal one, as well. That, hey, when you start separating it, it makes it really hard to get to any kind of value. So I think what we're doing, and what we're going to be seeing going forward, is applying AI to some of the things that, in a way, will close the gaps between the process and the people, and the data and the analytics that have been around for years. And we see those gaps closing with some of the tools that are emerging around preparing data. But really, when you start to bring some of that machine learning into that picture, and you start applying math to preparing data, that's where it gets really interesting. And I think we'll see some of that automation start to happen. >> So I got to ask you, what is unique about Pentaho? Take a minute to share with the audience some of the unique things that you guys are doing that's different in this sea of people trying to figure out big data. You guys are doing well, and you wrote a blog post that I referenced earlier yesterday, around these gaps. What's unique about Pentaho, and what are you guys doing, with examples that you could share? >> Yeah, so I think the big thing about Pentaho that's unique is that it's solving that analytics workflow from the data side. Always from the data. We've always believed that those two things go together. When you build a platform that's really flexible, it's based on open source technology, and you go into a world where a customer says, "I not only want to manage and have a data lake available," for instance, "I want to be able to have that thing extend over the years to support different groups of users."
I don't want to deliver it to a tool, I want to deliver it to an application, I want to embed analytics." That's where having a complete end-to-end platform that can orchestrate the data and the analytics across the board is really unique. And what's happened is, it's like, the time has come. Where all we're hearing is, hey, I used to think it was throw some data over and, "here you go, here's the tools." The tools are really easy, so that's great. Now we have all kinds of people that can do analytics, but who's minding the data? With that end-to-end platform, we've always been able to solve for that. And when you move in the open source piece, that just makes it much easier when things like Spark emerge, right. Spark's amazing, right? But we know there's other things on the horizon. Flink, Beam, how are you going to deal with that without being kind of open source, so this is-- >> You guys made a good bet there, and your blog post got my attention because of the title. It wasn't click bait either, it was actually a great article, and I just shared it on Twitter. The Holy Grail of analytics is the value between data and insight. And this is interesting, it's about the data, it's in bold, data, data, data. Data's the hardest part. I get that. But I got to ask you, with cloud computing, you can see the trends of commoditization. You're renting stuff, and you got tools like Kinesis, Redshift on Amazon, and Azure's got tools, so you don't really own that, but the data, you own, right? >> Yeah, that's your intellectual property, right? >> But that's the heart of your piece here, isn't it, the Holy Grail. >> Yes, it is. >> What is that Holy Grail? >> Yeah, that Holy Grail is when you can bring those two things together. The analytics and the data, and you've got some governance, you've got the control. But you're allowing the access that lets the business derive value. 
For instance, we just had a customer, I think Eric might have mentioned it, but they're a really interesting customer. They're one of the largest community colleges in the country, Ivy Tech, and they won an award, actually, for their data excellence. But what's interesting about them is, they said, we're going to create a data democracy. We want data to be available because we know that we see students dropping out, we can't be efficient, people can't get the data that they need, we have old school reporting. So they took Pentaho, and they really transformed the way they think about running their organization and their community colleges. Now they're adding predictive to that. So they've got this data democracy, but now they're looking at things like, "Okay, we can see where certain classes are over capacity, but what if we could predict, next year, not only which classes are over capacity, but what's the tendency of a particular student to drop out? What could we do to intervene?" That's where the kind of cool machine learning starts to apply. Well, Pentaho is what enables that data democracy across the board. I think that's where, when I look at it from a customer perspective, it's really kind of, it's only going to get more interesting. >> And with RFID and smart phones, you could have attendance tracking, too. You know, who's not showing up. >> Yeah, absolutely. And you bring Hitachi into the picture, and you think about, for instance, from an IOT perspective, you might be capturing data from devices, and you've got a digital twin, right? And then you bring that data in with data that might be in a data lake, and you can set a threshold, and say, "Okay, not only do we want to be able to know where that student is," or whatever, "we want to trigger something back to that device," and say, "hey, here's a workshop for you to log in to right away, so that you don't end up not passing a class."
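The predict-and-intervene loop in this example, score each student's dropout risk and trigger an intervention past a threshold, can be illustrated with a toy model. The features, weights, and threshold below are invented for illustration; they are not Ivy Tech's or Pentaho's actual model.

```python
# Toy dropout-risk scoring and intervention trigger (features and weights
# are invented for illustration, not an actual production model).

WEIGHTS = {"absences": 0.05, "failed_courses": 0.2, "low_gpa": 0.3}

def dropout_risk(student):
    """Weighted risk score in [0, 1] from a few hypothetical features."""
    score = (WEIGHTS["absences"] * student["absences"]
             + WEIGHTS["failed_courses"] * student["failed_courses"]
             + WEIGHTS["low_gpa"] * (1 if student["gpa"] < 2.0 else 0))
    return min(score, 1.0)

def interventions(students, threshold=0.5):
    """IDs of students whose risk crosses the threshold (e.g. offer a workshop)."""
    return [s["id"] for s in students if dropout_risk(s) >= threshold]

students = [
    {"id": "s1", "absences": 2, "failed_courses": 0, "gpa": 3.4},
    {"id": "s2", "absences": 8, "failed_courses": 2, "gpa": 1.7},
]
print(interventions(students))  # ['s2']
```

In the scenario Donna sketches, a learned model would replace the hand-set weights, but the shape of the loop is the same: score, compare against a threshold, then trigger something back to the student.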
Or whatever it is; it's a simplistic model, but you can imagine where that starts to really become transformative. >> So I asked Eric a question yesterday. It was from Dave Vellante, who's in Boston, stuck in the snowstorm, but he was watching, and I'll ask you and see how it matches. He wrote it differently on CrowdChat, it was public, but this is in my chat: "HDS is known for mainframes, historically, and storage, but Hitachi is an industrial giant. How is Pentaho leveraging the Hitachi monster?" >> Yes, that's a great way to put it. >> Or Godzilla, because it's Japan. >> We were just comparing notes. We were like, "Well, is it an $88 billion company or $90 billion?" According to the yen today, it's 88. We usually say 90, but close enough, right? But yeah, it's a huge company. They're in every industry. Make all kinds of things. Pretty much, they've got the OT of the world under their belt. How we're leveraging it is, number one, what that brings to the table in terms of the transformations from a software perspective, and the data and the expertise we can bring to the table. The other piece is, we've got a huge opportunity via the Hitachi channel, which is where we're seeing the growth that we've had over the last couple of years. It's been really significant since we were acquired. And then the next piece is how do we become part of that bigger Hitachi IOT strategy. And what's been starting to happen there is, as I mentioned before, you can kind of probably put the math together without giving anything away. But you think about capturing, being able to capture device data, being able to bring it into the digital twin, all of that. And then you think about, "Okay, and what if I added Pentaho to the mix?" That's pretty exciting. You bring those things together, and then you add a whole bunch of expertise and machine learning and you're like, okay.
You could start to do, you could start to see where the IOT piece of it is where we're really going to-- >> IOT is a forcing function, would you agree? >> Yes, absolutely. >> It's really forcing IT to go, "Whoa, this is coming down fast." And AI and machine learning, and cloud, is just forcing everyone. >> Yeah, exactly. And when we came into the big data market, whatever it was, five years ago, in the early market it's always hard to kind of get in there. But one of the things that we were able to do, when it was sort of, people were still just talking about BI would say, "Have you heard about this stuff called big data, it's going to be hard." You are going to have to take advantage of this. And the same thing is happening with IOT. So the fact that we can be in these environments where customers are starting to see the value of the machine generated data, that's going to be-- >> And it's transformative for the business, like the community college example. >> Totally transformative, yeah. The other one was, I think Eric might have mentioned, the IMS, where all the sudden you're transforming the insurance industry. There's always looking at charts of, "I'm a 17-year-old kid," "Okay, you're rate should be this because you're a 17-year-old boy." And now they're starting to track the driving, and say, "Well, actually, maybe not, maybe you get a discount." >> Time for the self-driving car. >> Transforming, yeah. >> Well, Donna, I appreciate it. Give us a quick tease here, on Pentaho World coming in October. I know it's super early, but you have a roadmap on the product side, so you can see a little bit around the corner. >> Donna: Yeah. >> What is coming down the pike for Pentaho? What are the things that you guys are beavering away at inside the product group? >> Yeah, I think you're going to see some really cool innovations we're doing. 
I won't give too much away, but on the Spark side, and with execution engines in general, we're going to have some really interesting kind of innovative stuff coming. More on the machine learning coming out, and if you think about, if data is, as we said, the hard part, just think about applying machine learning to the data, and I think you can think of some really cool things we're going to come up with. >> We're going to need algorithms for the algorithms, machine learning for the machine learning, and, of course, humans to be smarter. Donna, thanks so much for sharing here inside theCUBE, appreciate it. >> Thank you. >> Pentaho, check them out. Going to be at Pentaho World in October, as well, in theCUBE, and hopefully we can get some more deep dives on, with their analyst group, what's going on with the engines of innovation there. More CUBE coverage live from Silicon Valley for Big Data SV, in conjunction with Strata Hadoop, I'm John Furrier. Be right back with more after this short break. (techno music)

Published Date : Mar 16 2017


Murthy Mathiprakasam, - Informatica - Big Data SV 17 - #BigDataSV - #theCUBE1


 

(electronic music) >> Announcer: Live from San Jose, California, it's The Cube, covering Big Data Silicon Valley 2017. >> Okay, welcome back everyone. We are live in Silicon Valley for Big Data Silicon Valley. Our companion show is at Big Data NYC, in conjunction with Strata Hadoop, Big Data Week. Our next guest is Murthy Mathiprakasam, the director of product marketing at Informatica. Did I get it right? >> Murthy: Absolutely (laughing)! >> Okay (laughing), welcome back. Good to see you again. >> Good to see you! >> Informatica, you guys had Amit on earlier yesterday, kicking off our event. It is a data lake world out there, and the show theme has been, obviously besides a ton of machine learning-- >> Murthy: Yep. >> Which has been fantastic. We love that because that's a real trend. And IOT has been a subtext to the conversation and almost a forcing function. Every year the big data world is getting more and more pokes and levers off of Hadoop to a variety of different data sources, so a lot of people are taking a step back and a protracted view of their landscape inside their own companies and saying, okay, where are we? So kind of a checkpoint in the industry. You guys do a lot of work with customers, your history with Informatica, and certainly over the past few years, the change in focus, certainly on the product side, has been kind of interesting. You guys have what looks to be a solid approach: an abstraction layer for data and metadata, to be the keys to the kingdom, but yet not locking it down, making it freely available, yet providing the governance and all that stuff. >> Murthy: Exactly. >> And my interview with Amit laid it all out there. But the question is what are the customers doing? I'd like to dig in, if you could share just some of the best practices. What are you seeing? What are the trends? Are they taking a step back? How is IOT affecting it? What's generally happening? >> Yeah, I know, great question.
So it has been really, really exciting. It's been kind of a whirlwind over the last couple years, so many new technologies, and we do get the benefit of working with a lot of very, very innovative organizations. IOT is really interesting because up until now, IOT's always been sort of theoretical, you're like, what's the thing? >> John: Yeah. (laughing) What's this Internet of things? >> But-- >> And IT was always poo-pooing someone else's department (laughing). >> Yeah, exactly. But we actually have customers doing this now, so we've been working with automotive manufacturers on connected vehicle initiatives, pulling sensor data, been working with oil and gas companies, connected meters and connected energy, manufacturing, logistics companies, looking at putting meters on trucks, so they can actually track where all the trucks are going. Huge cost savings and service delivery kind of benefits from all this stuff, so you're absolutely right, IOT, I think, is finally becoming real. And we have a streaming solution that kind of works on top of all the open source streaming platforms, so we try to simplify everything, just like we have always done. We did that with MapReduce, with Spark, now with all the streaming technologies. We give a graphical approach where you can go in and say, Well, here's the kind of processing we want. You lay it out visually and it executes in the Hadoop cluster. >> I know you guys have done a great job with the product, it's been very complimentary you guys, and it's almost as if there's been a transformation within Informatica. And I know you went private and everything, but a lot of good product shops there. You guys got a lot of good product guys, so I got to ask you the question, I don't see IOT sometimes as an operational technology component, usually running their own stacks, not even plugged into IT, so that's a whole other story. I'll get to that in a second.
But the trend here is you have the batch world, companies that have been in this ecosystem here that are on the show floor, at O'Reilly Media, or talking to us on The Cube. Some have been just pure-play batch-related. Then the fashionable streaming technologies have come out, but what's happened with Spark, you're starting to see the collision between batch and realtime-- >> Umm-hmm. >> Called streaming or what not. And at the center of that is the deep learning, it's the IOT, and it's the AI, that's going to be at the intersection of these two colliding forces, so you can't have a one-trick pony here and there. You got to kind of have a blended, more of a holistic, horizontal, scalable approach. >> Murthy: Yes. >> So I want to get your reaction to that. And two, what product gaps and organizational gaps and process gaps emerge from this trend? And what do you guys do? So, three-part question. >> Murthy: Yeah (laughing). >> Go ahead. Go ahead. >> I'll try to cover all three. >> So, first, the collision and your reaction to that trend. >> Murthy: Yeah, yeah. >> And then the gaps. >> Absolutely. So basically, you know Informatica, we've supported every type of kind of variation of these type of environments, and so we're not really a believer in it's this or that. It's not on premise or cloud, it's not realtime or batch. We want to make it simple, no matter how you want to process the data, or where you want to process it. So customers who use our platform for their realtime or streaming solutions are using the same interface as if they were doing it batched. We just run it differently under the hood. And so, that simplifies and makes a lot of these initiatives more practical because you might start with a certain latency, and you think maybe it's okay to do it at one speed. Maybe you decide to change. It could be faster or slower, and you don't have to go through code rewrites or start completely from scratch.
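Murthy's "same interface, just run differently under the hood" point can be sketched in plain Python. This is a toy illustration of the abstraction-layer idea only, not Informatica's actual product (which is a graphical designer that compiles to engines like Spark): the transformation is written once, and only the execution mode changes underneath.

```python
# Toy sketch of the "same logic, batch or streaming" abstraction.
# Illustration of the idea only -- not Informatica's real engine.
from typing import Callable, Iterable, Iterator, List


def enrich(record: dict) -> dict:
    """Business logic, written once against plain records."""
    return {**record, "severity": "hot" if record["temp"] > 90 else "normal"}


def run_batch(transform: Callable[[dict], dict],
              records: List[dict]) -> List[dict]:
    """Batch mode: process a finite, fully materialized dataset."""
    return [transform(r) for r in records]


def run_stream(transform: Callable[[dict], dict],
               records: Iterable[dict]) -> Iterator[dict]:
    """Streaming mode: process records lazily, one at a time, as they arrive."""
    for r in records:
        yield transform(r)


events = [{"device": "a", "temp": 95}, {"device": "b", "temp": 70}]

# Same `enrich` function, two execution modes -- no code rewrite to switch.
batch_out = run_batch(enrich, events)
stream_out = list(run_stream(enrich, iter(events)))
assert batch_out == stream_out
```

Swapping `run_batch` for `run_stream` is the whole migration; in a real engine the equivalent is swapping the source and sink while the DataFrame-style logic stays put.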
That's the benefit of the abstraction layer, like you were saying. And so, I think that's one way that organizations can shield themselves from the question because why even pose that question in the first... Why is it either this or that? Why not have a system that you can actually tune and maybe today you want to start batch, and tomorrow you evolve it to be more streaming and more realtime. Help me on the-- >> John: On the gaps-- >> Yes. >> Always product gaps because, again, you mentioned that you're solving it, and that might be an integration challenge for you guys. >> Yep. >> Or an integration solution for you guys, challenge, opportunity, whatever you guys want to call it. >> Absolutely! >> Organizational gaps, maybe not set up for it, and then process gaps. >> Right. I think it was interesting that we actually went out to dinner with a couple of customers last night. And they were talking a lot about the organizational stuff because the technology they're using is Informatica, so that part's easy. So, they're like, Okay, it's always the stuff around budgeting, it's around resourcing, skills gap, and we've been talking about this stuff for a long time, right. >> John: Yeah. >> But it's fascinating, even in 2017, it's still a persistent issue, and part of what their challenge was is even the way IT projects have been funded in the past. You have this kind of waterfall-ish type of governance mechanism where you're supposed to say, Oh, what are you going to do over the next 12 months? We're going to allocate money for that. We'll allocate people for that. Like, what big data project takes 12 months? Twelve months you're going to have a completely (laughing) different stack that you're going to be working with. And so, their challenge is evolving into a more agile kind of model where they can go justify quick-hit projects that may have very unknown kind of business value, but it's just getting buy-in that... Hey, something might be discovered here?
This is kind of an exploration-use case, discovery, a lot of this IOT stuff, too. People are bringing back the sensor data, you don't know what's going to come out of that or (laughing)-- >> John: Yeah. >> What insights you're going to get. >> So there's-- >> Frequency, velocity, could be completely dynamic. >> Umm-hmm. Absolutely! >> So I think part of the best practice is being able to set up this kind of notion of innovation where you have funding available for... Get a small cross-functional team together, so this is part of the other aspect of your question, which is organizationally, this isn't just IT. You got to have the data architects from IT, you got to have the data engineers from IT. You got to have data stewards from the line of business. You got business analysts from the line of business. Whenever you get these guys together-- >> Yeah. >> Small core team, and people have been talking about this, right. >> John: Yeah. >> Agile development and all that. It totally applies to the data world. >> John: And the cloud's right there, too, so they have to go there. >> Murthy: That's right! Exactly. So you-- >> So is the 12-month project model, the waterfall model, however you want... maybe 24 months more like it. But the problem on the fail side there is that when they wake up and ship, the world's changed, so there's kind of a diminishing return. Is that kind of what you're getting at there on that fail side? >> Exactly. It's all about failing fast forward and succeeding very quickly as well. And so, when you look at most of the successful organizations, they have radically faster project lifecycles, and this is all the more reason to be using something like Informatica, which abstracts all the technology away, so you're not mired in code rewrites and long development cycles. You just want to ship as quickly as possible, get the organization buy-in that, Hey, we can make this work! Here's some new insights that we never had before.
That gets you the political capital-- >> John: Yeah. >> For the next project, the next project, and you just got to keep doing that over and over again. >> Yeah, yeah. I always call that agile more of a blank check in a safe harbor because, in case you fail forward, (laughing) I'm failing forward. (laughing) You keep your job, but there's some merit to that. But here's the trick question for you: Now let's talk about hybrid. >> Umm-hmm. >> On prem and cloud. Now, that's the real challenge. What are you guys doing there because now I don't want to have a job on prem. I don't want to have a job on the cloud. That's not redundancy, that's inefficient, that's duplicates. >> Yes. >> So that's an issue. So how do you guys tee it up there for the customer? And what's the playbook for them, and people who are trying to scratching their heads saying, I want on prem. And Oracle got this right. Their earnings came out pretty good, same code on prem, off prem, same code base. So workloads can move depending upon the use cases. >> Yep. >> How do you guys compare? >> Actually that's the exact same approach that we're taking because, again, it's all about that customer shouldn't have to make the either or-- >> So for you guys, interfacing code same on prem and cloud. >> That's right. So you can run our big data solutions on Amazon, Microsoft, any kind of cloud Hadoop environment. We can connect to data sources that are in the cloud, so different SAAS apps. >> John: Umm-hmm. >> If you want to suck data out of there. We got all the out-of-the-box connectivity to all the major SAAS applications. And we can also actually leverage a lot of these new cloud processing engines, too. So we're trying to be the abstraction layer, so now it's not just about Spark and Spark streaming, there's all these new platforms that are coming out in the cloud. So we're integrating with that, so you can use our interface and then push down the processing to a cloud data processing system. 
So there's a lot of opportunity here to use cloud, but, again, we don't want to be... We want to make things more flexible. It's all about enabling flexibility for the organization. So if they want to go cloud, great. >> John: Yep. >> There's plenty of organizations that if they don't want to go cloud, that's fine, too. >> So if I get this right, standard interface on prem and cloud for the usability, under the hood it's integration points in clouds, so that data sources, whatever they are and through whatever, could be Kinesis coming off Amazon-- >> Exactly! >> Into you guys, or Azure's got some stuff-- >> Exactly! >> Over there. That all works under the hood. >> Exactly! >> Abstracts from the user. >> That's right! >> Okay, so the next question is, okay, to go that way, that means it's a multicloud world. You probably agree with that. Multicloud meaning, I'm a customer. I might have multiple workloads on multiple clouds. >> That's where it is today. I don't know if that's the endgame? And obviously all this is changing very, very quickly. >> Okay (laughing). >> So I mean, Informatica, we're neutral across multiple vendors and everything. So-- >> You guys are Switzerland. >> We're the Switzerland (laughing), so we work with all the major cloud providers, and there's new ones that we're constantly signing up also, but it's unclear how the market will shake out. >> Umm-hmm. >> There's just so much information out there. I think it's unlikely that you're going to see mass consolidation. We all know who the top players are, and I think that's where a lot of large enterprises are investing, but we'll see how things go in the future, too. >> Where should customers spend their focus because you're seeing the clouds. I was just commenting about Google yesterday, with AMIT, AI, and others. That they've yet to be enterprise-ready.
You guys are very savvy in the enterprise, there's a lot of table stakes, SLAs, integration points, and so, there's some clouds that aren't ready for prime time, like Google for the enterprise. Some are getting there fast, like Amazon; Azure's super enterprise-friendly. They have their own problems and opportunities. But they are very strong on the enterprise. What do you guys advise customers? What are they looking at right now? Where should they be spending their time, writing more code, scripts, or tackling the data? How do you guys help them shift their focus? >> Yeah, yeah! >> And where-- >> And definitely not scripts (laughing). >> It's about the worst thing you can do because... And it's all for all the reasons we understand. >> Why is that? >> Well, again, we were talking about being agile. There's nothing agile about manually sitting there, writing Java code. Think about all the developers that were writing MapReduce code three or four years ago (laughing). Those guys, well, they're probably looking for new jobs right now. And the companies who built that code, they're rewriting all of it. So that approach of doing things at the lowest possible level doesn't make engineering sense. That's why the kind of abstraction layer approach makes so much better sense. So where should people be spending their time? It's really... The one thing technology cannot do is it can't substitute for context. So that's business context, understanding if you're in healthcare there's things about the healthcare industry that only that healthcare company could possibly know, and know about their data, and why certain data is structured the way it is. >> John: Yeah. >> Or financial services or retail.
So business context is something that only that organization can possibly bring to the table, and organizational context, as you were alluding to before, roles and responsibilities, who should have access to data, who shouldn't have access to data. That's also something that can't be prescribed from the outside. It's something that organizations have to figure out. Everything else under the hood, there's no reason whatsoever to be mired in these long code cycles. >> John: Yeah. >> And then you got to rewrite it-- >> John: Yeah. >> And you got to maintain it. >> So automation is one level. >> Yep. >> Machine learning is a nice bridge between the taking advantage of either vertical data, or especially, data for that context. >> Yep. >> But then the human has to actually synthesize it. >> Right! >> And apply it. That's the interface. Did I get that right, that progression? >> Yeah, yeah. Absolutely! And the reason machine learning is so cool... And I'm glad you segue into that. Is that, so it's all about having the machine learning assist the human, right. So the humans don't go away. We still have to have people who understand-- >> John: Okay. >> The business context and the organizational context. But what machine learning can do is in the world of big data... Inherently, the whole idea of big data is that there's too much data for any human to mentally comprehend. >> John: Yeah. >> Well, you don't have to mentally comprehend it. Let the machine learning go through, so we've got this unique machine learning technology that will actually scan all the data inside of Hadoop and outside of Hadoop, and it'll identify what the data is-- >> John: Yeah. >> Because it's all just pattern matching and correlations. And most organizations have common patterns to their data. So we figure out all this stuff, and we can say, Oh, you got credit card information here. Maybe you should go look at that, if that's not supposed to be there (laughing).
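The "you got credit card information here" example can be made concrete with a small sketch of rule-based sensitive-data discovery. This is an invented simplification for illustration; Informatica's actual discovery engine does statistical pattern matching across many data domains, not just this one rule.

```python
# Toy sketch of automated sensitive-data discovery: flag columns whose
# values look like credit card numbers (digit pattern + Luhn checksum).
# Invented simplification -- real discovery engines match many data
# domains statistically, not just this one rule.
import re

CARD_RE = re.compile(r"^\d{13,19}$")


def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum used by payment card numbers."""
    digits = [int(d) for d in number][::-1]
    total = sum(digits[0::2])
    for d in digits[1::2]:
        total += (d * 2 - 9) if d * 2 > 9 else d * 2
    return total % 10 == 0


def flag_card_columns(table: dict, threshold: float = 0.8) -> list:
    """Return column names where most values look like card numbers."""
    flagged = []
    for col, values in table.items():
        if not values:
            continue
        hits = sum(1 for v in values
                   if CARD_RE.match(str(v)) and luhn_valid(str(v)))
        if hits / len(values) >= threshold:
            flagged.append(col)
    return flagged


# Hypothetical scanned dataset; column names are made up for the example.
table = {
    "customer": ["alice", "bob"],
    "card_no": ["4111111111111111", "5500005555555559"],
}
```

Running `flag_card_columns(table)` would surface `card_no` for a human steward to review, which is exactly the "focus the manual effort onto the places where it matters" workflow Murthy describes.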
Maybe there's a potential violation there? So we can focus the manual effort onto the places where it matters, so now you're looking at issues, problems, instead of doing the day-to-day stuff. The day-to-day stuff is fully automated and that's not what organizations-- >> So the guys that are losing their jobs, those Java developers writing scripts, to do the queries, where should they be focusing? Where should they look for jobs? Because I would agree with you that their jobs would be because the the MapReduce guys and all the script guys and the Java guys... Java has always been the bulldozer of the programming language, very functional. >> Murthy: Yep. >> But where those guys go? What's your advice for... We have a lot of friends, I'm sure you do, too. I know a lot of friends who are Java developers who are awesome programmers. >> Yeah. >> Where should they go? >> Well, so first, I'm not saying that Java's going to go away, obviously (laughing). But I think Java-- >> Well, I mean, Java guys who are doing some of the payload stuff around some of the deep--- >> Exactly! >> In the bowels of big data. >> That's right! Well, there's always things that are unique to the organization-- >> Yeah. >> Custom applications, so all that stuff is fine. What we're talking about is like MapReduce coding-- >> Yeah, what should they do? What should those guys be focusing on? >> So it's just like every other industry you see. You go up the value stack, right. >> John: Right. >> So if you can become more of the data governor, the data stewards, look at policy, look at how you should be thinking about organizational context-- >> John: And governance is also a good area. >> And governance, right. Governance jobs are just going to explode here because somebody has to define it, and technology can't do this. Somebody has to tell the technology what data is good, what data is bad, when do you want to get flagged if something is going wrong, when is it okay to send data through. 
Whoever decides and builds those rules, that's going to be a place where I think there's a lot of opportunities. >> Murthy, final question. We got to break, we're getting the hook sign here, but we got Informatica World coming up soon in May. What's going to be on the agenda? What should we expect to hear? What's some of the themes that you could tease a little bit, get people excited. >> Yeah, yeah. Well, one thing we want to really provide a lot of content around the journey to the cloud. And we've been talking today, too, there's so many organizations who are exploring the cloud, but it's not easy, for all the reasons we just talked about. Some organizations want to just kind of break away, take out, rip out everything in IT, move all their data and their applications to the cloud. Some of them are taking more of a progressive journey. So we got customers who've been on the leading front of that, so we'll be having a lot of sessions around how they've done this, best practices that they've learned. So hopefully, it's a great opportunity for both our current audience who's always looked to us for interesting insights, but also all these kind of emerging folks-- >> Right. >> Who are really trying to figure out this new world of data. >> Murthy, thanks so much for coming on The Cube. Appreciate it. Informatica World coming up. You guys have a great solution, and again, making it easier (laughing) for people to get the data and put those new processes in place. This is The Cube breaking it down for Big Data SV here in conjunction with Strata Hadoop. I'm John Furrier. More live coverage after this short break. (electronic music)

Published Date : Mar 15 2017


Raymie Stata, SAP - Big Data SV 17 - #BigDataSV - #theCUBE


 

>> Announcer: From San Jose, California, it's The Cube, covering Big Data Silicon Valley 2017. >> Welcome back everyone. We are at Big Data Silicon Valley, running in conjunction with Strata + Hadoop World in San Jose. I'm George Gilbert and I'm joined by Raymie Stata, and Raymie was most recently CEO and Founder of Altiscale, a Hadoop-as-a-service vendor. One of the few out there, not part of one of the public clouds. And in keeping with all of the great work they've done, they got snapped up by SAP. So, Raymie, since we haven't seen you, I think on The Cube since then, why don't you catch us up with all that, the good work that's gone on between you and SAP since then. >> Sure, so the acquisition closed back in September, so it's been about six months. And it's been a very busy six months. You know, there's just a lot of blocking and tackling that needs to happen. So, you know, getting people on board. Getting new laptops, all that good stuff. But certainly a huge effort for us was to open up a data center in Europe. We've long had demand to have that European presence, both because I think there's a lot of interest over in Europe itself, but also large, multi-national companies based in the US, you know, it's important for them to have that European presence as well. So, it was a natural thing to do as part of SAP, so kind of first order of business was to expand over into Europe. So that was a big exercise. We've actually had some good traction on the sales side, right, so we're getting new customers, larger customers, more demanding customers, which has been a good challenge too. >> So let's pause for a minute on, sort of unpack for folks, what Altiscale offered, the core services. >> Sure. >> That were, you know, here in the US, and now you've extended to Europe. >> Right. So our core platform is kind of Hadoop, Hive, and Spark, you know, as a service in the cloud. And so we would offer HDFS and YARN for Hadoop. Spark and Hive kind of well-integrated.
And we would offer that as a cloud service. So you would just, you know, get an account, login, you know, store stuff in HDFS, run your Spark programs, and the way we encourage people to think about it is, I think very often vendors have trained folks in the big data space to think about nodes. You know, how many nodes am I going to get? What kind of nodes am I going to get? And the way we really force people to think twice about Hadoop and what Hadoop as a service means is, you know, they don't, why are you asking that? You don't need to know about nodes. Just store stuff, run your jobs. We worry about nodes. And that, you know, once people kind of understood, you know, just how much complexity that takes out of their lives and how that just enables them to truly focus on using these technologies to get business value, rather that operating them. You know, there's that aha moment in the sales cycle, where people say yeah, that's what I want. I want Hadoop as a service. So that's been our value proposition from the beginning. And it's remained quite constant, and even coming into SAP that's not changing, you know, one bit. >> So, just to be clear then, it's like a lot of the operational responsibilities sort of, you took control over, so that when you say, like don't worry about nodes, it's customer pours x amount of data into storage, which in your case would be HDFS, and then compute is independent of that. They need, you spin up however many, or however much capacity they need, with Spark for instance, to process it, or Hive. Okay, so. >> And all on demand. >> Yeah so it sounds like it's, how close to like the Big Query or Athena services, Athena on AWS or Big Query on Google? Where you're not aware of any servers, either for storage or for compute? >> Yeah I think that's a very good comparable. 
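The "you don't need to know about nodes" pitch implies the service sizes compute on the customer's behalf. A toy sketch of what such hidden capacity planning might look like, using an invented heuristic (this is not Altiscale's actual scheduler, and the parameters are made up for illustration):

```python
# Toy sketch of service-side capacity planning for "Hadoop as a service":
# the user submits data and a job; the service decides node counts.
# The heuristic and its parameters are invented for illustration only.
import math


def plan_capacity(input_bytes: int,
                  bytes_per_executor: int = 8 * 1024**3,  # 8 GiB per executor
                  min_executors: int = 2,
                  max_executors: int = 200) -> int:
    """Pick an executor count from input size; the user never sees this."""
    wanted = math.ceil(input_bytes / bytes_per_executor)
    return max(min_executors, min(max_executors, wanted))


def run_job(input_bytes: int, job: str) -> dict:
    """What such an API might return: job status, not node management."""
    executors = plan_capacity(input_bytes)  # hidden operational detail
    return {"job": job, "status": "submitted", "executors": executors}
```

The point of the sketch is the interface: `run_job` takes no node counts at all, which is the "just store stuff, run your jobs, we worry about nodes" contract Raymie describes.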
It's very much like Athena and Big Query where you just store stuff in tables and you issue queries and you don't worry about how much compute, you know, and managing it. I think, by throwing, you know, Spark in the equation, and YARN more generally, right, we can handle a broader range of these cases. So, for example, you don't have to store data in tables, you can store it in HDFS files, which is good for processing log data, for example. And with Spark, for example, you have access to a lot of machine learning algorithms that are a little bit harder to run in the context of, say, Athena. So I think it's the same model, in terms of, it's fully operated for you. But a broader platform in terms of its capabilities. >> Okay, so now let's talk about what SAP brought to the table and how that changed the use cases that were appropriate for Altiscale. You know, starting at the data layer. >> Yeah, so, I think the, certainly the, from the business perspective, SAP brings a large, very engaged customer base that, you know, is eager to embrace, kind of a data-driven mindset and culture and is looking for a partner to help them do that, right. And so that's been great to be in that environment. SAP has a number of additional technologies that we've been integrating into the Altiscale offering. So one of them is Vora, which is kind of an interactive SQL engine, it also has time series capabilities and graph capabilities and search capabilities. So it has a lot of additive capabilities, if you will, to what we have at Altiscale. And it also integrates very deeply into HANA itself. And so we now have that Vora technology available as a service at Altiscale.
>> Let me make sure, so that everyone understands, and so I understand too, is that so you can issue queries from HANA and they can, you know, beyond just simple SQL queries, they can handle the time series, and predictive analytics, and access data sort of seamlessly that's in Hadoop, or can it go the other way as well? >> It's both ways. So you can, you know, from HANA you can essentially federate out into Vora. And through that access data that's in a Hadoop cluster. But it's also the other way around. A lot of times there's an analyst who really lives in the big data world, right, they're in the Hadoop world, but they want to join in data that's sitting in a HANA database, you know. Might be dimensions in a warehouse or, you know, customer details even in a transactional system. And so, you know, that Hadoop-based analyst now has access to data that's out in those HANA databases. >> Do you have some lighthouse accounts that are working with this already? >> Yes, we do. (laughter) >> Yes we do, okay. I guess that was the diplomatic way of saying yes. But no comment. Alright, so tell us more about SAP's big data stack today and how that might evolve. >> Yeah, of course now, especially that now we've got the Spark, Hadoop, Hive offering that we have. And then Vora sitting on top of that. There's an offering called Predictive Analytics, which is Spark-based predictive analytics.
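The two-way federation Raymie describes, where a Hadoop-side analyst joins in dimension data sitting in a HANA database, can be sketched generically. Vora's actual API is not shown in the transcript, so this toy uses the stdlib sqlite3 module as a stand-in for the relational side and plain Python records as the "Hadoop side":

```python
# Toy federation sketch: join "big data" records held outside the
# database with a dimension table inside a relational store.
# sqlite3 stands in for HANA here; Vora's real API is not shown.
import sqlite3

# Relational side: a small customer dimension table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "EU"), (2, "US")])

# "Hadoop side": semi-structured log events, too voluminous for the DB.
events = [{"customer_id": 1, "clicks": 10},
          {"customer_id": 2, "clicks": 3},
          {"customer_id": 1, "clicks": 7}]

# Federated join: pull only the small dimension across, join locally,
# aggregate the big side where it lives.
dim = dict(db.execute("SELECT id, region FROM customers"))
by_region = {}
for e in events:
    region = dim.get(e["customer_id"], "unknown")
    by_region[region] = by_region.get(region, 0) + e["clicks"]
```

Shipping the small dimension table to the big data, rather than the reverse, is the standard broadcast-join trade-off that makes this kind of federation practical.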
>> Is it, is this ability to federate queries between Hadoop and HANA and then migration of the data between the stores, does that, has that changed the economics of how much data people, SAP customers, maintain and sort of what types of apps they can build on it now that they might, it's economically feasible to store a lot more data. >> Well, yes and no. I think the context of Altiscale, both before and after the acquisition, is very often there's, what you might call a big data source, right. It could be your web logs, it could be some IOT generated log data, it could be social media streams. You know, this is data that, you know, doesn't have a lot of structure coming in. It's fairly voluminous. It doesn't, very naturally, go into a SQL database, and that's kind of the sweet spot for the big data technologies like Hadoop and Spark. So, that data comes into your big data environment. You can transform it, you can do some data quality on it. And then you can eventually stage it out into something like a HANA data mart, where it, you know, to make it available for reporting. But obviously there's stuff that you can do on the larger dataset in Hadoop as well. So, in a way, yes, you can now tame, if you will, those huge data sources that, you know, weren't practical to put into HANA databases. >> If you were to prioritize, in the context of, sort of, the applications SAP focuses on, would you be, sort of, with the highest priority use case be IOT related stuff, where, you know, it was just prohibitive to put it in HANA since it's mostly in memory. But, you know, SAP is exposed to tons of that type of data, which would seem to most naturally have an affinity to Altiscale.
So, let me pop back up, you know, before we have to wrap. With Altiscale as part of the SAP portfolio, have the two companies sort of gone to customers with a more, with more transformational options, that, you know, you'll sell together? >> Yeah, we have. In fact, Altiscale actually is no longer called Altiscale, right? We're part of a portfolio of products, you know, known as the SAP Cloud Platform. So, you know, under the cloud platform we're the big data services. The SAP Cloud Platform is all about business transformation. And business innovation. And so, we bring to that portfolio the ability to now bring the types of data sources that I've just discussed, you know, to bear on these transformative efforts. And so, you know, we fit into some momentum SAP already has, right, to help companies drive change. >> Okay. So, along those lines, which might be, I mean, we know the financial services has done a lot of work with, and I guess telcos as well, what are some of the other verticals that look like they're primed to follow, you know, with this type of transformational network? >> So you mentioned one, which I kind of call manufacturing, right, and there tends to be two kind of different use cases there. One of them I call kind of the shop floor thing. Where you're collecting a lot of sensor data, you know, out of a manufacturing facility with the goal of increasing yield. So you've got the shop floor. And then you've got the, I think, more commonly discussed measuring stuff out in the field. You've got a product, you know, out in the field. Bringing the telemetry back. Doing things like predictive maintenance. So, I think manufacturing is a big sector ready to go for big data. And healthcare is another one. You know, people pulling together electronic medical records, you know, trying to combine that with clinical outcomes, and I think the big focus there is to drive towards, kind of, outcome-based models, even on the payment side.
And big data is really valuable to drive and assess, you know, kind of outcomes in an aggregate way. >> Okay. We're going to have to leave it on that note. But we will tune back in at, I guess, Sapphire or TechEd, whichever of the SAP shows is coming up next, to get an update. >> Sapphire's next. Then TechEd. >> Okay. With that, this is George Gilbert, and Raymie Stata. We will be back in a few moments with another segment. We're here at Big Data Silicon Valley. Running in conjunction with Strata + Hadoop World. Stay tuned, we'll be right back.
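The field-telemetry use case Raymie mentions, bringing telemetry back from products in the field and doing predictive maintenance, usually reduces to rules or models over rolling sensor aggregates. A minimal sketch under assumed inputs; the sensor, threshold, and window are hypothetical, not anything SAP or Altiscale ships.

```python
def needs_maintenance(vibration_readings, limit=0.8, window=3):
    """Toy predictive-maintenance rule: flag a unit in the field when the
    trailing mean of a vibration sensor exceeds a limit, so it can be
    serviced before it actually fails."""
    if len(vibration_readings) < window:
        return False  # not enough telemetry yet to judge
    recent = vibration_readings[-window:]
    return sum(recent) / window > limit

healthy = [0.20, 0.30, 0.25, 0.30, 0.28]
degrading = [0.20, 0.30, 0.70, 0.90, 1.10]
```

In practice the rule would be a trained model rather than a fixed threshold, but the shape of the computation, an aggregate over recent telemetry per unit, is the same.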

Published Date : Mar 15 2017

Bruno Aziza & Josh Klahr, AtScale - Big Data SV 17 - #BigDataSV - #theCUBE


 

>> Announcer: Live from San Jose, California, it's The Cube. Covering Big Data, Silicon Valley, 2017. (electronic music) >> Okay, welcome back everyone, live at Silicon Valley for the big The Cube coverage, I'm John Furrier, with me Wikibon analyst George Gilbert, Bruno Aziza, who's the CMO of AtScale, Cube alumni, and Josh Klahr, VP at AtScale, welcome to the Cube. >> Welcome back. >> Thank you. >> Thanks, Brian. >> Bruno, great to see you. You look great, you're smiling as always. Business is good? >> Business is great. >> Give us the update on AtScale, what's up since we last saw you in New York? >> Well, thanks for having us, first of all. And, yeah, business is great, we- I think last time I was here on The Cube we talked about the Hadoop Maturity Survey and at the time we'd just launched the company. And, so now you look about a year out and we've grown about 10x. We have large enterprises across just about any vertical you can think of. You know, financial services, your American Express, healthcare, think about Aetna, Cigna, GSK, retail, Home Depot, Macy's and so forth. And, we've also done a lot of work with our partner ecosystem, so Mork's- OEM's AtScale technology, which is a great way for us to get AtScale across the US, but also internationally. And then our customers are getting recognized for the work that they are doing with AtScale. So, last year, for instance, Yellowpages got recognized by Cloudera with their leadership award. And Macy's got a leadership award as well. So, things are going the right trajectory, and I think we're also benefitting from the fact that the industry is changing, it's maturing on the big data side, but also there's a redefinition of what business intelligence means. This idea that you can have analytics on large-scale data without having to change your visualization tools and make that work with the existing stack you have in place. And, I think that's been helping us in growing- >> How did you guys do it? 
I mean, you know, we've talked many times and there's some secret sauce there, but, at the time when you guys were first starting it was kind of a crowded field, right? >> Bruno: Yeah. >> And all these BI tools were out there, you had front end BI tools- >> Bruno: Yep. But everyone was still separate from the whole batch back end. So, what did you guys do to break out? >> So, there's two key differentiators with AtScale. The first one is we are the only platform that does not have a visualization tool. And, so people think about this as, that's a bug, that's actually a feature. Because, most enterprises already have that stuff built with traditional BI tools. And so our ability to talk to MDX and SQL types of BI tools, without any changes, is a big differentiator. And then the other piece of our technology, this idea that you can get the speed, the scale and security on large data sets without having to move the data. It's a big differentiation for our enterprises to get value out of the data they already have in Hadoop as well as non-Hadoop systems, which we cover. >> Josh, you're the VP of products, you have the roadmaps, give us a peek into what's happening with the current product. And, where's the work areas? Where are you guys going? What's the to-do list, what's the check box, and what's the innovation coming around the corner? >> Yeah, I think, to follow up on what Bruno said about how we hit the sweet spot. I think- we made a strategic choice, which is we don't want to be in the business of trying to be Tableau or Excel or be a better front end. And there's so much diversity on the back end if you look at the ecosystem right now, whether it's Spark SQL, or Hive, or Presto, or even new cloud based systems, the sweet spot is really how do you fit into those ecosystems and support the right level of BI on top of those applications. 
So, what we're looking at, from a road map perspective, is how do we expand and support the back end data platforms that customers are asking about? I think we saw a big white space in BI on Hadoop in particular. And that's- I'd say, we've nailed it over the past year and a half. But, we see customers now that are asking us about Google BigQuery. They're asking us about Athena. I think these serverless data platforms are really, really compelling. They're going to take a while to get adoption. So, that's a big investment area for us. And then, in terms of supporting BI front ends, we're kind of doubling down on making sure our Tableau integration is great, Power BI is, I think, getting really big traction. >> Well, two great products, you've got Microsoft and Tableau, leaders in that area. >> The self-service BI revolution, I would say, has won. And the business user wants their tool of choice. Where we come in is the folks responsible for data platforms on the back end, they want some level of control and consistency and so they're trying to figure out, where do you draw the line? Where do you provide standards? Where do you provide governance, and where do you let the business loose? >> All right, so, Bruno and Josh, I want you to answer the questions, be a good quiz. So, define next generation BI platforms from a functional standpoint and then under the hood. >> Yeah, there's a few things you can look at. I think if you were at the Gartner BI conference last week you saw that there were 24 vendors in the magic quadrant and I think in general people are now realizing that this is a space that is extremely crowded and it's also sitting on technology that was built 20 years ago. Now, when you talk to enterprises like the ones we work with, like, as I named earlier, you realize that they all have multiple BI tools. So, the visualization war, if you will, kind of has been set up and almost won by Microsoft and Tableau at this point. 
And, the average enterprise has 15 different BI tools. So, clearly, if you're trying to innovate on the visualization side, I would say you're going to have a very hard time. So, you're dealing with that level of complexity. And then, at the back end standpoint, you're now having to deal with databases from the past - that's the Teradata of this world - data sources from today - Hadoop - and data sources from the future, like Google BigQuery. And, so, I think the CIO answer of what is the next gen BI platform I want is something that is enabling me to simplify this very complex world. I have lots of BI tools, lots of data, how can I standardize in the middle in order to provide security, provide scale, provide speed to my business users and, you know, that's really radically going to change the space, I think. If you're trying to sell a full stack that's integrated from the bottom all the way to visualization, I don't think that's what enterprises want anymore. >> Josh, under the hood, what's the next generation- you know, key leverage for the tech, and, just the enabler. >> Yeah, so, for me the end state for the next generation BI platform is a user can log in, they can point to their data, wherever that data is, it's on-prem, it's in the cloud, it's in a relational database, it's a flat file, they can design their business model. We spend a lot of time making sure we can support the creation of business models, what are the key metrics, what are the hierarchies, what are the measures, it may sound like I'm talking about OLAP. You know, that's what our history is steeped in. >> Well, faster data is coming, that's- streaming and data is coming together. >> So, I should be able to just point at those data sets and turn around and be able to analyze it immediately. On the back end that means we need to have pretty robust modeling capabilities. 
So that you can define those complex metrics, so you can functionally do what are traditional business analytics, period over period comparisons, rolling averages, navigate up and down business hierarchies. The optimizations should be built in. It shouldn't be the responsibility of the designer to figure out, do I need to create indices, do I need to create aggregates, do I need to create summarization? That should all be handled for you automatically. You shouldn't think about data movement. And so that's really what we've built in from an AtScale perspective on the back end. Point to data, we're smart about creating optimal data structures so you get fast performance. And then, you should be able to connect whatever BI tool you want. You should be able to connect Excel, we can talk the MDX query language. We can talk SQL, we can talk DAX, whatever language you want to talk. >> So, take the syntax out of the hands of the user. >> Yeah. >> Yeah. >> And getting in the weeds on that stuff. Make it easier for them- >> Exactly. >> And the key word I think, for the future of BI is open, right? We've been buying tools over the last- >> What do you mean by that, explain. >> Open means that you can choose whatever BI tool you want, and you can choose whatever data you want. And, as a business user there's no real compromise. But, because you're getting an open platform it doesn't mean that you have to trade off complexity. I think some of the stuff that Josh was talking about, period analysis, the type of multidimensional analysis that you need, calendar analysis, historical data, that's still going to be needed, but you're going to need to provide this in a world where the business user and IT organization expect that the tools they buy are going to be open to the rest of the ecosystem, and that's new, I think. >> George, you want to get a question in, edgewise? Come on. 
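The "traditional business analytics" Josh lists, period-over-period comparisons and rolling averages, come down to computations like the following. A minimal pure-Python sketch of the math; a semantic layer like the one described would generate equivalent SQL or MDX against the underlying platform instead of computing in application code.

```python
def period_over_period(series):
    """Fractional change of each period versus the one before it."""
    return [
        None if prev == 0 else (cur - prev) / prev
        for prev, cur in zip(series, series[1:])
    ]

def rolling_average(series, window):
    """Trailing mean over a fixed window."""
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]

monthly_revenue = [100, 110, 121, 121]
pop = period_over_period(monthly_revenue)    # each month vs the prior month
roll = rolling_average(monthly_revenue, 2)   # two-month trailing mean
```

Defining these once in a shared business model, rather than per query, is exactly the consistency argument made in the segment: every BI front end asking for "revenue growth" gets the same formula.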
(group laughs) >> You know, I've been sort of a single-issue candidate, I guess, this week on machine learning and how it's sort of touching all the different sectors. And, I'm wondering, are you- how do you see yourselves as part of a broader pipeline of different users adding different types of value to data? >> I think maybe on the machine learning topic there are a few different ways to look at it. The first is we do use machine learning in our own product. I talked about this concept of auto-optimization. One of the things that AtScale does is it looks at end-user query patterns. And we look at those query patterns and try to figure out how can we be smart about anticipating the next thing they're going to ask so we can pre-index, or pre-materialize that data? So, there's machine learning in the context of making AtScale a better product. >> Reusing things that are already done, that's been the whole machine-learning- >> Yes. >> Demos, we saw Google Next with the video editing and the video recognition stuff, that's been- >> Exactly. >> Huge part of it. >> You've got users giving you signals, take that information and be smart with it. I think, in terms of the customer work flow - Comcast, for example, a customer of ours - we are in a data discovery phase, there's a data science group that looks at all of their set top box data, and they're trying to discover programming patterns. Who uses the Yankees' network for example? And where they use AtScale is what I would call a descriptive element, where they're trying to figure out what are the key measures and trends, and what are the attributes that contribute to that. And then they'll go in and they'll use machine learning tools on top of that same data set to come up with predictive algorithms. >> So, just to be clear there, they're hypothesizing about, like, say, either the pattern of users that might be- have an affinity for a certain channel or channels, or they're looking for pathways. >> Yes. 
And I'd say our role in that right now is a descriptive role. We're supporting the descriptive element of that analytics life cycle. I think over time our customers are going to push us to build in more of our own capabilities, when it comes to, okay, I discovered something descriptive, can you come up with a model that helps me predict it the next time around? Honestly, right now people want BI. People want very traditional BI on the next generation data platform. >> Just, continuing on that theme, leaving machine learning aside, I guess, as I understand it, when we talked about the old school vendors, Teradata, when they wanted to support data scientists they grafted on some machine learning, like a parallel version in the core Teradata engine. They also bought Aster Data, which was, you know, for a different audience. So, I guess, my question is, will we see from you, ultimately, a separate product line to support a new class of users? Or, are you thinking about new functionality that gets integrated into the core product? >> I think it's more of the latter. So, the way that we view it- and this is really looking at, like I said, what people are asking for today is, kind of, the basic, traditional BI. What we're building is essentially a business model. So, when someone uses AtScale, they're designing and they're telling us, they're asserting, these are the things I'm interested in measuring, and these are the attributes that I think might contribute to it. And, so that puts us in a pretty good position to start using, whether it's Spark on the back end, or built in machine learning algorithms on the Hadoop cluster, let's start using our knowledge of that business model to help make predictions on behalf of the customer. >> So, just a follow-up, and this really leaves out the machine learning part, which is, it sounds like, we went- in terms of big data, we went first to archive it- supported more data retention than you could do affordably with the data warehouse. 
Then we did the ETL offload, now we're doing more and more of the visualization, the ad-hoc stuff. >> That's exactly right. So, what- in a couple of years' time, what remains in the classic data warehouse, and what's in the Hadoop category? >> Well, so there is, I think what you're describing is the pure evolution, of, you know, any technology where you start with the infrastructure, you know, we've been in this for over ten years, now, you've got cloud. They are going IPO and then going into the data science workbench. >> That's not official yet. >> I think we read about this, or at least they filed. But I think the direction is showing- now people are relying on the platform, the Hadoop platform, in order to build applications on top of it. And, so, I think, just like Josh is saying, the mainstream application on top of the database - and I think this is true for non-Hadoop systems as well - is always going to be analytics. Of course, data science is something that provides a lot of value, but it typically provides a lot of value to a small set of people that will then scale it out to the rest of their organization. I think if you now project out to what does this mean for the CIO and their environment, I don't think any of these platforms, Teradata or Hadoop, or Google, or Amazon or any of those, I don't think they do a 100% replace. And, I think that's where it becomes interesting, because you're now having to deal with a heterogeneous environment, where the business user is up, they're using Excel, they're using their standard net application, they might be using the result of machine learning models, but they're also having to deal with the heterogeneous environment at the data level. Hadoop on-prem, Hadoop in the cloud, non-Hadoop in the cloud and non-Hadoop on-prem. And, of course that's a market that I think is very interesting for us as a simplification platform for that world. 
>> I think you guys are really thinking about it in a new way, and I think that's kind of a great, modern approach, let the freedom- and by the way, quick question on the Microsoft tool and Tableau, what percentage share do you think they are of the market? 50? Because you mentioned those are the two top ones. >> Are they? >> Yeah, I mentioned them, because if you look at the magic quadrant, clearly Microsoft, Power BI and Tableau have really shot up all the way to the right. >> Because it's easy to use, and it's easy to work with data. >> I think so, I think- look, from a functionality standpoint, you see Tableau's done a very good job on the visualization side. I think, from a business standpoint, and a business model execution, and I can talk from my days at Microsoft, it's a very great distribution model to get thousands and thousands of users to use Power BI. Now, the guys that we didn't talk about on the last magic quadrant, folks like Google Data Studio, or Amazon QuickSight, I think they will change the ecosystem as well. Which, again, is great news for AtScale. >> More muscle coming in. >> That's right. >> For you guys, just more rising tide floats all boats. >> That's right. >> So, you guys are powering it. >> That's right. >> Modern BI would be safe to say? >> That's the idea. The idea is that the visualization is basically commoditized at this point. And what business users want and what enterprise leaders want is the ability to provide freedom and openness to their business users and never have to compromise security, speed and also the complexity of those models, which is what we- we're in the business of. >> Get people working, get people productive faster. >> In whatever tool they want. >> All right, Bruno. Thanks so much. Thanks for coming on. AtScale. Modern BI here in The Cube. Breaking it down. This is The Cube covering Big Data SV, Strata + Hadoop. Back with more coverage after this short break. (electronic music)
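Earlier in the segment Josh describes auto-optimization: watching end-user query patterns and pre-indexing or pre-materializing the data they imply. A toy version of that idea follows; treating a "pattern" as the set of grouping columns, and the hit threshold, are both invented for illustration rather than taken from AtScale's actual heuristics.

```python
from collections import Counter

def aggregates_to_materialize(query_log, min_hits=2):
    """Toy heuristic: treat the set of grouping columns in a query as its
    'pattern'; any pattern seen at least min_hits times is a candidate
    for a pre-built aggregate table."""
    counts = Counter(frozenset(q["group_by"]) for q in query_log)
    return {pattern for pattern, n in counts.items() if n >= min_hits}

log = [
    {"group_by": ["region", "month"]},
    {"group_by": ["month", "region"]},  # same pattern, different order
    {"group_by": ["product"]},
]
candidates = aggregates_to_materialize(log)
```

Using `frozenset` makes column order irrelevant, so two analysts grouping by the same dimensions in different orders count toward the same candidate aggregate.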

Published Date : Mar 15 2017

Ravi Dharnikota, SnapLogic & Katharine Matsumoto, eero - Big Data SV 17 - #BigDataSV - #theCUBE


 

>> Announcer: Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. (light techno music) >> Hey, welcome back everybody. Jeff Frick here with theCUBE. We're at Big Data SV, wrapping up with two days of wall-to-wall coverage of Big Data SV which is associated with Strata Conf, which is part of Big Data Week, which always becomes the epicenter of the big data world for a week here in San Jose. We're at the historic Pagoda Lounge, and we're excited to have our next two guests, talking a little bit different twist on big data that maybe you hadn't thought of. We've got Ravi Dharnikota, he is the Chief Enterprise Architect at SnapLogic, welcome. - Hello. >> Jeff: And he has brought along a customer, Katharine Matsumoto, she is a Data Scientist at eero, welcome. >> Thank you, thanks for having us. >> Jeff: Absolutely, so we had SnapLogic on a little earlier with Gaurav, but tell us a little bit about eero. I've never heard of eero before, for folks that aren't familiar with the company. >> Yeah, so eero is a start-up based in San Francisco. We are sort of driven to increase home connectivity, both the performance and the ease of use, as wifi becomes totally a part of everyday life. We do that. We've created the world's first mesh wifi system. >> Okay. >> So that means you have, for an average home, three different individual units, and you plug one in to replace your router, and then the other three get plugged in throughout the home just to power, and they're able to spread coverage, reliability, speed, throughout your home. No more buffering, dead zones, in that way back bedroom. >> Jeff: And it's a consumer product-- >> Yes. >> So you got all the fun and challenges of manufacturing, you've got the fun challenges of distribution, consumer marketing, so a lot of challenges for a start-up. But you guys are doing great. Why SnapLogic? >> Yeah, so in addition to the challenges with the hardware, we also are a really strong software company. 
So, everything is either set up via the app. We are not just the backbone to your home's connectivity, but also part of it, so we're sending a lot of information back from our devices to be able to learn and improve the wifi that we're delivering based on the data we get back. So that's a lot of data, a lot of different teams working on different pieces. So when we were looking at launch, how do we integrate all of that information together to make it accessible to business users across different teams, and also how do we handle the scale? I made a checklist (laughs), and SnapLogic was really the only one that seemed to be able to deliver on both of those promises with a look to the future of like, I don't know what my next SaaS product is, I don't know what our next API endpoint we're going to need to hit is, sort of the flexibility of that as well as the fact that our analysts were able to pick it up, engineers were able to pick it up, and I could still manage all the software written by, or the pipelines written by each of those different groups without having to read whatever version of code they're writing. >> Right, so Ravi, we heard you guys are like doubling your customer base every year, and lots of big names, Adobe we talked about earlier today. But I don't know that most people would think of SnapLogic really, as a solution for a start-up mesh network company. >> Yeah, absolutely, so that's a great point though, let me just start off with saying that in this new world, we don't discriminate-- (guest and host laugh) we integrate and we don't discriminate. In this new world that I speak about is social media, you know-- >> Jeff: Do you bus? (all laugh) >> So I will get to that. (all laugh) So, social, mobile, analytics, and cloud. And in this world, people have this thing which we fondly call the integrators' dilemma. You want to integrate apps, you go to a different tool set. You integrate data, you start thinking about different tool sets. 
So we want to dispel that and really provide a unified platform for both apps and data. So remember, we are seeing all the apps move into the cloud and being provided as services, but the data systems are also moving to the cloud. You got your data warehouses, databases, your BI systems, analytical tools, all are being provided to you as services. So, in this world data is data. If it's apps, it's probably schema mapping. If it's data systems, it's transformations moving from one end to the other. So, we're here to solve both those challenges in this new world with a unified platform. And it also helps that our lineage and the brain trust that brings us here, we did this a couple of decades ago and we're here to reinvent that space. >> Well, we expect you to bring Clayton Christensen on next time you come to visit, because he needs a new book, and I think that's a good one. (all laugh) But I think it was a really interesting part of the story though too, is you have such a dynamic product. Right, if you looked at your boxes, I've got the website pulled up, you wouldn't necessarily think of the dynamic nature that you're constantly tweaking and taking the data from the boxes to change the service that you're delivering. It's not just this thing that you made to a spec that you shipped out the door. >> Yeah, and that's really where the auto-update piece comes in; we did 20 firmware updates last year. We had problems where customers would have the same box for three years, and the technology changed, the chips changed, but their wifi service is the same, and we're constantly innovating and being able to push those out, but if you're going to do that many updates, you need a lot of feedback on the updates, because things break when you update sometimes, and we've been able to build systems that catch that, that are able to identify changes that, say, no one person could identify by looking at their own things or just with support. 
We have leading indicators across all sorts of different stability and performance and different devices, so if Xbox changes their protocols, we can identify that really quickly. And that's sort of the goal of having all the data in one place across customer support and manufacturing. We can easily pinpoint where in the many different complicated factors you can find the problem. >> Have issues. - Yeah. >> So, I've actually got questions for both of you. Ravi, starting with you, it sounds like you're trying to tackle a challenge that in today's tools would have included Kafka at the data integration level, and there it's very much a hub and spoke approach. And I guess it's also, you would think of the application level integration more like the TIBCO and other EAI vendors in a previous generation-- - [Ravi] Yeah. >> Which I don't think was hub and spoke, it was more point to point, and I'm curious how you resolve that, in other words, how you'd tackle both together in a unified architecture? >> Yeah, that's an excellent question. In fact, part of the integrators' dilemma that I spoke about: you've got the problem set where you've got the high-latency, high-volume, where you go to ETL tools. And then the low-latency, low-volume, you immediately go to the TIBCOs of the world, and that's ESB, EAI sort of tool sets that you look to solve. So what we've done is we've thought about it hard. At one level we've just said, why can integration not be offered as a service? So that's step number one, where the design experience is through the cloud, and then execution can just happen anywhere, behind your firewall or in the cloud, or in a big data system, so it caters to all of that. But then also, the data set itself is changing. You're seeing a lot of the document data models being offered by the SaaS services. So the old ETL companies that were built before all of this social, mobile sort of stuff came around, it was all row and column oriented. 
So how do you deal with the more document oriented JSON sort of stuff? And we built the platform to be able to handle that kind of data. Streaming is an interesting and important question. Pretty much everyone I spoke to last year were, streaming was a big-- let's do streaming, I want everything in real-time. But batch also has its place. So you've got to have a system that does batch as well as real-time, or as near real-time as needed. So we solve for all of those problems. >> Okay, so Katharine, coming to you, each customer has a different, well, every consumer has a different, essentially, an install base. To bring all the telemetry back to make sense out of what's working and what's not working, or how their environment is changing. How do you make sense out of all that, considering that it's not B to B, it's B to C so, I don't know how many customers you have, but it must be in the tens or hundreds. >> I'm sure I'm not allowed to say (laughs). >> No. But it's the distinctness of each customer that I gather makes the support challenge for you. >> Yeah, and part of that's exposing as much information to the different sources, and starting to automate the ways in which we do it. There's certainly a lot, we are very early on as a company. We've hit our year mark for public availability at the end of last month so-- >> Jeff: Congratulations. >> Thank you, it's been a long year. But with that we learn more, constantly, and different people come to different views as different new questions come up. The special-snowflake aspect of each customer, there's a balance between how much actually is special and how much you can find patterns. And that's really where you get into much more interesting things on the statistics and machine learning side is how do you identify those patterns that you may not even know you're looking for. We are still beginning to understand our customers from a qualitative standpoint. 
It actually came up this week where I was doing an analysis and I was like, this population looks kind of weird, and with two clicks I was able to send out a list over to our CX team. They had access to all the same systems because all of our data is connected, and they could pull up the tickets based on it, because through SnapLogic we're joining all the data together. We use Looker as our BI tool, so they were just able to start going into all the tickets and doing a deep dive, and that's being presented later this week as to, hey, what is this population doing? >> So, for you to do this, that must mean you have at least some data that's common to every customer. For you to be able to use something like Looker, I imagine. If every customer was a distinct snowflake, it would be very hard to find patterns across them. >> Well I mean, look at how many people have iPhones, have MacBooks, you know, we are looking at a lot of aggregate-level data in terms of how things are behaving, and always the challenge of any data science project is creating those feature extractions, and so that's where the process we're going through as the analytics team is to start extracting those things and adding them to our central data source. That's one of the areas also where having very integrated analytics and ETL has been helpful, as we're just feeding that information back to everyone. So once we figure out, oh hey, this is how you differentiate small businesses from homes, because we do see a couple of small businesses using our product, that goes back into the data and now everyone's consuming it. Each of those common features is a slow process to create, but each also increases the value every time you add one to the central group. >> One last question-- >> It's an interesting way to think of the wifi service and the connected devices as an integration challenge, as opposed to just an appliance that kind of works like an old POTS line, which it isn't, clearly at all.
(all laugh) With 20 firmware updates a year (laughs). >> Yeah, there's another interesting point, that we were just having the discussion offline, it's that it's a start-up. They obviously don't have the resources of a large IT department to set up these systems. So, as Katharine mentioned, it was a one-person team initially when they started, and they have to be able to integrate, and who knows which system is going to be next. Maybe they experiment with one cloud service, it perhaps scales to their liking or not, and then they quickly change and go to another one. You cannot keep changing the integration underneath that. You've got to be able to adjust to that. So that flexibility, and the other thing is, what they've done with having their business become self-sufficient is another very fascinating thing. It's like, give them the power. Why should IT or that small team become the bottleneck? Don't come to me, I'll just empower you with the right tool set and the patterns, and then from there, you change and put in your business logic and be productive immediately. >> Let me drill down on that, 'cause my understanding, at least in the old world, was that ETL was kind of brittle, and if you're constantly ... Part of, actually, the genesis of Hadoop, certainly at Yahoo, was, we're going to bring all the data we might ever possibly need into the repository so we don't have to keep re-writing the pipeline. And it sounds like you have the capability to evolve the pipeline rather quickly as you want to bring more data into this sort of central resource. Am I getting that about right? >> Yeah, it's a little bit of both. We do have a central, I think "data lake" is the fancy term for that, so we're bringing everything into S3, dumping it in as those raw JSONs, you know, whatever nested format it comes in as, so whatever makes it so that extraction is easy.
Then there's also, as part of ETL, that last mile, which is a lot of business logic, and that's where you run into teams starting to diverge very quickly if you don't have a way for them to give feedback into the process. We've really focused on empowering business users to be self-service, in terms of answering their own questions, and that's freed up our analysts to add more value back into the greater group as well as answer harder questions, which both beget more questions and also feed insights back into that data source, because they have access to their piece of that last business logic. By changing the way that one JSON field maps, or combining two, they've suddenly created an entirely new variable that's accessible to everyone. So it's sort of last-leg business logic versus the full transport layer. We have a whole platform that's designed to transport everything and be much more robust to changes. >> Alright, so let me make sure I understand this, it sounds like the less-trained or more self-sufficient, they go after the central repository, and then the more highly-trained and scarcer resource, they are responsible for owning one or more of the feeds, and they enrich that or make it more flexible and general-purpose so that those who are more self-sufficient can get at it in the center. >> Yeah, and also you're able to make use of the business context. So we have sort of a hybrid model with our analysts that are really closely embedded into the teams, and so they have all that context that you need, whereas if you're relying on, say, a central IT team, you have to go back and forth of like, why are you doing this, what does this mean? They're able to do all that logic inline.
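The analyst-owned "last mile" Katharine describes, where combining two JSON fields creates an entirely new variable for everyone downstream, can be pictured as a tiny derived-field step. The field names and the small-business heuristic below are invented for illustration; this is a sketch, not eero's pipeline.

```python
def derive_fields(record):
    """Hypothetical last-mile business logic: combine existing fields
    into new derived variables that all downstream consumers share."""
    out = dict(record)  # leave the transport-layer record untouched
    out["devices_per_room"] = record["device_count"] / record["room_count"]
    # e.g. a made-up heuristic separating small businesses from homes
    out["likely_small_business"] = record["device_count"] >= 20
    return out

event = {"device_count": 24, "room_count": 8}
enriched = derive_fields(event)
```

The point of keeping this step separate from the transport layer is that analysts can change it without touching the pipeline that moves the data.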
And then the goal of our platform team is really to focus on building technologies that complement what we have with SnapLogic, or others that are connected to our data systems, that enable that same sort of level of self-service for creating specific definitions, or are able to do it intelligently based on agreed-upon patterns of extraction. >> George: Okay. >> Heavy science. Alright, well unfortunately we are out of time. I really appreciate the story, I love the site, I'll have to check out the boxes, because I know I have a bunch of dead spots in my house. (all laugh) But Ravi, I want to give you the last word, really about how is it working with a small start-up doing some cool, innovative stuff, when it's not your Adobes, it's not a lot of the huge enterprise clients that you have. What have you taken away, why does it add value to SnapLogic to work with kind of a cool, fun, small start-up? >> Yeah, so the enterprise is always a retrofit job. You have to sort of go back to the SAPs and the Oracle databases and make sure that we are able to connect the legacy with a new cloud application. Whereas with a start-up, it's all new stuff. But their volumes are constantly changing, they probably have spikes, they have burst volumes, they're thinking about this differently, enabling everyone else, quickly changing and adopting newer technologies. So we have to be able to adjust to that agility along with them. So we're very excited to sort of partner with them and go along with them on this journey. And as they start looking at other things, the machine learning and the AI and the IoT space, we're very excited to have that partnership and learn from them and evolve our platform as well. >> Clearly. You're smiling ear-to-ear, Katharine's excited, you're solving problems. So thanks again for taking a few minutes and good luck with your talk tomorrow. Alright, I'm Jeff Frick, he's George Gilbert, you're watching theCUBE from Big Data SV.
We'll be back after this short break. Thanks for watching. (light techno music)

Published Date : Mar 15 2017

Darren Chinen, Malwarebytes - Big Data SV 17 - #BigDataSV - #theCUBE


 

>> Announcer: Live from San Jose, California, it's The Cube, covering Big Data Silicon Valley 2017. >> Hey, welcome back everybody. Jeff Frick here with The Cube. We are at Big Data SV in San Jose at the Historic Pagoda Lounge, part of Big Data week, which is associated with Strata + Hadoop. We've been coming here for eight years and we're excited to be back. The innovation and dynamism of big data, and the evolutions now with machine learning and artificial intelligence, just continue to roll, and we're really excited to be here talking about one of the nasty aspects of this world, unfortunately: malware. So we're excited to have Darren Chinen. He's the senior director of data science and engineering from Malwarebytes. Darren, welcome. >> Darren: Thank you. >> So for folks that aren't familiar with the company, give us just a little bit of background on Malwarebytes. >> So Malwarebytes is basically a next-generation anti-virus software. We started off from humble roots, with our founder at 14 years old getting infected with a piece of malware, and he reached out into the community and, at 14 years old, with the help of some people, wrote his first lines of code to remediate a couple of pieces of malware. It grew from there, and I think by the ripe old age of 18 he founded the company. And he's now, I want to say, 26 or 27, and we're doing quite well. >> It was interesting, before we went live you were talking about his philosophy, how important that is to the company, and how it has now turned into really a strategic asset: that no one should have to suffer from malware, and he decided to really offer a solution for free to help people rid themselves of this bad software. >> Darren: That's right.
Yeah, so Malwarebytes was founded under the principle that Marcin believes that everyone has the right to a malware-free existence, and so we've always offered a free version of Malwarebytes that will help you to remediate if your machine does get infected with a piece of malware. And that's actually still going to this day. >> And that's now given you the ability to have a significant amount of endpoint data, transactional data, trend data, that now you can bake back into the solution. >> Darren: That's right. It's turned into a strategic advantage for the company, it's not something I think that we could have planned at 18 years old when he was doing this. But we've instrumented it so that we can get some anonymous-level telemetry and we can understand how malware proliferates. For many, many years we've been positioned as a second-opinion scanner, and so we're able to see a lot of things, some trends happening in there, and we can actually now see that in real time. >> So, starting out as a second-opinion scanner, you're basically looking at, you're finding what others have missed. And how can you, what do you have to do to become the first line of defense? >> Well, with our new product Malwarebytes 3.0, I think some of that landscape is changing. We have a very complete and layered offering. I'm not the product manager, so as the data science guy, I don't know that I'm qualified to give you the ins and outs, but I think some of that is changing as we've combined a lot of products and we have a much more complete suite of layered protection built into the product. >> And so, maybe tell us, without giving away all the secret sauce, what sort of platform technologies did you use that enabled you to scale to these hundreds of millions of endpoints, and then to be fast enough at identifying things that were trending bad that you had to prioritize?
>> Right, so traditionally, I think AV companies, they have these honeypots, right, where they go and collect a piece of virus or a piece of malware, and they'll take the MD5 hash of that and then they'll basically insert that into a definitions database. And that's a very exact way to do it. The problem is that there's so much malware or viruses out there in the wild, it's impossible to get all of them. I think one of the things that we did was we set up telemetry, and we have a phenomenal research team where we're able to actually have our team catch entire families of malware, and that's really the secret sauce to Malwarebytes. There's several other levels, but that's where we're helping out in the immediate term. What we do is we have, internally, we sort of jokingly call it a Lambda Two architecture. We had considered Lambda long ago, and I say long ago but it was about a year ago, when we first started this journey. But Lambda is riddled with, as you know, a number of issues. If you've ever talked to Jay Kreps from Confluent, he has a lot of opinions on that, right? And one of the key problems with that is that if you do a traditional Lambda, you have to implement your code in two places, it's very difficult, things get out of sync, you have to have replay frameworks. And these are some of the challenges with Lambda. So we do processing in a number of areas. The first thing that we did was we implemented Kafka to handle all of the streaming data. We use Kafka Streams to do inline stateless transformations, and then we also use Kafka Connect. We write all of our data both into HBase, which we may swap out later for something like Redis, and that would be a thin speed layer, and then we also move the data into S3, and we use some ephemeral clusters to do very large-scale batch processing, and that really provides our data lake.
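The exact-match definitions database Darren contrasts with family-level detection can be sketched in a few lines. This is a toy illustration, not Malwarebytes' engine; the sample bytes and family name are invented.

```python
import hashlib

def md5_hex(payload: bytes) -> str:
    """MD5 digest of a sample, the key in a traditional definitions database."""
    return hashlib.md5(payload).hexdigest()

# A toy "definitions database": exact MD5 hash -> malware family name.
sample = b"not-really-malware"
definitions = {md5_hex(sample): "Trojan.Example"}

def classify(payload: bytes):
    """Return the known family for an exact hash match, else None.
    Exact hashing is precise but misses anything not yet catalogued,
    which is the gap family-level detection tries to close."""
    return definitions.get(md5_hex(payload))
```

Flip one byte of the sample and the hash, and therefore the lookup, misses entirely; that brittleness is the problem described above.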
>> When you call that Lambda Two, is that because you're still working essentially on two different infrastructures, so your code isn't quite the same? You still have to check the results on either on either fork. >> That's right, yeah, we didn't feel like it was, we did evaluate doing everything in the stream. But there are certain operations that are difficult to do with purely streamed processing, and so we did need a little bit, we did need to have a thin, what we call real time indicators, a speed layer, to supplement what we were doing in the stream. And so that's the differentiating factor between a traditional Lambda architecture where you'd want to have everything in the stream and everything in batch, and the batch is really more of a truing mechanism as opposed to, our real time is really directional, so in the traditional sense, if you look at traditional business intelligence, you'd have KPIs that would allow you to gauge the health of your business. We have RTIs, Real Time Indicators, that allow us to gauge directionally, what is important to look at this day, this hour, this minute? >> This thing is burning up the charts, >> Exactly. >> Therefore it's priority one. >> That's right, you got it. >> Okay. And maybe tell us a little more, because everyone I'm sure is familiar with Kafka but the streams product from them is a little newer as is Kafka Connect, so it sounds like you've got, it's not just the transport, but you've got some basic analytics and you've got the ability to do the ETL because you've got Connect that comes from sources and destinations, sources and syncs. Tell us how you've used that. >> Well, the streams product is, it's quite different than something like Spark Streaming. It's not working off micro-batching, it's actually working off the stream. And the second thing is, it's not a separate cluster. It's just a library, effectively a .jar file, right? 
And so because it works natively with Kafka, it handles certain things there quite well. It handles back pressure and when you expand the cluster, it's pretty good with things like that. We've found it to be a fairly stable technology. It's just a library and we've worked very closely with Confluent to develop that. Whereas Kafka Connect is really something that we use to write out to S3. In fact, Confluent just released a new, an S3 connector direct. We were using Stream X, which was a wrapper on top of an HDFS connector and they rigged that up to write to S3 for us. >> So tell us, as you look out, what sorts of technologies do you see as enabling you to build a platform that's richer, and then how would that show up in the functionality consumers like we would see? >> Darren: With respect to the architecture? >> Yeah. >> Well one of the things that we had to do is we had to evaluate where we wanted to spend our time. We're a very small team, the entire data science and engineering team is less than I think 10 months old. So all of us got hired, we've started this platform, we've gone very, very fast. And we had to decide, how are we going to, a, get, we've made this big investment, how are we going to get value to our end customer quickly, so that they're not waiting around and you get the traditional big-data story where, we've spent all this money and now we're not getting anything out of it. And so we had to make some of those strategic decisions and because of the fact that the data was really truly big data in nature, there's just a huge amount of work that has to be done in these open-source technologies. They're not baked, it's not like going out to Oracle and giving them a purchase order and you install it and away you go. There's a tremendous amount of work, and so we've made some strategic decisions on what we're going to do in open-source and what we're going to do with a third-party vendor solution. 
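For readers who have not used Kafka Streams: the "inline stateless transformation" stage Darren described a moment ago boils down to per-record filter-and-enrich logic with no state carried between records. A pure-Python stand-in, not the actual Kafka Streams API (which is Java), and with invented event fields, might look like:

```python
def stateless_transform(events):
    """Inline, stateless enrichment of a stream of telemetry events,
    in the spirit of a Kafka Streams stage: each record is handled
    independently, with no state kept between records."""
    for event in events:
        if event.get("type") != "detection":   # drop non-detection records
            continue
        # enrich: derive a lowercase family name from the detection name
        yield {**event, "family": event["name"].split(".")[0].lower()}

stream = [
    {"type": "detection", "name": "Adware.Foo"},
    {"type": "heartbeat"},
    {"type": "detection", "name": "Trojan.Bar"},
]
out = list(stateless_transform(stream))
```

Because no state crosses records, a stage like this can be scaled out or replayed trivially, which is why the stateful parts were pushed into the speed layer and batch side instead.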
And one of those solutions that we decided on was workload automation. So I just did a talk on this, about how Control-M from BMC was really the tool that we chose to handle a lot of the coordination, the sophisticated coordination, and the workload automation on the batch side, and we're about to implement that in a data-quality monitoring framework. And that's turned out to be an incredibly stable solution for us. It's allowed us to not spend time with open-source solutions that do the same things, like Airflow, which may or may not work well, but there's really no support around them, and focus our efforts on what we believe to be the really, really hard problems to tackle in Kafka, Kafka Streams, Connect, et cetera. >> Is it fair to say that Kafka plus Kafka Connect solves many of the old ETL problems, or do you still need some sort of orchestration tool on top of it to completely commoditize, essentially, moving and transforming data from OLTP or operational systems to a decision support system? >> I guess the answer to that is, it depends on your use case. I think there's a lot of things that Kafka and the Streams jobs can solve for you, but I don't think that we're at the point where everything can be streaming. I think that's a ways off. There's legacy systems that really don't natively stream to you anyway, and there's just certain operations that are just more efficient to do in batch. And so that's why, I don't think batch for us is going away any time soon, and that's one of the reasons why workload automation in the batch layer initially was so important, and we've decided to extend that, actually, into building out a data-quality monitoring framework to put a collar around how accurate our data is on the real-time side. >> Cuz it's really horses for courses, it's not one or the other, it's application-specific: what's the best solution for that particular use case.
>> Yeah, I don't think that there's, if there was a one-size-fits-all, it'd be a company, and there would be no need for architects, so I think that you have to look at your use case, your company, what kind of data, what style of data, what type of analysis do you need. Do you really actually need the data in real time, and if you do put in all the work to get it in real time, are you going to be able to take action on it? And I think Malwarebytes was a great candidate. When it came in, I said, "Well, it does look like we can justify the need for real time data, and the effort that goes into building out a real-time framework." >> Jeff: Right, right. And we always say, what is real time? In time to do something about it, (all chuckle) and if there's not time to do something about it, depending on how you define real time, really what difference does it make if you can't do anything about it that fast. So as you look out in the future with IoT, all these connected devices, this is a hugely increased attack surface, as we just saw at RSA a few weeks back. How does that work into your planning? What do you guys think about the future, where there's so many more connected devices out on the edge, with various degrees of intelligence, and opportunities to hijack, if you will? >> Yeah, I think, I don't think I'm qualified to speak about the Malwarebytes product roadmap as far as IoT goes. >> But more philosophically, from a professional point of view, cuz every coin has two sides, there's a lot of good stuff coming from IoT and connected devices, but as we keep hearing over and over, just this massive attack surface expansion. >> Well I think, for us, the key is we're small and we're not operating, like I came from Apple where we operated on a budget of infinity, so we're not-- >> Having to build to infinity, or address infinity, (Darren laughs) with an actual budget. >> We're small and we have to make sure that whatever we do creates value.
And so what I'm seeing in the future is, as we get more into the IoT space and logs begin to proliferate and data just grows exponentially in size, it's really, how do we do the same thing, and how are we going to manage that in terms of cost? Generally, big data is very low in information density. It's not like transactional systems where you get the data, it's effectively an Excel spreadsheet, and you can go run some pivot tables and filters and away you go. I think big data in general requires a tremendous amount of massaging to get to the point where a data scientist or an analyst can actually extract some insight and some value. And the question is, how do you massage that data in a way that's going to be cost-effective as IoT expands and proliferates? So that's the question that we're dealing with. We're, at this point, all in with cloud technologies, we're leveraging quite a few of Amazon's services, server-less technologies as well. We're just in the process of moving to Athena as an on-demand query service. And we use a lot of ephemeral clusters as well, and that allows us to actually run all of our ETL in about two hours. And so these are some of the things that we're doing to prepare for this explosion of data, and making sure that we're in a position where we're not spending a dollar to gain a penny, if that makes sense. >> That's his business. Well, he makes fun of that business model. >> I think you could do it, you want to drive revenue to sell dollars for 90 cents. >> That's the dot com model, I was there. >> Exactly, and make it up in volume. All right, Darren Chinen, thanks for taking a few minutes out of your day and giving us the story on Malwarebytes, sounds pretty exciting and a great opportunity. >> Thanks, I enjoyed it. >> Absolutely, he's Darren, he's George, I'm Jeff, you're watching The Cube. We're at Big Data SV at the Historic Pagoda Lounge. Thanks for watching, we'll be right back after this short break. (upbeat techno music)
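To make the "real time indicators" idea from earlier in the conversation concrete: directionally gauging what is hot "this day, this hour, this minute" can be as simple as a sliding-window counter over recent micro-batches. This is a sketch over invented data, not Malwarebytes' implementation.

```python
from collections import Counter, deque

class RealTimeIndicator:
    """Directional 'what is hot right now' counter over a sliding window
    of the most recent micro-batches of detections."""
    def __init__(self, window=3):
        self.window = deque(maxlen=window)  # only the newest N batches survive

    def observe(self, batch_families):
        """Record one micro-batch of detected family names."""
        self.window.append(Counter(batch_families))

    def top(self, n=1):
        """Families 'burning up the charts' within the current window."""
        total = Counter()
        for counts in self.window:
            total.update(counts)
        return total.most_common(n)

rti = RealTimeIndicator(window=3)
rti.observe(["adware", "trojan"])
rti.observe(["trojan", "trojan"])
rti.observe(["ransomware", "trojan"])
# within this window, "trojan" dominates and would be priority one
```

Unlike a KPI computed over all history, the deque drops old batches automatically, so the indicator stays directional rather than exact, which matches the distinction drawn between RTIs and batch truing.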

Published Date : Mar 15 2017

Abhishek Mehta, Tresata - Big Data SV 17 - #BigDataSV - #theCUBE


 

>> Voiceover: From San Jose, California, it's The Cube, covering big data Silicon Valley 2017. >> Welcome back, everyone. Live in Silicon Valley for BigData SV, BigData Silicon Valley. This is Silicon Angle's, The Cube's, event in Silicon Valley, with our companion event, BigData NYC, in conjunction with O'Reilly, Strata, Hadoop, Hadoop World, our eighth year. I'm John Furrier, with my co-host Jeff Frick, breaking down all the action, and our superguest, Abhi Mehta, the CEO of Tresata. He's been on every year since 2010 as the CEO of the very successful Tresata, building out the vertical approach in financial services and healthcare. Welcome back, good to see you. >> Thank you, John, always good to see you. >> The annual pilgrimage to have you on The Cube. >> Abhi: This is literally a pilgrimage. I was exchanging messages with your co-host here, and he was pinging me, saying, "You got to come here, you got to get to this thing." I made it. The pilgrimage is successful. >> Yeah, a lot's happened, right? Data's the new oil. We've heard it over and over again. You had the seminal first interview in 2010, calling the oil refineries the data refineries. Turns out that was true. We always love to talk about that prediction every time you're on, but there's so much going on now. You can't believe the shift. Certainly, Hadoop has got a nice little niche position as Batch, but real time processing, you've seen the convergence of Batch, and streaming, and all that good stuff in real time, with the advances of clouds, certainly, more compute, Intel processors getting more powerful, 5G over the top, you have connected cars, smart cities, on and on, IoT, Internet of things, all powering this new deep learning and AI trend. Man, it is a game changer. I see this as a step-up function. What are your thoughts? This is going to create more data, more action. >> I agree with you.
I always remind myself, John, especially when I talk to you guys, and we were chatting about this right before we went on air, which is, as smart as we humans are, trends repeat themselves. I'll be talking about AI. We all went to school, and did things in AI, you know? The whole neural networks thing is not new. It's almost like fashion. Bell bottoms come into fashion every 20 years. I will never be seen in them again. Hopefully, neither will you. AI seems to be like that. I think the thing that hasn't changed, and yes, I absolutely agree with you that, as S-curves shift, as you've said, almost at this point a decade ago, there's a fundamentally new technology S-curve shift under way, and S-curve shifts take time. We will look back at these 10 years saying it was literally the first, second inning of this new S-curve shift. I think we are entering the second innings, where the conversation around Batch, real time storage, databases, the stacks, is becoming less important, and AI and deep learning are examples of it; conversations on how you can leverage cheaper, better, faster technology to solve and answer unanswered problems are becoming interesting. I think the basics haven't changed though. What we have spoken with you about for almost eight years remains the same. The three basics around every technology trend remain the same. I think you guys will agree with me. Let me just play it by you and you can either contest it or agree with me. Data is the new competitive effort. It is unequivocally clear that the new asset, the most valuable enterprise asset, has become data, and we've seen it in data companies, Facebook, Google, Uber, Airbnb, they're all fundamentally data companies. Data is the new competitive effort. The more you have of it, the better off you are. I always love people who say, "Big Data, this is a bad term."
It isn't, because big data, fundamentally, in those two words, defines the very pieces of what we built Tresata on, which is, the more data you have, and if you can process and extract intelligence from it, borrowing your term, extract signal from the noise, you can make a lot of money on it. I think that fundamental basic hasn't changed. >> Big Data, to me, was always about big storage kind of a view. We coined the term Fast Data on The Cube, so that now speaks to the real time. It's interesting. I just see that the four main new areas that are being talked about outside of the Big Data world are autonomous vehicles, smart cities, smart home, and media and entertainment, and in each one of those, I would say that the data is the new weaponization. There's an article that was great this month called "Weaponizing AI," and it had to do with Breitbart, and the election, and that's media and entertainment. You've got Netflix, all these new companies. Data is content, content is data. It's a digital asset. This AI component fits into autonomous vehicles, it fits into media and entertainment, fits into smart cities, and smart home. >> You also raise a very interesting point. I think that we can take comfort in the fact that we have seen this happen. This is not an idea anymore, or it's not just a wild idea anymore, which is, we have seen massive disruption happen in consumer industries. Google has created a brand new industry in how to market stuff, could be any stuff. Facebook created a brand new way of not just being in touch with your friends globally, 'cause people have thousands of friends, not true, but also, how do you monetize deep preferences, right? A twist on deep learning, but deep, deep preferences. If I know what Jeff likes, I can market to him better. I think we're about to see the same thing in the industries you just mentioned. Where will success come from in enterprise software?
I always ask myself that question when I come to any of these conferences, Strata, others, there's now an AI conference. What will the disruption that we have seen happen in consumer industries, we'll just mention automobiles, media entertainment, et cetera, what is going to happen to enterprise software? I think the time is ripe in the next five years to see the emergence of massive scale creation. I actually don't think it'll get disrupted. I think we will see, just like with Facebook, Google, Uber, the creation of brand new industries in enterprise software. I think that's going to be interesting. >> Mark Cuban said at South by Southwest this week, where The Cube was with the AI lounge with Intel, he was on stage saying, "The first tech trillionaire will come out of deep learning," and deep learning is kind of the underpinning for AI, if you look at all the geek stuff. To your point that there's a new shift of opportunity, whether it comes in from the enterprise side, or consumer, or algorithmic side, is that there's never been a trillionaire.
Buzzwords aside, whether it's deep learning, AI, streaming, Batch, doesn't matter, Flash, all buzzwords aside, the very interesting thing is, are we seeing, as a community, the emergence of new enterprise software business models? I think ours is an example. We are now six years old. We announced Tresata on The Cube. We have celebrated our significant milestones on The Cube. We'll announce today that we are now a valuable member of society in terms of you pay tax as a company, another big milestone for a company. We have never raised venture money. We had a broad view when we started that every single thing we have learned as an industry, enterprise software, the stack, databases, storage, BI, algorithms, are free. Dave was talking about this earlier this week. Algorithms, analytical tools, will all become free. What is this new class of enterprise software that creates value that can then be sold as value? Buyers, corporations are becoming smart to realize and say, "Maybe I can't hire people as smart as some of the web industries on this side of the coast, but I can still hire good talent, the tool set is free. Should I build versus buy?" It fundamentally changes the conversation. Databases are a $2 trillion industry. Where does that value shift to if databases are free? I think that's what is going to be interesting to see, is, what model creates the new enterprise software industry? What is that going to be? I do agree with Mark Cuban's statement, that the answer is going to lie in, if the building blocks are free and commoditized, you guys know exactly where I stand on that one, if the building blocks are commoditized, how do you add value in the building block? It comes from the point you made, industry knowledge, data, owning data and domain knowledge.
If you can combine deep domain expertise into an advanced application that solves business problems, people don't want to know if the data is stored in a free HDFS system, or in some other system, or quantum computing, people don't care. >> I got to get your take on the data layer because this is where it's come. We had a lot of guests on saying, with the cloud, you can rent things, algorithms are free, so essentially, commoditization has happened, which is a good thing, more compute, everything else is all great, all the goodness around that. You still own your data. The data layer seems to be the land grab, metadata. How do you cross-connect the data layer to be a consistent fabric? >> Here's how we think of it, and this is something we haven't shared publicly yet, but I believe you'll see us talk a lot more about this. We believe there are three new layers in the technology fabric. There is what we call the hardware operating system. The battle has been won by a company that we all like a lot, Red Hat, I think mostly won. Then there is what we call the data operating system, what you call the data layer. I think there's a new layer emerging where people like us sit. We call it the analytics operating system. The data layer will commoditize as much as the hardware operating system, what I call the layer, commoditized. The data operating system fight is moot. Metadata should not be charged for. Massive data management, draining the swamp, whatever you want to call it, every single thing in the data operating system is a commodity where you need volumes, you all are businessmen, you need volumes, in the P times V game, you need volumes to sustain a profitable business model. The interesting action, in my opinion, is going to come in the analytics operating system.
You are now automating hardcore, what I call, finding intelligence questions, whether it's using deep learning, AI, or whatever other buzzword the industry dreams up in the next five years, whatever the buzzwords may be, immaterial, the layer that automates the extraction of intelligence from massive amounts of data sitting in the data layer, no matter who owns it, our opinion is, Tresata, as an enterprise software player, is not interested in being a data owner. That game, I can't play anymore, right? You guys are a content company, though. You guys are data owners, and you have incredible value in the data you're building. For us, it is, I want to be the tool builder for this next gold rush. If you need the tools to extract intelligence from your data, who's going to give you those tools? I think all that value sits in what we call the analytics operating system. The world hasn't seen enough players in it yet. >> This is an interesting mind bender, if you think about it. When you said, "analytics operating system," that rings a few bells and gets the hair on the back of my neck standing up because we're in a systems world now. We kind of talk about this in The Cube where operating systems concepts are very much in play. If you look at this ecosystem and who's winning, who's losing, who's struggling, who's falling away, is, the winners are nailing the integration game, and they're nailing the functional game, I think, a core functional component of an operating environment, AKA, the cloud, AKA data. >> Agreed. >> Having those functional systems, as an operating system game. What is your view of what an analytics operating system is? What are some of those components? I mean, most operating systems have a linker, loader, filer, all these things going on. What's your thoughts on this analytical operating system? What is it made of? >> It's made of three core components that we have now invested six years in. The first one is exactly what you said.
We don't use the word integration. We now call it the same word, we have been saying it for six years, we call it the factory, but it's very similar, which is, the ability to go to a company or enterprises with unique data assets, and enrich, I will borrow your term, integrate, enrich. We call it the data factory, the automation of 90% of the workload to make data sitting in a swamp usable data, part one. We call that creation of a data asset, a nice twist or separation from the word data warehousing we all grew up on. That's number one, the ability to make raw data usable. It's actually quite hard. If you haven't built a company squarely on data, you have to be able to buy it because building is very hard, number one. Number two is what I call the infusion of domain-centric knowledge. Can industries and industry players take expert systems and convert them into machine systems? The moment we convert expert systems into machine systems, we can do automation at very large scale. As you can imagine, the ability to add value is exponentially higher for each of those tiers, from data asset to now infusion of domain knowledge, to take an expert into a machine system, but the value trade is incredibly large as well. If you actually have the system built out, you can afford to sell it for all the value. That's number two, the ability to take expert system, go to machine systems. Number three is the most interesting, and we are very early in it. I use the term on The Cube, I'm going to be more forward-thinking over here, which is automation. Today, the best we can do with leveraging incredibly smart machines, algorithms, at scale on massive amounts of data is augmenting humans. I do fundamentally believe, just like self-driving cars, that the era where software will automate a tremendous amount of business processes in all industries is upon us. How long it takes, I think we will see it in our lifetimes too. 
When you and I have both a little bit more gray hair, we're saying, "Remember, we said about that? I think automation's going to come." I do believe automation will happen. Currently, it's all about augmentation, but I do believe that business-- >> John: Cubebots are coming. We're going to have some Cubebots. >> We will have Cubebots. >> John: Automated Cube broadcasting. >> John, we'll give them your magnificent hair, and they know they'll do it. I do believe automation of complex human processes, the era of enlightenment, is upon us, where we will be able to take incredibly manual activities, like hailing a car today, to complex activities, looking at transaction information, trading information, in split second time, even quicker than real time, and making the right trading decision to make sure that Jeff's kids go to college in a robo-advisor-like mode. It's all early, but the augmentation will transform to automation, and that will take some time to do across the three tiers in the AOS. >> Then, if we are successful at converting the expert system to a machine system, will the value of that expert system quickly be driven to zero due to the same factors that automation has added to many other things that have been sucked in? >> You guys always blow my mind. You always push my thinking when I come here. >> I just love the concept, but then, will the same economics that have driven costs asymptotically toward zero now apply to these expert systems? >> You know the answer. The answer is absolutely, yes. The question then becomes, how long of an era is it? What we have learned in technology is S-curve shifts take time. This era of enlightenment, what I'm calling the era of enlightenment, that enterprise software is about to enable, and leaving aside all other buzzwords, whether it's deep learning, AI, machines, chatbot, doesn't matter, the era of enlightenment is absolute. I think there'll be two things. First of all, it'll take time to mature.
Yes, whether it's 50 years, 40 years, or 30 years, does it, at some point, become its own commodity? Absolutely. The marginal value we can deliver with a machine, at some point, does go to zero, because it commoditizes it, at scale, it commoditizes it, absolutely, but does that mean the next 30 years will not be a renaissance in enterprise software? Absolutely not. I think we will see ... Let's take the enterprise IT market, what, two to three trillion dollars a year? All of it is up for grabs, and we will see in the next 20, 30, 40, 50 years that, as it is up for grabs, tremendous amount of value will be re-traded and recreated in completely new industry models. I think that's the exciting part. I won't live for 50 years, so it's okay. >> I know we got a minute or so left. I want to get your thoughts on something that we're seeing here, The Cube this year pointed out. We've kind of teased around it, but again, Batch and real time process streaming, all that's coming together. The center of that's IoT data and AI, is causing product gaps. There are some gaps that are developing, either a pure play Batch player, or your real time, some people have been one or the other, some are integrating in. When you try to blend it together, there's product gaps, organizational gaps, and then process gaps. Can you talk about how companies are solving that? Because one supplier might have a great Batch solution, data lake, some might have streaming and whatnot. Now there seems to be more of an integrated approach, bringing those worlds together, but it's causing some gaps. How do companies figure that out? >> I believe there's only one way, in the near term, and then potentially even more so in the long term, to bridge that divide that you talk about. There absolutely is a divide. It's been very interesting for us especially. I'll use our example to answer your question. We have a very advanced health analytics application to go after diabetes.
The challenge is, in order to run it, not only do you need lots and lots of data, IoT, streamed, real time from sensors you wear on your body, you need that. Not only do you need the ability and processing power to crunch all that data, not only do you need the specific algorithms to find insights that were not findable before, the unanswered questions, but the last point, you need to be able to then deliver it across all channels so you can monetize it. That is an end-to-end, what I call, business process around data monetization. Our customers don't care about it. They come to Tresata and they say, "I love your predictive diabetes outcomes application. I have rented the system from the cloud," Amazon, Azure, I think at this point, only two players. We don't see Google much in it. I'm sure they're doing something in it. They have rented you the wheels, and the steering, and the body, so if you want to put it together to run your car on the track, you could. Everything else is containerized by us. I call them advanced analytics applications. They're fully managed. They run on any environment that is given to them because they are resource ready, whatever environment they play in, and they are completely backwards and forwards integrated. I think you will see the emergence of a class of enterprise software, what we call advanced analytics applications, that actually take away the pain from enterprises to worry about those gaps, 'cause in our case, in that example I just gave you, yes, there are gaps, but we have done enough of an automation cycle on the business process itself that we can deal with the gaps. >> Abhi, we got to go. Glad we could squeeze you in. >> Abhi: Thank you. >> Quick 30 seconds, the show this year, what are you seeing? What's the buzz coming out of the show? What's the meat, what's the buzz from the show here? What's the story? >> I continue to believe that we are in an era that will redefine what we have seen humans do.
The people at the show continue to surprise me because the questions they've been asking over the last eight years have slightly changed. I'm done with buzzwords. I don't pay attention to buzzwords anymore. I see a maturation. I think I said it to you before. I see more bald heads and big pates. When I see that in shows like these, it gives me hope that, when people who grew up on a different S-curve have embraced the new S-curve, the pace will strengthen. As always, phenomenal show, great community. The community's changing and looking different in a good way. >> We feel your pain on the buzzwords. As we proceed down this epic digital transformation, over the top, 5G, autonomous vehicles, Big Data analytics, moving the needle, all this headroom, future proofing, AI, machine learning, thanks for sharing. >> Abhi: Thank you so much, as always. >> More buzzwords, more signal from the noise here on The Cube. I'm John Furrier, Jeff Frick, and George Gilbert will be back right after this short break. (electronic music)

Published Date : Mar 15 2017


Scott Gnau, Hortonworks Big Data SV 17 #BigDataSV #theCUBE


 

>> Narrator: Live from San Jose, California it's theCUBE covering Big Data Silicon Valley 2017. >> Welcome back everyone. We're here live in Silicon Valley. This is theCUBE's coverage of Big Data Silicon Valley. Our event in conjunction with O'Reilly Strata Hadoop, of course we have our Big Data NYC event and we have our special popup event in New York and Silicon Valley. This is our Silicon Valley version. I'm John Furrier, with my co-host Jeff Frick and our next guest is Scott Gnau, CTO of Hortonworks. Great to have you on, good to see you again. >> Scott: Thanks for having me. >> You guys have an event coming up in Munich, so I know that there's a slew of new announcements coming up with Hortonworks in April, next month in Munich for your EU event and you're going to be holding a little bit of that back, but some interesting news this morning. We had Wei Wang yesterday from the Microsoft Azure HDInsight team. That's flowering nicely, a good bet there, but the question has always been at least from people in the industry and we've been questioning you guys on, hey, where's your cloud strategy? Because as a disruptor you guys have been very successful with your always open approach. Microsoft as your guy was basically like, that's why we go with Hortonworks because of pure open source, committed to that from day one, never wavered. The question is cloud first, AI, machine learning, this is a sweet spot for IoT. You're starting to see the collision between cloud and data, and in the intersection of that is deep learning, IoT, a lot of amazing new stuff going to be really popping out of this. Your thoughts and your cloud strategy. >> Obviously we see cloud as an enabler for these use cases. In many instances the use cases can be ephemeral. They might not be tied immediately to an ROI, so you're going to go to the capital committee and all this kind of stuff, versus let me go prove some value very quickly.
It's one of the key enablers, core ingredients, and when we say cloud first, we really mean it. It's something where the solutions work together. At the same time, cloud becomes important. Our cloud strategy, and I think we've talked about this in many different venues, is really twofold. One is we want to give a common experience to our customers across whatever footprint they choose, whether it be they roll their own, they do it on-prem, they do it in public cloud and they have choice of different public cloud vendors. We want to give them a similar experience, a good experience that is enterprise-grade, platform level experience, so not point solution kind of one function and then get rid of it, but really being able to extend the platform. What I mean by that of course, is being able to have common security, common governance, common operational management. Being able to have a blueprint of the footprint so that there's compatibility of applications that get written. And those applications can move as they decide to change their mind about where their platform is hosting the data, so our goal really is to give them a great and common experience across all of those footprints number one. Then number two, to offer a lot of choices across all of those domains as well, whether it be, hey I want to do infrastructure as a service and I know what I want on one end of the spectrum to I'm not sure exactly what I want, but I want to spin up a data science cluster really quickly. Boom, here's a platform as a service offer that runs and is available very easy to consume, comes preconfigured and kind of everywhere in between. >> By the way yesterday Wei was pointing out 99.99% SLAs on some of the stuff coming out. >> Are amazing and obviously in the platform as a service space, you also get the benefit of other cloud services that can plug in that wouldn't necessarily be something you'd expect to be typical of a core Hadoop platform.
Getting the SLAs, getting the disaster recovery, getting all of the things that cloud providers can provide behind the scenes is some additional upside obviously as well in those deployment options. Having that common look and feel, making it easy, making it frictionless, are all of the core components of our strategy and we saw a lot of success with that coming out of year end last year. We see rapid customer adoption. We see rapid customer success and frankly, I would say that 99.9% of customers that I talk to are hybrid where they have a foot in on-prem and they have a foot in cloud and they may have a foot in multiple clouds. I think that's indicative of what's going on in the world. Think about the gravity of data. Data movement is expensive. Analytics and multi-core chipsets give us the ability to process and crunch numbers at unprecedented rates, but movement of data is actually kind of hard. There's latency, it can be expensive. A lot of data in the future, IoT data, machine data is going to be created and live its entire lifecycle in the cloud, so the notion of being able to support hybrid with a common look and feel, I think very strategically positions us to help our customers be successful when they start actually dealing with data that lives its entire lifecycle outside the four walls of the data center. >> You guys really did a good job I thought on having that clean positioning of data at rest, but also you had the data in motion, which I think ahead of its time you guys really nailed that and you also had the IoT edge in mind, we've talked I think two years ago and this was really not on everyone's radar, but you guys saw that, so you've made some good bets on the HDInsight and we talked about that yesterday with Wei on here and Microsoft. So edge analytics and data in motion are very key right now, because that batch streaming world's coming together and IoT's flooding it with all this kind of data.
We've seen the success in the clouds where analytics have been super successful, powered by the clouds. I got to ask you with Microsoft as your preferred cloud provider, what's the current status for customers who have data in motion, specifically IoT too. It's the common question we're getting, not necessarily the Microsoft question, but okay I've got edge coming in strong-- >> Scott: Mm-hmm >> and I'm going to run certainly hybrid in a multi cloud world, but I want to put the cloud stuff for most of the analytics and how do I deal with the edge? >> Wow, there's a lot there (laughs) >> John: You got 10 seconds, go! (laughs) You have Microsoft as your premier cloud and you have an Amazon relationship with a marketplace and what not. You've got a great relationship with Microsoft. >> Yeah. I think it boils down to a bigger macro thing and hopefully I'll peel into some specifics. I think number one, we as an industry kind of short change ourselves talking about Hadoop, Hadoop, Hadoop, Hadoop, Hadoop. I think it's bigger than Hadoop, not different than, but certainly bigger than, right, and this is where we started with the whole connected platforms messaging, indicating that traditional Hadoop comes from traditional thinking of data at rest. So I've got some data, I've stored it and I want to run some analytics and I want to be able to scale it and all that kind of stuff. Really good stuff, but only part of the issue. The other part of the issue is data that's moving, data that's being created outside of the four walls of the data center. Data that's coming from devices. How do I manage and move and handle all of that? Of course there have been different hype cycles on streaming and streaming analytics and data flow and all those things. What we wanted to do is take a very protracted look at the problem set of the future.
We said look, it's really about the entire lifecycle of data, from inception to the demise of the data, the data being deleted, which very infrequently happens these days. >> Or cold storage-- >> Cold storage, whatever. You know it's created at the edge, it moves through, it moves in different places, it's landed, it's analyzed, there are models built. But as models get deployed back out to the edge, that entire problem set is a problem set that I think we, certainly we at Hortonworks are looking to address with the solutions. That actually is accelerated by the notion of multiple cloud footprints because when you think about a customer that may have multiple cloud footprints and trying to tie the data together, it creates a unique opportunity, I think there's a reversal in the way people need to think about the future of compute. Where having been around for a little bit of time, it's always been let me bring all the data together to the applications and have the applications run and then I'll send answers back. That is impossible in this new world order, whether it be the cloud or the fog or any of the things in between or the data center, data are going to be distributed and data movement will become the expensive thing, so it will be very important to be able to have applications that are deployable across a grid, and applications move to the data instead of data moving to the application. Or at least to have a choice and be able to be selective, so that I believe that ultimately scalability five years from now, ten years from now, it's not going to be about how many exabytes I have in my cloud instance, that will be part of it, it will be about how many edge devices can I have computing and analyzing simultaneously and coordinating with each other this information to optimize customer experience, to optimize the way an autonomous car drives or anywhere in between. >> It's totally radical, but it's also innovative.
You mentioned the cost of moving data will be the issue. >> Scott: Yeah. >> So that's going to change the architecture of the edge. What are you seeing with customers, cuz we're seeing a lot of people taking a protracted view like you were talking about and looking at the architectures, specifically around okay. There's some pressure, but there's no real gun to the head yet, but there's certainly pressure to do architectural thinking around edge and some of the things you mentioned. Patterns, things you can share, anecdotal stories, customer references. >> You know the common thing is that customers go, "Yep, that's going to be interesting. It's not hitting me right now, but I know it's going to be important. How can I ease into it and kind of without the suspenders how can I prove this is going to work and all that." We've seen a lot of certainly interest in that. What's interesting is we're able to apply some of that futuristic IoT technology in Hortonworks DataFlow that includes NiFi and MiNiFi out to the edge to traditional problems like, let me get the data from the branches into the central office and have that roundtrip communication to a banker who's talking to a customer and has the benefit of all the analytics at home, but I can guarantee that roundtrip of data and analytics. Things that we thought were solid before, can be solved very easily and efficiently with this technology, which is then also extensible even out further to the edge. In many instances, I've been surprised by customer adoption with them saying, "Yeah, I get that, but gee this helps me solve a problem that I've had for the last 20 years and it's very easy and it sets me up on the right architectural course, for when I start to add in those edge devices, I know exactly how I'm going to go do it." It's been actually a really good conversation that's very pragmatic with immediate ROI, but again positioning people for the future that they know is coming.
Doing that, by the way, we're also able to prove the security. Think about security: it's a big issue that everyone's talking about, cyber security and everything. That's typically security about my data center, where I've got this huge fence around it and it's very controlled. Think about it: edge devices are now outside that fence, so security and privacy and provenance become really, really interesting in that world. It's been gratifying to be able to go prove that technology today and again put people on that architectural course that positions them to be able to go out further to the edge as their business demands it. >> That's such great validation when they come back to you with a different solution based on what you just proposed. >> Scott: Yep. >> That means they really start to understand, they really start to see-- >> Scott: Yep. >> How it can provide value to them. >> Absolutely, absolutely. That is all happening, and again, like I said, I think the notion of the bigger problem set, where it's not just storing data and analyzing data, but how do I have portable applications, and portable applications that move further and further out to the edge, is going to be the differentiation. The future successful deployments out there, because those deployments and folks are able to adopt that kind of technology, will have a time-to-market advantage, they'll have a latency advantage in terms of interaction with a customer, not waiting for that roundtrip, really being able to push out customized, tailored interactions, whether it be again if it's driving your car and stopping on time, which is kind of important, to getting a coupon when you're walking past a store, and anywhere in between. >> It's good, you guys have certainly been well positioned for being flexible; being open source has been a great advantage. I got to ask you the final question for the folks watching, I'm sure you guys answer this either to investors or whatnot and customers.
A lot's changed in the past five years and a lot's happening right now. You just illustrated it, the scenario with the edge is very robust, dynamic, changing, but yet a value opportunity for businesses. What's the biggest thing that's changing right now in the Hortonworks view of the world that's notable, that you think is worth highlighting to people watching, that are your customers, investors, or people in the industry? >> I think you brought up a good point, the whole notion of open and the whole groundswell around open source, open community development as a new paradigm for delivering software. I talked a little bit about a new paradigm of the gravity of data and sensors and this new problem set that we've got to go solve, that's kind of one piece of this storm. The other piece of the storm is the adoption and the wave of open, open community collaboration of developers versus integrated silo stacks of software. That's manifesting itself in two places, and obviously I think we're an example of helping to create that. Open collaboration means quicker time to market and more innovation and accelerated innovation in an increasingly complex world. That's one requirement slash advantage of being in the open world. I think the other thing that's happening is the generation of workforce. When I think about when I got my first job, I typed a resume with a typewriter. I'm dating myself. >> White out. >> Scott: Yeah, with white out. (laughter) >> I wasn't a good typist. >> A resume today is basically a name and a GitHub address. Here's my body of work and it's out there for everybody to see, and that's the mentality-- >> And they have their cute videos up there as well, of course. >> Scott: Well yeah, I'm sure. (laughter) >> So it's kind of like that shift to this is now the new paradigm for software delivery. >> This is important. You've got theCUBE interview, but I mean you're seeing it-- >> Is that the open source? >> In the entertainment.
No, we're seeing people put huge interviews on their LinkedIn, so this notion of collaboration is in the software engineering mindset. You go back to when we grew up in software engineering, then it went to open source, and now GitHub is essentially a social network for your body of work. You're starting to see the software development open source concepts, they apply to data engineering, data science is still early days. Media creation, whatnot. So I think that's a really key point, and the data science tools are still in their infancy. >> I think open, and by the way I'm not here to suggest that everything will be open, but I think a majority and-- >> Collaborative. >> The majority of the problem that we're solving will be collaborative, it will be ecosystem driven, and where there's an extremely large market, open will be the most efficient way to address it. And certainly no one's arguing that data and big data is not a large market. >> Yep. You guys are all in the cloud now, you've got the Microsoft, any other updates that you think are worth sharing with folks? >> You've got to come back and see us in Munich then. >> Alright. We'll be there, theCUBE will be there in Munich in April. We have the Hortonworks coverage going on at Data Works, the conference is now called Data Works in Munich. This is theCUBE here with Scott Gnau, the CTO of Hortonworks. Breaking it down, I'm John Furrier with Jeff Frick. More coverage from Big Data SV in conjunction with Strata Hadoop after the short break. (upbeat music)

Published Date : Mar 15 2017


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Scott | PERSON | 0.99+
Jeff Frick | PERSON | 0.99+
John | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
Scott Gnau | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
New York | LOCATION | 0.99+
Munich | LOCATION | 0.99+
John Furrier | PERSON | 0.99+
Silicon Valley | LOCATION | 0.99+
April | DATE | 0.99+
yesterday | DATE | 0.99+
10 seconds | QUANTITY | 0.99+
Hortonworks | ORGANIZATION | 0.99+
San Jose, California | LOCATION | 0.99+
99.99 | QUANTITY | 0.99+
two places | QUANTITY | 0.99+
LinkedIn | ORGANIZATION | 0.99+
first job | QUANTITY | 0.99+
GitHub | ORGANIZATION | 0.99+
next month | DATE | 0.99+
two years ago | DATE | 0.98+
today | DATE | 0.98+
99.9% | QUANTITY | 0.98+
ten years | QUANTITY | 0.97+
Big Data | EVENT | 0.97+
five years | QUANTITY | 0.96+
Big Data Silicon Valley 2017 | EVENT | 0.96+
this morning | DATE | 0.95+
O'Reilly Strata Hadoop | ORGANIZATION | 0.95+
One | QUANTITY | 0.95+
Data Works | EVENT | 0.94+
year end last year | DATE | 0.94+
one | QUANTITY | 0.93+
Hadoop | TITLE | 0.93+
theCUBE | ORGANIZATION | 0.93+
one piece | QUANTITY | 0.93+
Wei Wang | PERSON | 0.91+
NYC | LOCATION | 0.9+
Wei | PERSON | 0.88+
past five years | DATE | 0.87+
first | QUANTITY | 0.86+
CTO | PERSON | 0.83+
four walls | QUANTITY | 0.83+
Big Data SV | ORGANIZATION | 0.83+
#BigDataSV | EVENT | 0.82+
one function | QUANTITY | 0.81+
Big Data SV 17 | EVENT | 0.78+
EU | LOCATION | 0.73+
HDInsight | ORGANIZATION | 0.69+
Strata Hadoop | PERSON | 0.69+
one requirement | QUANTITY | 0.68+
number two | QUANTITY | 0.65+

Holden Karau, IBM Big Data SV 17 #BigDataSV #theCUBE


 

>> Announcer: Big Data Silicon Valley 2017. >> Hey, welcome back, everybody, Jeff Frick here with The Cube. We are live at the historic Pagoda Lounge in San Jose for Big Data SV, which is associated with Strathead Dupe World, across the street, as well as Big Data week, so everything big data is happening in San Jose, we're happy to be here, love the new venue, if you're around, stop by, back of the Fairmount, Pagoda Lounge. We're excited to be joined in this next segment by, who's now become a regular, any time we're at a Big Data event, a Spark event, Holden always stops by. Holden Karau, she's the principal software engineer at IBM. Holden, great to see you. >> Thank you, it's wonderful to be back yet again. >> Absolutely, so the big data meme just keeps rolling, Google Cloud Next was last week, a lot of talk about AI and ML and of course you're very involved in Spark, so what are you excited about these days? What are you, I'm sure you've got a couple presentations going on across the street. >> Yeah, so my two presentations this week, oh wow, I should remember them. So the one that I'm doing today is with my co-worker Seth Hendrickson, also at IBM, and we're going to be focused on how to use structured streaming for machine learning. And sort of, I think that's really interesting, because streaming machine learning is something a lot of people seem to want to do but aren't yet doing in production, so it's always fun to talk to people before they've built their systems. And then tomorrow I'm going to be talking with Joey on how to debug Spark, which is something that I, you know, a lot of people ask questions about, but I tend to not talk about, because it tends to scare people away, and so I try to keep the happy going. >> Jeff: Bugs are never fun. >> No, no, never fun. 
>> Just picking up on that structured streaming and machine learning, so there's this issue of, as we move more and more towards the industrial internet of things, like having to process events as they come in, make a decision. How, there's a range of latency that's required. Where do structured streaming and ML fit today, and where might that go? >> So structured streaming for today, latency wise, is probably not something I would use for something like that right now. It's in the like sub-second range. Which is nice, but it's not what you want for like live serving of decisions for your car, right? That's just not going to be feasible. But I think it certainly has the potential to get a lot faster. We've seen a lot of renewed interest in MLlib local, which is really about making it so that we can take the models that we've trained in Spark and really push them out to the edge and sort of serve them at the edge, and apply our models on end devices. So I'm really excited about where that's going. To be fair, part of my excitement is someone else is doing that work, so I'm very excited that they're doing this work for me. >> Let me clarify on that, just to make sure I understand. So there's a lot of overhead in Spark, because it runs on a cluster, because you have an optimizer, because you have the high availability or the resilience, and so you're saying we can preserve the predict and maybe serve part and carve out all the other overhead for running in a very small environment. >> Right, yeah. So I think for a lot of these IoT devices and stuff like that it actually makes a lot more sense to do the predictions on the device itself, right. These models generally are megabytes in size, and we don't need a cluster to do predictions on these models, right. We really need the cluster to train them, but I think for a lot of cases, pushing the prediction out to the edge node is actually a pretty reasonable use case.
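The idea here, train on the cluster, then ship only the model's coefficients to the device and score locally, can be sketched in a few lines of plain Python. This is not the Spark MLlib API; `train`, `export_model`, and `load_model` are invented names, and the hard-coded weights stand in for a real training job. The point is just that a trained model is a few kilobytes of numbers, and prediction needs no cluster.

```python
import json
import math

# "Cluster side": a real Spark job would fit these coefficients from data;
# hard-coded here to keep the sketch small.
def train():
    return {"weights": [0.8, -0.4], "bias": 0.1}

def export_model(model):
    return json.dumps(model).encode()       # the payload shipped to the device

# "Edge side": a few kilobytes of coefficients is all the device needs.
def load_model(blob):
    return json.loads(blob.decode())

def predict(model, features):
    z = model["bias"] + sum(w * x for w, x in zip(model["weights"], features))
    return 1.0 / (1.0 + math.exp(-z))       # probability, computed locally

blob = export_model(train())                # shipped over whatever link fits
edge_model = load_model(blob)
score = predict(edge_model, [1.0, 2.0])     # no cluster round trip needed
```

How the blob actually gets to the device (cell, satellite, whatever) is exactly the "copy part" Holden says is left to the deployment, not the framework.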
And so I'm really excited that we've got some work going on there. >> Taking that one step further, we've talked to a bunch of people, both like at GE, and at their Minds and Machines show, and IBM's Genius of Things, where you want to be able to train the models up in the cloud where you're getting data from all the different devices and then push the retrained model out to the edge. Can that happen in Spark, or do we have to have something else orchestrating all that? >> So actually pushing the model out isn't something that I would do in Spark itself, I think that's better served by other tools. Spark is not really well suited to large amounts of internet traffic, right. But it's really well suited to the training, and I think with MLlib local it'll essentially, we'll be able to provide both sides of it, and the copy part will be left up to whoever it is that's doing their work, right, because like if you're copying over a cell network you need to do something very different than if you're broadcasting over a terrestrial XM or something like that, you need to do something very different for satellite.
And we probably can't get rid of that for all problems, but I think for a lot of problems, doing things like hyperparameter tuning can actually give really powerful tools to just like regular engineering folks who, they're smart, but maybe they don't have a strong machine learning background. And Spark's ML pipelines make it really easy to sort of construct multiple stages, and then just be like, okay, I don't know what these parameters should be, I want you to do a search over what these different parameters could be for me, and it makes it really easy to do this as just a regular engineer with less of an ML background. >> Would that be like, just for those of us who are, who don't know what hyperparameter tuning is, that would be the knobs, the variables? >> Yeah, it's going to spin the knobs on like our regularization parameter on like our regression, and it can also spin some knobs on maybe the engram sizes that we're using on the inputs to something else, right. And it can compare how these knobs sort of interact with each other, because often you can tune one knob but you actually have six different knobs that you want to tune and you don't know, if you just explore each one individually, you're not going to find the best setting for them working together. >> So this would make it easier for, as you're saying, someone who's not a data scientist to set up a pipeline that lets you predict. >> I think so, very much. I think it does a lot of the, brings a lot of the benefits from sort of the SciPy world to the big data world. And SciPy is really wonderful about making machine learning really accessible, but it's just not ready for big data, and I think this does a good job of bringing these same concepts, if not the code, but the same concepts, to big data. >> The SciPy, if I understand, is it a notebook that would run essentially on one machine? >> SciPy can be put in a notebook environment, and generally it would run on, yeah, a single machine. 
>> And so to make that sit on Spark means that you could then run it on a cluster-- >> So this isn't actually taking SciPy and distributing it, this is just like stealing the good concepts from SciPy and making them available for big data people. Because SciPy's done a really good job of making a very intuitive machine learning interface. >> So just to put a fine sort of qualifier on one thing, if you're doing the internet of things and you have Spark at the edge and you're running the model there, it's the programming model, so structured streaming is one way of programming Spark, but if you don't have structured streaming at the edge, would you just be using the core batch Spark programming model? >> So at the edge you'd just be using, you wouldn't even be using batch, right, because you're trying to predict individual events, right, so you'd just be calling predict with every new event that you're getting in. And you might have a q mechanism of some type. But essentially if we had this batch, we would be adding additional latency, and I think at the edge we really, the reason we're moving the models to the edge is to avoid the latency. >> So just to be clear then, is the programming model, so it wouldn't be structured streaming, and we're taking out all the overhead that forced us to use batch with Spark. So the reason I'm trying to clarify is a lot of people had this question for a long time, which is are we going to have a different programming model at the edge from what we have at the center? >> Yeah, that's a great question. And I don't think the answer is finished yet, but I think the work is being done to try and make it look the same. 
Of course, you know, trying to make it look the same, this is Boosh, it's not like actually barking at us right now, even though she looks like a dog, she is, there will always be things which are a little bit different from the edge to your cluster, but I think Spark has done a really good job of making things look very similar on single node cases to multi node cases, and I think we can probably bring the same things to ML. >> Okay, so it's almost time, we're coming back, Spark took us from single machine to cluster, and now we have to essentially bring it back for an edge device that's really light weight. >> Yeah, I think at the end of the day, just from a latency point of view, that's what we have to do for serving. For some models, not for everyone. Like if you're building a website with a recommendation system, you don't need to serve that model like on the edge node, that's fine, but like if you've got a car device we can't depend on cell latency, right, you have to serve that in car. >> So what are some of the things, some of the other things that IBM is contributing to the ecosystem that you see having a big impact over the next couple years? >> So there's a lot of really exciting things coming out of IBM. And I'm obviously pretty biased. I spend a lot of time focused on Python support in Spark, and one of the most exciting things is coming from my co-worker Brian, I'm not going to say his last name in case I get it wrong, but Brian is amazing, and he's been working on integrating Arrow with Spark, and this can make it so that it's going to be a lot easier to sort of interoperate between JVM languages and Python and R, so I'm really optimistic about the sort of Python and R interfaces improving a lot in Spark and getting a lot faster as well. And we're also, in addition to the Arrow work, we've got some work around making it a lot easier for people in R and Python to get started. 
The R stuff is mostly actually the Microsoft people, thanks Felix, you're awesome. I don't actually know which camera I should have done that to, but that's okay. >> I think you got it! >> But Felix is amazing, and the other people working on R are too. But I think we've both been pursuing sort of making it so that people who are in the R or Python spaces can just use like pip install, conda install, or whatever tool it is they're used to working with, to just bring Spark into their machine really easily, just like they would sort of any other software package that they're using. Because right now, for someone getting started in Spark, if you're in the Java space it's pretty easy, but if you're in R or Python you have to do sort of a lot of weird setup work, and it's worth it, but like if we can get rid of that friction, I think we can get a lot more people in these communities using Spark. >> Let me see, just as a scenario, the R server is getting fairly well integrated into SQL Server, so would it be, would you be able to use R as the language with a Spark execution engine to somehow integrate it into SQL Server as an execution engine for doing the machine learning and predicting? >> You definitely, well I shouldn't say definitely, you probably could do that. I don't necessarily know if that's a good idea, but that's the kind of stuff that this would enable, right, it'll make it so that people that are making tools in R or Python can just use Spark as another library, right, and it doesn't have to be this really special setup. It can just be this library, and they point it at the cluster, and they can do whatever work they want to do. That being said, the SQL Server R integration, if you find yourself using that to do like distributed computing, you should probably take a step back and like rethink what you're doing. >> George: Because it's not really scale out. >> It's not really set up for that.
And you might be better off doing this with like, connecting your Spark cluster to your SQL Server instance using like JDBC or a special driver and doing it that way, but you definitely could do it in another inverted sort of way. >> So last question from me, if you look out a couple years, how will we make machine learning accessible to a bigger and bigger audience? And I know you touched on the tuning of the knobs, hyperparameter tuning, what will it look like ultimately? >> I think ML pipelines are probably what things are going to end up looking like. But I think the other part that we'll sort of see is we'll see a lot more examples of how to work with certain kinds of data, because right now, like, I know what I need to do when I'm ingesting some textual data, but I know that because I spent like a week trying to figure out what the hell I was doing once, right. And I didn't bother to write it down. And it looks like no one else bothered to write it down. So really I think we'll see a lot of tools that look very similar to the tools we have today, they'll have more options and they'll be a bit easier to use, but I think the main thing that we're really lacking right now is good documentation and sort of good books and just good resources for people to figure out how to use these tools. Now of course, I mean, I'm biased, because I work on these tools, so I'm like, yeah, they're pretty great. So there might be other people who are like, Holden, no, you're wrong, we need to rethink everything. But I think this is, we can go very far with the pipeline concept. >> And then that's good, right? The democratization of these things opens it up to more people, you get more creative people solving more different problems, that makes the whole thing go.
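The pipeline-plus-parameter-search idea Holden keeps returning to, declare the knobs, let the system try the combinations together rather than one at a time, can be sketched framework-agnostically. A real Spark ML run would use `ParamGridBuilder` with `CrossValidator`; here `evaluate` and its toy scoring surface are invented stand-ins for a cross-validated pipeline score, with a known best at regularization 0.1 and engram size 2.

```python
from itertools import product

def evaluate(regularization, ngram_size):
    """Stand-in for a cross-validated pipeline score; a real run would
    train and score the whole pipeline for each setting."""
    # toy scoring surface with its best value at (0.1, 2)
    return 1.0 - abs(regularization - 0.1) - 0.1 * abs(ngram_size - 2)

# The engineer declares the knobs; the search tries every combination,
# so interactions between knobs are explored, not just each knob alone.
grid = {"regularization": [0.01, 0.1, 1.0], "ngram_size": [1, 2, 3]}

best_params = max(
    product(grid["regularization"], grid["ngram_size"]),
    key=lambda p: evaluate(*p),
)
# best_params == (0.1, 2)
```

The engineer never has to know what a good regularization value is, only which ranges are worth searching, which is exactly the accessibility argument made above.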
>> You can like install Spark easily, you can, you know, set up an ML pipeline, you can train your model, you can start doing predictions, you can, people that haven't been able to do machine learning at scale can get started super easily, and build a recommendation system for their small little online shop and be like, hey, you bought this, you might also want to buy Boosh, he's really cute, but you can't have this one. No no no, not this one. >> Such a tease! >> Holden: I'm sorry, I'm sorry. >> Well Holden, that will, we'll say goodbye for now, I'm sure we will see you in June in San Francisco at the Spark Summit, and look forward to the update. >> Holden: I look forward to chatting with you then. >> Absolutely, and break a leg this afternoon at your presentation. >> Holden: Thank you. >> She's Holden Karau, I'm Jeff Frick, he's George Gilbert, you're watching The Cube, we're at Big Data SV, thanks for watching. (upbeat music)

Published Date : Mar 15 2017


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jeff Frick | PERSON | 0.99+
Brian | PERSON | 0.99+
Holden Karau | PERSON | 0.99+
Holden | PERSON | 0.99+
Felix | PERSON | 0.99+
George Gilbert | PERSON | 0.99+
George | PERSON | 0.99+
Joey | PERSON | 0.99+
Jeff | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
San Jose | LOCATION | 0.99+
Seth Hendrickson | PERSON | 0.99+
Spark | TITLE | 0.99+
Python | TITLE | 0.99+
last week | DATE | 0.99+
Microsoft | ORGANIZATION | 0.99+
tomorrow | DATE | 0.99+
San Francisco | LOCATION | 0.99+
June | DATE | 0.99+
six different knobs | QUANTITY | 0.99+
GE | ORGANIZATION | 0.99+
Boosh | PERSON | 0.99+
Pagoda Lounge | LOCATION | 0.99+
one knob | QUANTITY | 0.99+
both sides | QUANTITY | 0.99+
two presentations | QUANTITY | 0.99+
this week | DATE | 0.98+
today | DATE | 0.98+
The Cube | ORGANIZATION | 0.98+
Java | TITLE | 0.98+
both | QUANTITY | 0.97+
one thing | QUANTITY | 0.96+
one | QUANTITY | 0.96+
Big Data week | EVENT | 0.96+
single machine | QUANTITY | 0.95+
R | TITLE | 0.95+
SciPy | TITLE | 0.95+
Big Data | EVENT | 0.95+
each one | QUANTITY | 0.94+
JDBC | TITLE | 0.93+
Spark ML | TITLE | 0.89+
JVM | TITLE | 0.89+
The Cube | TITLE | 0.88+
single | QUANTITY | 0.88+
Sequel | TITLE | 0.87+
Big Data Silicon Valley 2017 | EVENT | 0.86+
Spark Summit | LOCATION | 0.86+
one machine | QUANTITY | 0.86+
a week | QUANTITY | 0.84+
Fairmount | LOCATION | 0.83+
liblocal | TITLE | 0.83+

Gaurav Dhillon | Big Data SV 17


 

>> Hey, welcome back everybody. Jeff Rick here with the Cube. We are live in downtown San Jose at the historic Pagoda Lounge, part of Big Data SV, which is part of Strata + Hadoop Conference, which is part of Big Data Week because everything big data is pretty much in San Jose this week. So we're excited to be here. We're here with George Gilbert, our big data analyst from Wikibon, and a great guest, Gaurav Dhillon, Chairman and CEO of SnapLogic. Gaurav, great to see you. >> Pleasure to be here, Jeff. Thank you for having me. George, good to see you. >> You guys have been very busy since we last saw you about a year ago. >> We have. We had a pretty epic year. >> Yeah, give us an update, funding, and customers, and you guys have a little momentum. >> It's a good thing. It's a good thing, you know. A friend and a real mentor to us, Dan Wormenhoven, the Founder and CEO of NetApp for a very long time, longtime CEO of NetApp, he always likes to joke that growth cures all startup problems. And you know what, that's the truth. >> Jeff: Yes. >> So we had a scorching year, you know. 2016 was a year of continuing to strengthen our products, getting a bunch more customers. We got about 300 new customers. >> Jeff: 300 new customers? >> Yes, and as you know, we don't sell to small business. We sell to the enterprise. >> Right, right. >> So, this is the who's who of pharmaceuticals, continued strength in high-tech, continued strength in retail. You know, all the way from Subway Sandwich to folks like AstraZeneca and Amgen and Bristol-Myers Squibb. >> Right. >> So, some phenomenal growth for the company. But, you know, we look at it very simply. We want to double our company every year. We want to do it in a responsible way. In other words, we are growing our business in such a way that we can sail over to cash flow break-even at anytime. So responsibly doubling your business is a wonderful thing. 
>> So when you look at it, obviously, you guys are executing, you've got good products, people are buying. But what are some of the macro-trends that you're seeing talking to all these customers that are really helping push you guys along? >> Right, right. So what we see is, and it used to be the majority of our business. It's now getting to be 50/50. But still I would say, historically, the primary driver for 2016 of our business was a digital transformation at a boardroom level causing a rethinking of the appscape and people bringing in cloud applications like Workday. So, one of the big drivers of our growth is helping fit Workday into the new fabric in many enterprises: Vassar College, into Capital One, into finance and various other sectors. Where people bring in Workday, they want to make that work with what they have and what they're going to buy in the future, whether it's more applications or new types of data strategies. And that is the primary driver for growth. In the past, it was probably a secondary driver, this new world of data warehousing. We like to think of it as a post-modern era in the use of data and the use of analytics. But this year, it's trending to be probably 50/50 between apps and data. And that is a shift towards people deploying in the same way that they moved from on-premise apps to SAS apps, a move towards looking at data platforms in the cloud for all the benefits of racking and stacking and having the capability rather than being in the air-conditioning, HVAC, and power consumption business. And that has been phenomenal. We've seen great growth with some of the work from Microsoft Azure with the Insights products, AWS's Redshift is a fantastic growth area for us. And these sorts of technologies, we think are going to be of significant impact to the everyday, the work clothing types of analytics. 
Maybe the more exotic stuff will stay on prem, but a lot of the regular business-like stuff, you know, stuff in suits and ties is moving into the cloud at a rapid pace. >> And we just came off the Google Next show last week. And Google really is helping continue to push kind of ML and AI out front. And so, maybe it's not the blue suit analytics. >> Gaurav: Indeed, yes. >> But it does drive expectations. And you know, the expectations of what we can get, what we should get, what we should be moving towards is rapidly changing. >> Rapidly changing, for example, we saw at The New York Times, which as many of Google's flagship enterprise customers are media-related. >> Jeff: Right. >> No accident, they're so proficient themselves being in the consumer internet space. So as we encountered in places like The New York Times, is there's a shift away from a legacy data warehouse, which people like me and others in the last century, back in my time in Informatica, might have sold them towards a cloud-first strategy of using, in their case, Google products, Bigtable, et cetera. And also, they're doing that because they aspirationally want to get at consumer prices without having to have a campus and the expense of Google's big brain. They want to benefit from some of those things like TensorFlow, et cetera, through the machine learning and other developer capabilities that are now coming along with that in the cloud. And by the way, Microsoft has amazing machine learning capability in its Azure for Microsoft Research as well. >> So Gaurav, it's interesting to hear sort of the two drivers. We know PeopleSoft took off starting with HR first and then would add on financials and stumble a little bit with manufacturing. So, when someone wants to bring in Workday, is it purely an efficiency value prop? And then, how are you helping them tie into the existing fabric of applications? >> Look, I think you have to ask Dave or Aneel or ask them together more about that dynamic. 
What I know, as a friend of the firm and as somebody we collaborate with, and, you know, this is an interesting statistic: 20 percent of Workday's financial customers are using SnapLogic. 20 percent. Now, it's a nascent business for them, and you and I were around in the last century of ERP. We saw the evolution of functional winners. Some made it into suites and some didn't. Siebel never did. PeopleSoft at least made a significant impact on a variety of other things. Yes, there was Baan and other things that prevented their domination of manufacturing and, of course, the small company in Walldorf did a very good job on it too. But that said, what we find is it's very typical, in a sense, how people using TIBCO and Informatica in the last century are looking at SnapLogic. And it's no accident, because we saw Workday's go-to-market motion and, in a sense, are following it, trying to do the same thing Dave and Aneel have done, being a bunch of ex-Informatica guys. So here's what it is. When you look at your legacy installation and you want to modernize it, what are your choices? You can do a big old upgrade, because it's on-premise software. Or you can say, "You know what? For 20% more, I could just get the new thing." And guess what? A lot of people want to get the new thing. And that's what you're going to see all the time. And that's what's happening with companies like SnapLogic and Workday. Right here locally, Adobe; it's an icon in technology, and certainly in San Jose that logo is very big. A few years ago, they decided to make the jump from legacy middleware, TIBCO, Informatica, WebMethods, and they've replaced everything globally with SnapLogic.
So in that same way, instead of trying to upgrade this version and that version, and what about what we do in Japan, what do we do in Sweden, why don't you just find a platform as a service that lets you elevate your success and go towards a better product, more of a self-service, better-UX, millennial-friendly type of product? So that's what's happening out there. >> But even that three-letter company from Walldorf was on stage last week. You can now get SAP on the Google Cloud Platform, which I thought was pretty amazing. And the other piece I just love, though there's still a few doubters out there on the SaaS platform, is that now there's a really visual representation >> Gaurav: There is. >> of the dominance of that style going up in downtown San Francisco. It's 60 stories high, and it's taken over the landscape. So if there was ever any doubt of enterprise adoption of SaaS. And if anything, I would wonder if kind of the proliferation of apps now within the SaaS environment inside the enterprise starts to become a problem in and of itself. Because now you have so many different apps that you're working on and working in. God help us if the internet goes down, right? >> It's true, and you know, and how do you make e pluribus unum, out of many, one, right? So it's hilarious. It is almost app proliferation at this point. You know, our CFO tapped me the other day. He said, "Hey, you've got to check this out. They're using a SaaS application, which they got from a law firm, to track stock options inside the company." I'm like, "Wow, that is a job title and a vertical." So only high-growth, private, venture-backed companies need this, and typically it's high tech. And you have very capable SaaS, even in the small grid squares in the enterprise. >> Jeff: Right, right. >> So, a sign, and I think that's probably another way to think about the work that we do at SnapLogic and others. >> Jeff: Right, right. >> Other people in the marketplace like us.
What we do essentially is we give you the ERP of one. Because if you could choose things that make sense for you, and they could work together in a very good way to give you a very good fabric for your purposes, you've essentially bought a bespoke suit at rack prices. Right? Without that nine-times multiplier of the last century, of having to have consultants without end, darkening the sky with consultants, to make that happen. You know? So that, yes, SaaS proliferation is happening. That is the opportunity, also the problem. For us, it's an opportunity where that glass is half-full: we come in with SnapLogic and knit it together for you, to give you fabric back. And people love that, because the businesses can buy what they want, and the enterprise gets a comprehensive solution. >> Jeff: Right, right. >> Well, at the risk of taking a very short tangent, that comment about darkening the skies, if I recall, was the Persians threatening the 300 Greeks at the Battle of Thermopylae. >> Gaurav: Yes. >> And they said, "We'll darken the skies with our arrows." And so the Greek... >> Gaurav: Come and get 'em. >> No, no. The famous line was, he said, "Give us your weapons." And the guy says, "Come and get 'em." (laughs) >> To which the Greek general says, "Well, we'll fight in the shade." (all laughing) But I wanted to ask you. >> This is the movie 300 as well, right? >> Yes. >> The famous line is, "Give us your weapons." He said, "Come and get 'em." (all laughing) >> But I'm thinking also of the use case where a customer brings in Workday and you help essentially instrument it, or connect it, so it can be a good citizen. How much easier does that make fitting in other SaaS apps, or any other app, into the application fabric? >> Right, right. Look, George.
As you and I know, we both had some wonderful runs in the last century, and here we are doing version 2.0, in many ways, again very similar to the Workday management. The enterprise is hip to the fact that there is a Switzerland nature to making things work together. So they want amazing products like Workday. They want amazing products like the SAP Cloud Suite, now with Concur and SuccessFactors in there. Some very cool things happening in the analytics world, which you'll see at Sapphire and so on. So some very, very capable products coming from, I mean, Oracle's bought 80 SaaS companies, or 87 SaaS companies. And so, what you're seeing is the enterprise understands that there's going to be red versus blue and a couple other stripes and colors, and that they want their businesspeople to buy whatever works for them. But they want to make them work together. All right? So there is a natural sort of geographic or structural nature to this business, where there is a need for Switzerland, and there is a need for amazing technology, some of which can only come from large companies with big balance sheets and vertical understanding and a legacy of success. But if a customer like an AstraZeneca, where you have a CIO like Dave Smoley, who transformed Flextronics and is now doing the same thing at AstraZeneca bringing in cloud apps, is able to use companies like SnapLogic and then deploy Workday appropriately, SAP appropriately, have his own custom development, some domestic, some overseas, all over the world, then you've got the ability again to get something very custom, and you can do that at a fraction of the cost of overconsulting or darkening the skies in the way that things were done in the last century.
>> So, then tell us about maybe the convergence of the new-age data warehousing, the data science pipeline, and then this bespoke collection of applications; not bespoke the way Oracle tried it 20 years ago, where you had to upgrade every app tied into every other app on prem, but perhaps the integration, more from many to one, because they're in the cloud. There's only one version of each. How do you tie those two worlds together? >> You know, it's like that old bromide: "Know when to hold 'em. Know when to fold 'em." There is a tendency, when programming becomes more approachable, you have more millennials who are able to pick up technology in a way. I mean, it's astounding what my children can do. So what you want to do, as an enterprise, is very carefully build those things that you want to build, and make sure you don't overbuild. Or, say, if you have a development capability, then every problem looks like a development nail and you have a hammer called development. "Let's hire more Java programmers." That's not the answer. Conversely, you don't want to lose sight of the fact that to really be successful in this millennium, you have to have a core competence around technology. So you want to carefully assemble and build your capability. Now, nobody should ever outsource management. That's a bad idea. (chuckles) But what you want to do is think about those things that you want to buy as a package. Is that a core competence? So, there are excellent products for finance, for human capital management, for travel expense management. Coupa just announced today their product for managing your spend. Some of the work at Ariba, now the Ariba Cloud at SAP, are excellent products to help you do certain job titles really well. So you really shouldn't be building those things. But what you should be doing is doing the right element of build and buy. So now, what does that mean for the world of analytics?
In my view, people building data platforms, or using a lot of open source and a lot of DevOps labor and virtualization engineering and all that stuff, may be less valuable over time, because where the puck is going, where a lot of people should skate to, is that there is a nature of developing certain machine learning and certain kinds of AI capabilities that I think are going to be transformational for almost every industry. It is hard to imagine anything in a more mechanized back office, moving paper, manufacturing, that cannot go through a quantum of improvement through AI. There are obviously moral and certain humanity-dystopia issues around that to be dealt with. But what people should be doing, I think, is building out the AI capabilities, because those are very custom to that business. Those have to do with the business's core competence, its milieu of markets and competitors. But there should be, in a sense, stroking of a purchase order in the direction of a SaaS provider, a cloud data provider like Microsoft Azure or Redshift, and shrinking down their lift-and-shift bill and their data center bill by doing that.
99% of our customers renewed our products last quarter. >> Jeff: Ninety-nine percent? >> Yes sir. >> That says it all. >> And in the world of enterprise software, where there's a lot of snake oil, I'm proud to say that we are building new product with old-fashioned values, and that's what you see from us. >> Well, 99% customer retention, you can't beat that. >> Gaurav: Hard to beat! There's no way but down from there, right? (laughing) >> Exactly. Alright, Gaurav, well, thanks. >> Pleasure. >> For taking a few minutes out of your busy day. >> Thank you, Jeff. >> And I really appreciate the time. >> Thank you, Jeff; thank you, George. >> Alright, he's George Gilbert, I'm Jeff Rick. You're watching theCUBE from the historic Pagoda Lounge in downtown San Jose. Thanks for watching.

Published Date : Mar 15 2017

SENTIMENT ANALYSIS :

ENTITIES

Entity               | Category     | Confidence
George Gilbert       | PERSON       | 0.99+
Dave Smoley          | PERSON       | 0.99+
Dan Wormenhoven      | PERSON       | 0.99+
Jeff                 | PERSON       | 0.99+
Dave                 | PERSON       | 0.99+
Gaurav Dhillon       | PERSON       | 0.99+
George               | PERSON       | 0.99+
2017                 | DATE         | 0.99+
Microsoft            | ORGANIZATION | 0.99+
AstraZeneca          | ORGANIZATION | 0.99+
Jeff Rick            | PERSON       | 0.99+
Google               | ORGANIZATION | 0.99+
Amgen                | ORGANIZATION | 0.99+
NetApp               | ORGANIZATION | 0.99+
Ariba                | ORGANIZATION | 0.99+
PeopleSoft           | ORGANIZATION | 0.99+
Japan                | LOCATION     | 0.99+
Gaurav               | PERSON       | 0.99+
San Jose             | LOCATION     | 0.99+
Vassar College       | ORGANIZATION | 0.99+
2016                 | DATE         | 0.99+
Oracle               | ORGANIZATION | 0.99+
Sweden               | LOCATION     | 0.99+
20%                  | QUANTITY     | 0.99+
20 percent           | QUANTITY     | 0.99+
AWS                  | ORGANIZATION | 0.99+
99%                  | QUANTITY     | 0.99+
Walldorf             | LOCATION     | 0.99+
80                   | QUANTITY     | 0.99+
Aneel                | PERSON       | 0.99+
SnapLogic            | ORGANIZATION | 0.99+
TIBCO                | ORGANIZATION | 0.99+
87                   | QUANTITY     | 0.99+
next year            | DATE         | 0.99+
Informatica          | ORGANIZATION | 0.99+
300 new customers    | QUANTITY     | 0.99+
last week            | DATE         | 0.99+
Bristol-Myers Squibb | ORGANIZATION | 0.99+
60 stories           | QUANTITY     | 0.99+
Ninety-nine percent  | QUANTITY     | 0.99+
Adobe                | ORGANIZATION | 0.99+
Switzerland          | LOCATION     | 0.99+
last century         | DATE         | 0.99+
Wikibon              | ORGANIZATION | 0.99+
SAP                  | ORGANIZATION | 0.99+
Coupa                | ORGANIZATION | 0.98+
two drivers          | QUANTITY     | 0.98+
WebMethods           | ORGANIZATION | 0.98+
two worlds           | QUANTITY     | 0.98+
Flextronics          | ORGANIZATION | 0.98+
Sapphire             | ORGANIZATION | 0.98+
SAP Cloud Suite      | TITLE        | 0.98+
this year            | DATE         | 0.98+