

Kunal Agarwal, Unravel Data | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCube! Presenting Big Data: Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. (techno music) >> Welcome back to theCube. We are live on our first day of coverage at our event BigDataSV. I am Lisa Martin with my co-host George Gilbert. We are at this really cool venue in downtown San Jose. We invite you to come by today, tonight for our cocktail party. It's called Forager Tasting Room and Eatery. Tasty stuff, really, really good. We are down the street from the Strata Data Conference, and we're excited to welcome to theCube a first-time guest, Kunal Agarwal, the CEO of Unravel Data. Kunal, welcome to theCube. >> Thank you so much for having me. >> So, I'm a marketing girl. I love the name Unravel Data. (Kunal laughs) >> Thank you. >> Two-year-old company. Tell us a bit about what you guys do and why that name... What's the implication there with respect to big data? >> Yeah, we are an application performance management company. And big data applications are just very complex. And the name Unravel is all about unraveling the mysteries of big data and understanding why things are not performing well and not really needing a PhD to do so. We're simplifying application performance management for the big data stack. >> Lisa: Excellent. >> So, um, you know, one of the things that a lot of people are talking about with Hadoop, originally it was this cauldron of innovation, because we had the "let a thousand flowers bloom" in terms of all the Apache projects. But then once we tried to get it into operation, we discovered there's a... >> Kunal: There's a lot of problems. (Kunal laughs) >> There's an overhead, there's a downside to it. >> Maybe tell us why you need to know how people have done this many, many times. >> Yeah. >> How you need to learn from experience and then how you can apply that even in an environment where someone hasn't been doing it for that long.
>> Right. So, if I step back a little bit. Big data is powerful, right? It's giving companies an advantage that they never had, and data's an asset to all of these different companies. Now they're running everything from BI, machine learning, artificial intelligence, IoT, streaming applications on top of it for various reasons. Maybe it is to create a new product to understand the customers better, etc. But as you rightly pointed out, when you start to implement all of these different applications and jobs, it's very, very hard. It's because big data is very complex. With that great power comes a lot of complexity, and what we started to see is a lot of companies, while they want to create these applications and provide that differentiation to their company, they just don't have enough expertise in house to go and write good applications, maintain these applications, and even manage the underlying infrastructure and cluster that all these applications are running on. So we took it upon ourselves where we thought, Hey, if we simplify application performance management and if we simplify ongoing management challenges, then these companies would run more big data applications, they would be able to expand their use cases, and not really be fearful of, Hey, we don't know how to go and solve these problems. Do we actually rely on our system that is so complex and new? And that's the gap that Unravel fills, which is we monitor and manage not only one component of the big data ecosystem, but like you pointed out, it's a full zoo of all of these systems. You have Hadoop, and you have Spark, and you have Kafka for data ingestion. You may have some NoSQL systems and newer MPP platforms as well. So the vision of Unravel is really to be that one place where you can come in and understand what's happening with your applications and your system overall and be able to resolve those problems in an automatic, simple way.
>> So, all right, let's start at the concrete level of what a developer might get out of >> Kunal: Right. >> something that's wrapped in Unravel and then tell us what the administrator experiences. >> Kunal: Absolutely. So if you are a big data developer, you've got a business requirement that says, Hey, go and make this application that understands our customers better, right? They may choose a tool of their liking, maybe Hive, maybe Spark, maybe Kafka for data ingestion. And what they'll do is they'll write an app first in dev, in their dev environment or the QA environment. And they'll say, Hey, maybe this application is failing, or maybe this application is not performing as fast as I want it to, or even worse, this application is starting to hog a lot of resources, which may slow down my other applications. Now to understand what's causing these kinds of problems, today developers really need a PhD to go and decipher them. They have to look at tons of raw logs, metrics, configuration settings and then try to stitch the story up in their head, trying to figure out what is the effect, what is the cause? Maybe it's this problem, maybe it's some other problem. And then do trial and error to try, you know, to solve that particular issue. Now what we've seen is big data developers come in a variety of flavors. You have the hardcore developers who truly understand Spark and Hadoop and everything, but then 80% of the people submitting these applications are data scientists or business analysts, who may understand SQL, who may know Python, but don't necessarily know what distributed computing and parallel processing and all of these things really are, and where inefficiencies and problems can really lie.
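The kind of automated, plain-English diagnosis described above can be illustrated with a minimal rule-based sketch. All metric names, thresholds, and messages here are hypothetical stand-ins; Unravel's actual models are proprietary and far richer than two rules over a dictionary.

```python
# Toy diagnosis engine: map raw job metrics to a plain-English finding plus
# a suggested fix, instead of handing the developer raw logs and graphs.
# Metric names and thresholds are invented for illustration only.

def diagnose(job_metrics):
    """Return a list of (finding, suggested_fix) pairs for one job run."""
    findings = []

    # Rule 1: containers that blew past their memory limit.
    used = job_metrics.get("container_mem_used_mb", 0)
    limit = job_metrics.get("container_mem_limit_mb", float("inf"))
    if used > limit:
        findings.append((
            "Containers exceeded their memory limit, so the job was killed.",
            "Raise the container memory setting or reduce per-task data volume.",
        ))

    # Rule 2: a few straggler tasks far slower than the typical task (data skew).
    tasks = sorted(job_metrics.get("task_durations_s", []))
    if tasks:
        median = tasks[len(tasks) // 2]
        if tasks[-1] > 5 * median:
            findings.append((
                "A few straggler tasks ran far longer than the rest (data skew).",
                "Repartition the input or use a skew-aware join strategy.",
            ))

    if not findings:
        findings.append(("No known failure pattern matched.", "Inspect raw logs."))
    return findings
```

Each rule encodes the kind of check an expert would run by hand; a real tool would correlate many more full-stack signals before asserting a root cause.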
So we give them this one view, which will connect all of these different data sources and then tell them in plain English, this is the problem, this is why this problem happened, and this is how you can go and resolve it, thereby getting them unstuck and making it very simple for them to go in and get the performance that they're expecting. >> So, these, these, um, they're the developers up front and you're giving them a whole new, sort of, toolchain or environment to solve the operational issues. >> Kunal: Right. >> So that, if it's DevOps, it's really dev that's much more self-sufficient. >> Yes, yes, I mean, all companies want to run fast. They don't want to be slowed down. If you have a problem today, they'll file a ticket, it'll go to the operations team, you wait a couple of days to get some more information back. That just means your business has slowed down. If things are simple enough where the application developers themselves can resolve a lot of these issues, that'll get the business unstuck and get them moving on further. Now, to the other point which you were asking, which is what about the operations and the app support people? So, Unravel's a great tool for them too because that helps them see what's happening holistically in the cluster. How are other applications behaving with each other? It's usually a multitenant, multiapplication environment that these big data jobs are running on. So, are my apps slowing down George's apps? Am I stealing resources from your applications? More so, not just about an individual application issue itself. So Unravel will give you visibility into each app, as well as the overall cluster, to help you understand cluster-wide problems. >> Love to get at, maybe peel apart your target audience a little bit. You talked about DevOps. But also the business analysts, data scientists, and we talk about big data.
Data has such tremendous power to fuel a company and, you know, like you said, use it to create and deliver new products. Are you talking with multiple audiences within a company? Do you start at DevOps and they bring in their peers? Or do you actually start, maybe, at the Chief Data Officer level? What's that kind of entrance for Unravel? >> So the word I use to describe this is DataOps, instead of DevOps, right? So in the older world you had developers, and you had operations people. Over here you have a data team and operations people, and that data team can comprise the developers, the data scientists, the business analysts, etc., as well. But you're right. Although we first target the operations role because they have to manage and monitor the system and make sure everything is running like a well-oiled machine, they are now spreading it out to the end-users, meaning the developers themselves, saying, "Don't come to me for every problem. "Look at Unravel, try to solve it here, "and if you cannot, then come to me." This is all, again, improving agility within the company, making sure that people have the necessary tools and insights to carry on with their day. >> Sounds like an enabler, >> Yeah, absolutely. >> That operations would push down to the developers themselves. >> And even the managers and the CDOs, for example, they want to see the ROI that they're getting from their big data investments. They have put in these millions of dollars, have got an infrastructure and these services set up, but how are we actually moving the needle forward? Are there any applications that we're actually putting in business, and is that driving any business value? So we will be able to give them a very nice dashboard helping them understand what kind of throughput are you getting from your system, how many applications were you able to develop last week and onboard to your production environment?
And what's the rate of innovation that's really happening inside your company on those big data ecosystems? >> It sort of brings up an interesting question on two prongs. One is the well-known, but inexact, number about how many big data projects, >> Kunal: Yeah, yeah. >> I don't know whether they fail or didn't pay off. So there's going in and saying, "Hey, we can help you manage this "because it was too complicated." But then there's also all the folks who decided, "Well, we really don't want "to run it all on-prem. "We're not going to throw away everything we did there, "but we're going to also put a lot of new investment >> Kunal: Exactly, exactly. >> in the cloud." Now, Wikibon has a term for that, which is true private cloud, which is when you have the operational processes that you use in the public cloud and you can apply them on-prem. >> Right. >> George: But there's not many products that help you do that. How can Unravel work...? >> Kunal: That's a very good question, George. We're seeing the world move more and more to a cloud environment, or I should say an on-demand environment, where you're not so bothered about the infrastructure and the services, but you want Spark as a dial tone. You want Kafka as a dial tone. You want a machine-learning platform as a dial tone. You want to come in there, you want to put in your data, and you want to just start running it. Unravel has been designed from the ground up to monitor and manage any of these environments. So, Unravel can solve problems for your applications running on-premise and similarly all the applications that are running on the cloud. Now, on the cloud there are other levels of problems as well, so, of course, you'd have applications that are slow, applications that are failing; we can solve those problems.
But if you look at a cloud environment, a lot of these now provide you an autoscaling capability, meaning, Hey, if this app doesn't run in the amount of time that we were hoping it would run, let's add extra hardware and run this application. Well, if you just keep throwing machines at the problem, it's not going to solve your issue. It doesn't decrease the time that it will take linearly with how many servers you're actually throwing in there, so what we can help companies understand is, what is the resource requirement of a particular application? How should we be intelligently allocating resources to make sure that you're able to meet your time SLAs, your constraints of, here I need to finish this within x number of minutes, but at the same time be intelligent about how much cost you're spending over there? Do you actually need 500 containers to go and run this app? Well, you may have needed 200. How do you know that? So, Unravel will also help you get efficient with your run, not just faster, but also, can it be a good multitenant citizen, can it use limited resources to actually run these applications as well? >> So, Kunal, some of the things I'm hearing from a customer's standpoint that are potential positive business outcomes are internal: performance boost. >> Kunal: Yeah. >> It also sounds like, sort of... productivity improvements internally. >> And then also the opportunity to have the insight to deliver new products, but even I'm thinking of, you know, helping make a retailer, for example, be able to do more targeted marketing, so >> the business outcomes and the impact that Unravel can make really seem to have pretty strong internal and external benefits. >> Kunal: Yes. >> Is there a favorite customer story, (Kunal laughs) don't have to mention names, that you really think speaks to your capabilities? >> So, 100%. Improving performance is a very big factor of what Unravel can do.
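The 500-versus-200-containers question above can be sketched as a back-of-the-envelope estimate. Every number and parameter here is hypothetical; a real tool would derive the per-container throughput and scaling efficiency from measured telemetry rather than assume them.

```python
import math

def containers_needed(input_gb, gb_per_min_per_container, sla_minutes,
                      parallel_efficiency=0.8):
    """Estimate the fewest containers that process `input_gb` within the SLA.

    Assumes near-linear scaling, discounted by `parallel_efficiency` to
    account for coordination overhead (a made-up fudge factor here).
    """
    effective_rate = gb_per_min_per_container * parallel_efficiency
    return math.ceil(input_gb / (effective_rate * sla_minutes))

# Example with invented numbers: 2,000 GB of input, 0.5 GB/min per
# container, a 30-minute SLA:
#   2000 / (0.5 * 0.8 * 30) = 166.7  ->  167 containers, not 500.
```

The point of the sketch is only that a measured rate plus an SLA bounds the container count; blindly autoscaling past that bound buys cost, not speed.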
Decreasing costs by improving productivity, by limiting the amount of resources that you're using, is a very, very big factor. Now, amongst all of these companies that we work with, one key factor is improving reliability, which means, Hey, it's fine that you can speed up this application, but sometimes I know the latency that I expect from an app, maybe it's a second, maybe it's a minute, depending on the type of application. But what businesses cannot tolerate is this app taking five x more time today. If it's going to finish in a minute, tell me it'll finish in a minute and make sure it finishes in a minute. And this is a big use case for all of the big data vendors because a lot of the customers are moving from Teradata, or from Vertica, or from other relational databases, on to Hortonworks or Cloudera or Amazon EMR. Why? Because it's one tenth the amount of cost for running these workloads. But, all the customers get frustrated and say, "I don't mind paying 10 x more money, "because over there it used to work. "Over here, there are just so many complications, "and I don't have reliability with these applications." So that's a big, big factor of, you know, how we actually help these customers get value out of the Unravel product. >> Okay, so, um... A question I'm, sort of... why aren't there so many other Unravels? >> Kunal: Yeah. (Kunal laughs) >> From what I understood from past conversations, >> Kunal: Yeah. >> you can only really build the models that are at the heart of your capabilities based on tons and tons of telemetry >> Kunal: Yeah. >> that cloud providers or, sort of, internet-scale service providers have accumulated, because they all have sort of a well-known set of configurations and well-known kind of topology. In other words, there are not a million degrees of freedom on any particular side; you have a well-scoped problem, and you have tons of data. So it's easier to build the models.
So who, who else could do this? >> Yeah, so the difference between Unravel and other monitoring products is Unravel is not a monitoring product. It's an intelligent performance management suite. What that means is we don't just give you graphs and metrics and say, "Here is all the raw information, "you go figure it out." Instead, we have to take it a step further where we are actually giving people answers. In order to develop something like that, you need full-stack information; that's number one. Meaning information from applications all the way down to infrastructure and everything in between. Why? Because problems can lie anywhere. And if you don't have that full-stack info, you're blinding yourself, or limiting the scope of the problems that you can actually search for. Secondly, like you were rightly pointing out, how do I create answers from all this raw data? So you have to think like how an expert in big data would think, which is, if there is a problem, what are the kinds of checks, balances, places that that person would look into, and how would that person establish that this is indeed the root cause of the problem today? And then, how would that person actually resolve this particular problem? So, we have a big team of scientists, researchers. In fact, my co-founder is a professor of computer science at Duke University who has been researching database optimization techniques for the last decade. We have about 80 plus publications in this area, Starfish being one of them. We have a bunch of other publications, which talk about how do you automate problem discovery, root cause analysis, as well as resolution, to get the best performance out of these different databases. And you're right. A lot of work has gone on the research side, but a lot of work has gone into understanding the needs of the customers.
So we worked with some of the biggest companies out there, which have some of the biggest big data clusters, to learn from them: what are some everyday, ongoing management challenges that you face? And then we took those problems to our datasets and figured out, how can we automate problem discovery? How can we proactively spot a lot of these errors? I joke around and I tell people that we're big data for big data. Right? All these companies that we serve, they are gathering all of this data, and they're trying to find patterns, and they're trying to find, you know, some sort of an insight with their data. Our data is system-generated data, performance data, application data, and we're doing the exact same thing, which is figuring out inefficiencies, problems, cause and effect of things, to be able to solve it in a more intelligent, smart way. >> Well, Kunal, thank you so much for stopping by theCube >> Kunal: Of course. >> and sharing how Unravel Data is helping to unravel the complexities of big data. (Kunal laughs) >> Thank you so much. Really appreciate it. >> Now you're a Cube alumni. (Kunal laughs) >> Absolutely. Thanks so much for having me. >> Kunal, thanks. >> Yeah, and we want to thank you for watching the Cube. I'm Lisa Martin with George Gilbert. We are live at our own event, BigData SV, in downtown San Jose, California. Stick around. George and I will be right back with our next guest. (quiet crowd noise) (techno music)

Published Date : Mar 8 2018


Wikibon Conversation with John Furrier and George Gilbert


 

(upbeat electronic music) >> Hello, everyone. Welcome to the Cube Studios in Palo Alto, California. I'm John Furrier, the co-host of the Cube and co-founder of SiliconANGLE Media Inc. I'm here with George Gilbert for a Wikibon conversation on the state of big data. George Gilbert is the analyst at Wikibon covering big data. George, great to see you. Looking good. (laughing) >> Good to see you, John. >> So George, you're obviously covering big data. Everyone knows you. You always ask the tough questions, you're always drilling down, going under the hood, and really inspecting all the trends, and also looking at the technology. What are you working on these days as the big data analyst? What's the hot thing that you're covering? >> OK, so, what's really interesting is we've got this emerging class of applications. The name that we've used so far is modern operational analytic applications. Operational in the sense that they help drive business operations, but analytical in the sense that the analytics either inform or drive transactions, or anticipate and inform interactions with people. That's the core of this class of apps. And then there are some sort of big challenges that customers are having in trying to build, and deploy, and operate these things. That's what I want to go through. >> George, you know, this is a great piece. I can't wait to (mumbling) some of these questions and ask you some pointed questions. But I would agree with you that to me, the number one thing I see customers either fumbling with or accelerating value with is how to operationalize some of the data in a way that they've never done it before. So you start to see disciplines come together. You're starting to see people with a notion of digital business being something that's not a department, it's not a marketing department. Data is everywhere, it's horizontally scalable, and the smart executives are really looking at new operational tactics to do that.
With that, let me kick off the first question to you. People are trying to balance the cloud, on-premise, and the edge, OK. And that's classic, you're seeing that now. I've got a data center, I have to go to the cloud, a hybrid cloud. And now the edge of the network. We were just talking about blockchain today; there's this huge problem. They've got to balance that versus leveraging specialized services. How do you respond to that? What is your reaction? What is your presentation? >> OK, so let's turn it into something really concrete that everyone can relate to, and then I'll generalize it. The concrete version is, for a number of years, everyone associated Hadoop with big data. And Hadoop, you tried to stand up on a cluster on your own premises, for the most part. There was EMR, but sort of the big company activity, even including the big tech companies, was stand up a Hadoop cluster as a pilot and start building a data lake. Then see what you could do with sort of huge amounts of data that you couldn't normally sort of collect and analyze. The operational challenges of standing up that sort of cluster were rather overwhelming, and I'll explain that later, so sort of park that thought. Because of that complexity, more and more customers, all but the most sophisticated, are saying we need a cloud strategy for that. But once you start taking Hadoop into the cloud, the components of this big data analytic system, you have tons more alternatives. So whereas in Cloudera's version of Hadoop you had Impala as your MPP SQL database, on Amazon, you've got Amazon Redshift, you've got Snowflake, you've got dozens of MPP SQL databases. And so the whole playing field shifts. And not only that, Amazon has instrumented their, in that particular case, their application, to be more of a managed service, so there's a whole lot less for admins to do.
And you take that on, sort of, if you look at the slides, you take every step in that pipeline. And when you put it on a different cloud, it's got different competitors. And even if you take the same step in a pipeline, let's say Spark on HDFS to do your ETL, and your analysis, and your shaping of data, and even some of the machine learning, you put that on Azure and on Amazon, it's actually on a different storage foundation. So even if you're using the same component, it's different. There's a lot of complexity and a lot of trade offs that you've got to make. >> Is that a problem for customers? >> Yes, because all of a sudden, they have to evaluate what those trade offs are. They have to evaluate the trade off between specialization. Do I use the best of breed thing on one platform? And if I do, it's not compatible with what I might be running on prem. >> That'll slow a lot of things down. I can tell you right now, people want to have the same code base on all environments, and then just have the same seamless operational role. OK, that's a great point, George. Thanks for sharing that. The second point here is harmonizing and simplifying management across hybrid clouds. Again, back to your point. You set that up beautifully. Great example: open source innovation hits a roadblock. And the roadblock is incompatible components in multiple clouds. That's a problem. It's a management nightmare. How does harmonization across hybrid clouds work? >> You couldn't have asked it better. Let me put it up in terms of an X Y chart where on the x-axis, you have the components of an analytic pipeline. Ingest, process, analyze, predict, serve. But then on the y-axis, this is for an admin, not a developer. These are just some of the tasks they have to worry about. Data governance, performance monitoring, scheduling and orchestration, availability and recovery, that whole list.
Now, if you have a different product for each step in that pipeline, and each product has a different way of handling all those admin tasks, you're basically taking all the unique activities on the y-axis, multiplying them by all the unique products on the x-axis, and you have overwhelming complexity, even if these are managed services on the cloud. Here now you've got several trade offs. Do I use the specialized products that you would call best of breed? Do I try and do end to end integration so I get simplification across the pipeline? Or do I use products that I had on-prem, like you were saying, so that I have seamless compatibility? Or do I use the cloud vendors'? That's a tough trade off. There's another similar one for developers. Again, on the y-axis, all the things that a developer would have to deal with, not all of them, just a sample. The data model and the data itself, how to address it, the programming model, the persistence. So on that y-axis, you multiply all those different things you have to master for each product. And then on the x-axis, all the different products in the pipeline. And you have that same trade off, again. >> Complexity is off the charts. >> Right. And you can trade end to end integration to simplify the complexity, but we don't really have products that are fully fleshed out and mature that stretch from one end of the pipeline to the other, so that's a challenge. Alright. Let's talk about another way of looking at management. This was looking at the administrators and the developers. Now, we're getting better and better software for monitoring performance and operations, and trying to diagnose root cause when something goes wrong and then remediate it. There are two real approaches. One is you go really deep, but on a narrow part of your application and infrastructure landscape. And that narrow part might be, you know, your analytic pipeline, your big data.
The broad approach is to get end to end visibility across the edge with your IoT devices, across on-prem, perhaps even across multiple clouds. That's the breadth approach, end to end visibility. Now, there's a trade off here too, as in all technology choices. When you go deep, you have bounded visibility, but that bounded visibility allows you to understand exactly what is in that set of services, how they fit together, how they work. Because the vendor, knowing that they're only giving you management of your big data pipeline, can train their machine learning models so that whenever something goes wrong, they know exactly what caused it and they can filter out all the false positives, the scattered errors that can confuse administrators. Whereas if you want breadth, you want to see end to end your entire landscape so that you can do capacity planning, and if there was an error way upstream, something might be triggered way downstream or a bunch of things downstream. So the best way to understand this is how much knowledge you have of how all the pieces work together, and how much knowledge you have of how the software pieces fit together. >> This is actually an interesting point. So if I kind of connect the dots for you here: the bounded root cause analysis, where we see a lot of machine learning, that's where the automation is. >> George: Yeah. >> The unbounded, the breadth, that's where the data volume is. But they can work together, that's what you're saying. >> Yes. And actually, I hadn't even got to that, so thanks for teeing it up. >> John: Did I jump ahead on that one? (laughing) >> No, no, you teed it up. (laughing) Because ultimately-- >> Well, a lot of people want to know what's going to be automated away. All the undifferentiated labor at scale can be automated.
So for the deep, depth-first approach, there's a small company called Unravel Data that has modeled eight million big data jobs or workloads from high tech companies, so they know how all that fits together and they can tell you when something goes wrong exactly what went wrong and how to remediate it. Then take something like Rocana or Splunk; they look end to end. The interesting thing that you brought up is at some point, that end to end product is going to be like a data warehouse and the depth products are going to sit on top of it. So you'll have all the contextual data of your end to end landscape, but you'll have the deep knowledge of how things work and what goes wrong sitting on it. >> So just before we jump to the machine learning question, which I want to ask you, what you're saying is the industry is evolving to almost looking like a data warehouse model, but in a completely different way. >> Yeah. Think of it as, another cue. (laughing) >> John: That's what I do, George. I help you out with the cues. (laughing) No, but I mean the data warehouse, everyone knows what that was. A huge industry, created a lot of value, but then the world got rocked by unstructured data. And then their bounded, if you will, view got democratized. So creative destruction happened, which is another word for new entrants came in and incumbents got rattled. But now it's kind of going back to what looks like a data warehouse, but it's completely distributed around. >> Yes. And I was going to do one of my movie references, but-- >> No, don't do it. Save us the judge. >> If you look at this starting in the upper right, that's the data lake where you're collecting all the data and it's for search; it's exploratory. As you get more structure, you get to the descriptive place where you can build dashboards to monitor what's going on. And you get really deep, that's when you have the machine learning.
>> Well, the machine learning is hitting the low-hanging fruit, and that's where I want to get to next to move it along. Sourcing machine learning capability, let's discuss that. >> OK, alright. Just to set context before we get there, notice that when you do end-to-end visibility, you're really seeing across a broad landscape. And when I'm showing my public cloud big data, that would be depth first just for that component. But for breadth first, you could do a Rocana or a Splunk that then sees across everything. The point I wanted to make was, when you said we're reverting back to data warehouses and revisiting that dream again: the management applications started out saying, we know how to look inside machine data and tell you what's going on with your landscape. It turns out that machine data and business operations data, your application data, are really becoming one and the same. So what used to be a transaction, there was one transaction. And that, when you summarized them, went into the data warehouse. Then, with systems of engagement, you had about 100 interaction events that you tracked, or sort of stored, for every business transaction. And then when we went out to the big data world, it's so resource intensive that we actually had 1,000 to 10,000 infrastructure events for every business transaction. So that's why the data volumes have grown so much, and why we had to go back first to the data lake, and then curate it into the warehouse. >> Classic innovation story, great. Machine learning. Sourcing machine learning capabilities, 'cause that's where the rubber starts hitting the road. You're starting to see clear skies when it comes to where machine learning is starting to fit in. Sourcing machine learning capabilities. >> You know, even though we sort of didn't really rehearse this, you're cueing me up perfectly.
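The event-volume multipliers George cites (roughly 100 engagement events, and 1,000 to 10,000 infrastructure events, per business transaction) make the growth easy to see with a little arithmetic; the daily transaction count below is just an assumed figure:

```python
# Rough arithmetic on the multipliers quoted above: each business
# transaction fans out into ~100 engagement events and, in the big
# data world, 1,000-10,000 infrastructure events.

def event_volumes(transactions, engagement_per_txn=100, infra_per_txn=10_000):
    """Estimate event counts at each layer for a given transaction count."""
    return {
        "business_transactions": transactions,
        "engagement_events": transactions * engagement_per_txn,
        "infrastructure_events": transactions * infra_per_txn,
    }

v = event_volumes(1_000_000)  # assume one million transactions a day
print(v["infrastructure_events"])  # 10000000000, ten billion a day at the high end
```

Four orders of magnitude between the transaction and the infrastructure telemetry behind it is exactly why the raw data lands in a lake first and only the curated summary reaches the warehouse.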
Let me make the assertion that with machine learning, we have the same shortage of really trained data scientists that we had of administrators when we were trying to stand up Hadoop clusters and do big data analytics. We did not have enough administrators because these were open source components built from essentially different projects, and putting them all together required a huge amount of skill. Data science requires, really, knowledge of algorithms that even really sophisticated programmers will tell you, "Jeez, now I need a PhD to really understand how this stuff works." So the shortage means we're not going to get a lot of hand-built machine learning applications for a while. >> John: There are a lot of libraries out there right now; you see TensorFlow from Google. Big traction with that. >> George: But for PhDs, for PhDs. My contention is-- >> John: Well, developers too, you could argue developers, but I'm just putting it out there. >> George: I will get to that, actually. A slide just on that. Let me do this one first, because my contention is that the first big, widespread application of machine learning is going to be the depth-first management, because it comes with a built-in model of how all the big data workloads, services, and infrastructure fit and work together. And if you look at how the machine learning model operates: when it knows something goes wrong, let's say an analytic job takes 17 hours and then just falls over and crashes, the model can actually look at the data layout and say we have way too much on one node, and it can change the settings and change the layout of the data, because it knows how all the stuff works. The point here is the vendor: in this particular example, Unravel Data built into their model an understanding of how to keep a big data workload running, as opposed to telling the customer, "You have to program it."
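The diagnosis George walks through, a model noticing that one node holds far too much of a job's data, can be sketched in a few lines (the threshold and the node sizes are illustrative assumptions, not how Unravel actually scores skew):

```python
# Hypothetical depth-first diagnosis: flag nodes holding far more than
# their fair share of a job's data. The 2x threshold is an assumption.

def skewed_nodes(bytes_per_node, skew_factor=2.0):
    """Return nodes holding more than skew_factor times the fair share."""
    fair_share = sum(bytes_per_node.values()) / len(bytes_per_node)
    return [node for node, size in bytes_per_node.items()
            if size > skew_factor * fair_share]

layout = {"node1": 400, "node2": 90, "node3": 110, "node4": 100}
print(skewed_nodes(layout))  # ['node1']: fair share is 175, so the bar is 350
```

A real depth-first tool would go further and propose the remediation (respread the data, adjust settings) because its model knows how the pieces work; the sketch only shows the detection half.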
So that fits into the question you were just asking, which is where do you get this talent. When you were talking about TensorFlow, and Caffe, and Torch, and MXNet, those are all like assembly language. Yes, those are the most powerful places you could go to program machine learning. But the number of people who can use them is inversely proportional to the power of those tools. >> John: Yeah, those are like really unique specialty people. High-end, you know, the top guys. >> George: Lab coats, rocket scientists. >> John: Well yeah, just high-end, tier-one coders, tier-one brains coding away, AI gurus. This is not your working developer. >> George: But you can go up two levels. Going up one level is Amazon machine learning, Spark machine learning. Go up another level, and I'm using Amazon as an example here: Amazon has a vision service called Rekognition. They have a speech generation service, natural language services. Those are developer ready. And when I say developer ready, I mean the developer just uses an API, you know, passes the data in and takes the result that comes out. He doesn't have to know how the model works. >> John: It's kind of like what DevOps was for cloud at the end of the day. This slide is completely accurate, in my opinion. And we're at the early days, and you're starting to see the platforms develop. It's the classic abstraction layer. Whoever can abstract away the complexity as AI and machine learning grow is going to be the winning platform, no doubt about it. Amazon is showing some good moves there. >> George: And you know how they abstracted it away. In traditional programming, it was just building higher and higher APIs, more accessible. In machine learning, you can't do that. You have to actually train the models, which means you need data. So if you look at the big cloud vendors right now: Google, Microsoft, Amazon, and IBM. Most of them, the first three, have a lot of data from their B-to-C businesses. So you know, people talking to Echo, people talking to Google Assistant or Siri.
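"Developer ready" in George's sense means the developer passes data in and reads a prediction out, with the model itself opaque. A stand-in sketch of that calling pattern (this client and its canned response are hypothetical, not a real vendor SDK):

```python
# Hypothetical stand-in for a developer-ready vision API. A real hosted
# service would run a trained model server-side; the canned response
# here exists only to show the calling pattern the conversation describes.

class VisionClient:
    def detect_labels(self, image_bytes):
        """Pass image bytes in, get labeled predictions out."""
        return [{"label": "Car", "confidence": 0.97}]

client = VisionClient()
labels = client.detect_labels(b"...jpeg bytes...")
print(labels[0]["label"])  # Car
```

Contrast this with the "assembly language" level, where the developer would be choosing architectures and training loops in TensorFlow or MXNet rather than making one opaque call.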
That's where they get enough of their speech. >> John: So data equals power? >> George: Yes. >> By having data, you have the ingredients. And the more data that you have, the more data that you know about, the more data that has information around it, the more effective it can be for training machine learning algorithms. >> Yes. >> And the benefit comes back to the people who have the data. >> Yes. And so, even though your capabilities get narrower, 'cause you could do anything on TensorFlow. >> John: Well, that's why Facebook is getting killed right now, just to change tangents. They have all this data and people are very unhappy; it just came out that the Russians were targeting anti-Semitic advertising, and Facebook enabled that. So it's hard to be a data platform and still provide user utility. This is what's going on. Whoever has the data has the power. It was a Frankenstein moment for Facebook. So there's that out there for everyone. How do companies do the right thing? >> And there's also the issue of customer intellectual property protection. As consumers, we say you can take our voice, you can take all our speech to Siri or to Echo or whatever, and get better at recognizing speech, because we've given up control of that, 'cause we want those services for free. >> Whoever can shift the data value to the users. >> George: To the developers. >> Or to the developers, or communities, better said, will win. >> OK. >> In my opinion, that's my opinion. >> For the most part, Amazon, Microsoft, and Google have similar data assets. For the most part, so far. IBM has something different, which is that they work closely with their industry customers and they build progressively. They're working with Mercedes, they're working with BMW. They'll work on the connected car, you know, the autonomous car, and they build out those models slowly.
>> So George, this slide is really, really interesting, and I think this should be a roadmap for all customers to look at to try to peg where they are in the machine learning journey. But then the question comes in. They do the blocking and tackling, they have the foundational low-level stuff done, they're building the models, they understand the mission, they have the right organizational mindset and personnel. Now, they want to orchestrate it and implement it into action. That's the final question. How do you orchestrate the distributed machine learning feedback and the data coherency? How do you get this thing scaling? How does the training happen across these machines so you have the breadth, and then you can bring the machine learning up the curve into the dashboard? >> OK. We've saved the best for last. It's not easy. When I show the chevrons, that's the analytic data pipeline. And imagine the serve-and-predict stage at the very end. Let's take an IoT app, a very sophisticated one, which would be an autonomous car. And it doesn't actually have to be an autonomous one; you could just be collecting a lot of information off the car so the insurance company can do a better job insuring it. But the key then is you're collecting data on a fleet of cars, right? You're collecting data off each one, but you're also then collecting across the fleet. And that, in the cloud, is where you keep improving your model of how the car works. You run simulations to figure out not just how to design better ones in the future, but how to tune and optimize the ones that are on the road now. That's number three. And then in step four, you push that feedback back out to the cars on the road. And you have to manage, and this is tricky, you have to make sure that the models that you trained in step three are coherent, or the same, when you take the fleet data out and then put the model for a particular instance of a car back out on the highway.
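Step four's coherence requirement, that the model a car applies be the same one trained on the fleet data in step three, can be sketched as a digest check before an update is accepted (the names and the hashing scheme are illustrative, not any vendor's actual protocol):

```python
import hashlib

# Illustrative coherence check: the fleet-trained model ships with a
# digest, and a car only applies an update whose bytes match that digest.

def model_digest(weights: bytes) -> str:
    return hashlib.sha256(weights).hexdigest()

def apply_update(car_state, weights, expected_digest):
    """Apply the update only if it matches the fleet-trained digest."""
    if model_digest(weights) != expected_digest:
        return car_state  # reject the incoherent update, keep current model
    return {**car_state, "model": weights, "version": expected_digest}

fleet_model = b"fleet-trained-weights-v3"
digest = model_digest(fleet_model)
car = apply_update({"model": None, "version": None}, fleet_model, digest)
print(car["version"] == digest)  # True
```

The same idea generalizes: every instance on the road reports the version it is running, so the cloud side knows which cars are coherent with the latest fleet training run before it trusts their telemetry.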
>> George, this is a great example, and I think this slide really represents the modern analytical, operational role in digital business. You can't look further than Tesla; this is essentially Tesla, and now all cars, as a great example, 'cause it's complex, it's an internet (mumbling) device, it's on the edge of the network, it's mobility, it's using 5G. It encapsulates everything that you are presenting, so I think this example is a great one of the modern operational analytic applications that support digital business. Thanks for joining this Wikibon conversation. >> Thank you, John. >> George Gilbert, the analyst at Wikibon covering big data and the modern operational analytical systems supporting digital business. It's data driven. The people with the data can train the machines that have the power. That's the mandate, that's the action item. I'm John Furrier with George Gilbert. Thanks for watching. (upbeat electronic music)

Published Date : Sep 23 2017

