Bill Schmarzo, Hitachi Vantara | CUBE Conversation, August 2020
>> Announcer: From theCUBE studios in Palo Alto and in Boston, connecting with thought leaders all around the world, this is a CUBE conversation.

>> Hey, welcome back everybody. Jeff Frick here with theCUBE. We are still getting through the year of 2020. It's still the year of COVID, and there's no end in sight, I think, until we get to a vaccine. That said, we're really excited to have one of our favorite guests. We haven't had him on for a while; I haven't talked to him for a long time. He used to, I think, have the record for the most CUBE appearances of probably any CUBE alumni. We're excited to have him joining us from his house in Palo Alto: Bill Schmarzo. You know him as the Dean of Big Data, a title we gave him kind of for fun, but he's got more titles now. He's the chief innovation officer at Hitachi Vantara. He goes out and writes a bunch of books, and he teaches at the University of San Francisco School of Management as an executive fellow. He's an honorary professor at NUI Galway; he likes to go to that side of the pond. And he's a many-time author now, so go check out his author profile on Amazon: the "Big Data MBA," "The Art of Thinking Like A Data Scientist," and another big data title that's kind of a workbook. Bill, great to see you.

>> Thanks, Jeff. You know, I miss my time on theCUBE. These conversations have always been great. We've always kind of poked around the edges of things. A lot of our conversations have always been, I thought, very leading edge. And the title Dean of Big Data is courtesy of theCUBE. You guys were the first ones to give me that name, at one of the very first Strata Conferences, where you dubbed me the Dean of Big Data because I taught a class there called the Big Data MBA. And look what's happened since then.

>> I love it.

>> It's all on you guys.

>> I love it, and we've outlasted Strata; Strata doesn't exist as a conference anymore. So, you know, part of that I think is because big data is now everywhere, right? It's not a standalone thing. But there's a topic, and I'm holding in my hands a paper that you worked on with a colleague, Dr. Sidaoui, talking about: what is the value of data? What is the economic value of data? And this is a topic that's been thrown around quite a bit. I think you list a total of 28 reference sources in this document, so it's a well-researched piece of material, but it's a really challenging problem. So before we get into the details: from your position, having done this for a long time, having traveled every single week to go out and visit customers, actually do implementations, and really help people think these through, when you think about the value, the economic value, how did you start to frame that to make sense and make it a manageable problem to attack?

>> So, Jeff, the research project was eye-opening for me. One of the advantages of being a professor is that you have access to all these very smart, very motivated, very free research resources. And one of the problems that I've wrestled with as long as I've been in this industry is: how do you figure out what data is worth? And so what I did is I took these research students and I set them on this problem. I said, "I want you to do some research. Help me understand: what is the value of data?" I've seen all these different papers and analysts and consulting firms talk about it, but nobody's really cracked it.
And so we launched this research project at USF, professor Mouwafac Sidaoui and I together, and we were bumping along the same old path that everyone else took, which was anchored on: how do we get data on our balance sheet? That was always the motivation, because as a company we're worth so much more because our data is so valuable, so how do I get it on the balance sheet? So we're headed down that path, trying to figure out how to get it on the balance sheet, and then one of my research students comes up to me and says, "Professor Schmarzo, data is kind of an unusual asset." I said, "Well, what do you mean?" She goes, "Well, think about data as an asset. It never depletes, it never wears out, and the same dataset can be used across an unlimited number of use cases at a marginal cost equal to zero." And when she said that, it's like, "Holy crap." The light bulb went off. It's like, "Wait a second. I've been thinking about this entirely wrong for the last 30-some years of my life in this space. I've had the wrong frame. I'd been treating this as an accounting conversation." Accounting determines valuation based on what somebody is willing to pay for something. So if you go back to Adam Smith, 1776, "Wealth of Nations," he talks about valuation techniques. And one of the valuation techniques he talks about is valuation in exchange. That is, the value of an asset is what someone's willing to pay you for it. So the value of this bottle of water is what someone's willing to pay you for it. Everybody fixates on this valuation-in-exchange methodology. That's how you put it on a balance sheet, that's how you run depreciation schedules; that dictates everything. But Adam Smith also talked about, in that book, another valuation methodology, which is valuation in use, and that's an economics conversation, not an accounting conversation. And when I realized that my frame was wrong... yeah, I had the right book. I had Adam Smith, I had "Wealth of Nations," I had all that good stuff, but I hadn't read the whole book. I had missed this whole concept about economic value, where value is determined not by how much someone's willing to pay you for it, but by the value you can drive by using it. So, Jeff, when that student made that comment, the entire research project, and I've got to tell you, my entire life, did a total 180, right? A total 180-degree change in how I was thinking about data as an asset.
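Schmarzo's valuation-in-use point can be made concrete with a toy model. The sketch below is a hypothetical illustration, not a calculation from the paper: it contrasts a one-time sale price (valuation in exchange) with the value that accumulates when the same non-depleting dataset is reused across use cases at near-zero marginal cost.

```python
# Toy model (hypothetical numbers): valuation in exchange vs. valuation in use.
# Valuation in exchange: the one-time price a buyer would pay for the dataset.
# Valuation in use: the value accumulated by reusing the same dataset, which
# never depletes, across many use cases at near-zero marginal cost.

one_time_sale_price = 250_000  # what a buyer might pay for the raw dataset

use_cases = [
    ("reduce customer churn", 400_000),
    ("optimize promotions",   300_000),
    ("forecast store demand", 350_000),
]
onboarding_cost = 150_000    # one-time cost to acquire and curate the data
marginal_reuse_cost = 5_000  # near-zero cost to reuse the data per use case

value_in_use = sum(value for _, value in use_cases)
total_cost = onboarding_cost + marginal_reuse_cost * (len(use_cases) - 1)

print(f"Valuation in exchange: ${one_time_sale_price:,}")
print(f"Valuation in use (net): ${value_in_use - total_cost:,}")
# Valuation in exchange: $250,000
# Valuation in use (net): $890,000  <- and it grows with every added use case
```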
>> Right. Well, Bill, it's funny, that's kind of captured in what I always think of as finance versus accounting, right? And you're right on accounting. We learn a lot of things in accounting; mostly we learn how much we don't know. But it's really hard to put data in an accounting framework, because as you said, it's not like a regular asset. You can use it a lot of times, you can use it across lots of use cases, and it doesn't degrade over time. In fact, it used to be a liability, 'cause you had to buy all this hardware and software to maintain it. But if you look at the finance side, if you look at the pure-play internet companies like Google, like Facebook, like Amazon, and you look at their valuations, right? We used to have this thing, we still have this thing, called goodwill, which was kind of this capture between what the market established the value of the company to be and what wasn't reflected when you summed up all the assets on the balance sheet. You had this leftover thing, and you could just plug in goodwill. And I would hypothesize that for these big giant tech companies, the market has baked in the value of the data, has kind of put a present value on it, over a long period of time and over multiple projects. And we see it captured probably in goodwill, versus being called out as an individual balance sheet item.

>> So I don't know accounting. I'm not an accountant, thank God, right? And I know that goodwill, if I remember from my MBA program, is something where, when you buy a company and you look at the value you paid versus what it was worth on the books, the difference gets stuck into this category called goodwill, because no one knew how to figure it out. So the company at book value was a billion dollars, but you paid five billion for it. Well, you're not an idiot, so that four billion extra you paid must be goodwill, and they'd stick it in goodwill. And I think there's actually a way that goodwill gets depreciated as well. So it could be that, but I stay away from the accounting framework. I think that's distracting; trying to work within the GAAP rules is more of an inhibitor. And we talk about the Googles of the world and the Facebooks of the world and the Netflixes of the world and the Amazons, companies that are great at monetizing data. Well, they're great at monetizing it because they're not selling it, they're using it. Google is using their data to dominate search, right? Netflix is using it to be the leader in on-demand video. And it's how they use all the data, how they use the insights about their customers, their products, and their operations, to really drive new sources of value. So to me, when you start thinking about it from an economics perspective: for example, why is the same car that I buy and that an Uber driver buys more valuable to the Uber driver than it is to me? Well, the bottom line is, Uber drivers are going to use that car to generate value, right? That $40,000 car they bought is worth a lot more, because they're going to use it to generate value. For me, it sits in the driveway and the birds poop on it. So it's this value-in-use concept. And by the way, most organizations really struggle with this. They struggle with this value-in-use concept. When you talk to them about data monetization, they think about the chief data officer trying to sell data, knocking on doors, shaking their tin cup, saying, "Buy my data." No. No one wants your data. Your data is more valuable for how you use it to drive your operations than as something to sell to somebody else.

>> Right, right. Well, one of the other things that's really important from an economics standpoint is scarcity, right? A whole lot of economics is driven around scarcity: how do you price for scarcity so that the market evens out and the price matches up to the supply? What's interesting about the data concept is that there is no scarcity anymore. And you know, you've outlined, as everyone has, the giant numbers going up and to the right in terms of the quantity of data there is and is going to be. But what you point out very eloquently in this paper is that the scarcity is around the resources to actually do the work on the data to get the value out of it. And I think there's just this interesting step function between just raw data, which has really no value in and of itself, right? Until you start to apply some concepts to it, you start to analyze it.
And most importantly, you have some context by which you're doing all this analysis, to then drive that value. And I thought a really interesting part of this paper is that it gets beyond the arguing we're kind of doing here and gets into some specifics, where you can measure value around a specific business objective. And not only that, but then also the investment of the resources on top of the data to be able to extract the value, to then drive your business process forward. So it's a really different way to think about scarcity: not on the data per se, but on the ability to do something with it.

>> You're spot on, Jeff, because organizations don't fail because of a lack of use cases. They fail because they have too many. So how do you prioritize? Scarcity is not an issue on the data side, but it is an issue on the people side; you don't have unlimited data scientists, right? So how do you prioritize and focus on those opportunities that are most important? I'll tell you, that's not a data science conversation, that's a business conversation, right? It's about figuring out how you align organizations to identify and focus on the use cases that are most important. In the paper we go through several different use cases using Chipotle as an example. The reason why I picked Chipotle is because, well, I like Chipotle, so I could go there and write it off as research. But think about the number of use cases where a company like Chipotle, or any other company, can leverage their data to drive their key business initiatives and their key operational use cases. It's almost unbounded, which, by the way, is a huge challenge. In fact, I think part of the problem we see with a lot of organizations is that because they do such a poor job of prioritizing and focusing, they try to solve the entire problem in one big fell swoop. It's like the old ERP big-bang projects: "Well, I'm just going to spend $20 million to buy this analytic capability from company X, and I'm going to install it, and then magic is going to happen, right?" And magic never happens. We get crickets instead, because the biggest challenge isn't how do I leverage the data; it's where do I start? What problems do I go after? And how do I make sure the organization is bought in to, basically, use case by use case, building out your data and analytics architecture and capabilities?

>> Yeah, and you start backwards from really specific business objectives in the use cases that you outline here, right? I want to increase my average ticket by X. I want to increase my frequency of visits by X. I want to increase the number of items per order from X to 1.2X or 1.3X. So from there you get a nice big revenue number that you can plan around, and then you work backwards into the amount of effort that it takes, and then you can figure out, "Is this a good investment or not?" So it's a really different way to get back to the value of the data, and more importantly, the analytics and the work to actually pull the information out.
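The working-backwards arithmetic Jeff describes is simple enough to sketch. All figures below are hypothetical placeholders for a Chipotle-style exercise, not numbers from the paper.

```python
# A minimal sketch (hypothetical numbers) of working backwards from a
# business objective to an invest/don't-invest decision.

annual_orders = 2_000_000
avg_ticket = 12.00    # current average ticket, in dollars
target_uplift = 0.20  # aim: move the average ticket from X to 1.2X

projected_revenue_gain = annual_orders * avg_ticket * target_uplift

# Estimated cost of the data and analytics effort to hit the target
# (data engineering, data science, change management).
estimated_effort_cost = 1_500_000

roi = (projected_revenue_gain - estimated_effort_cost) / estimated_effort_cost
print(f"Projected gain: ${projected_revenue_gain:,.0f}, ROI: {roi:.1f}x")
# Projected gain: $4,800,000, ROI: 2.2x -> worth prioritizing
```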
>> The data and analytic technologies available to us, and their very composable nature, allow us to take this use case by use case approach. I can build out my data lake one use case at a time. I don't need to stuff 25 data sources into my data lake and hope there's something valuable in there. I can use the first use case to say, "Oh, I need these three data sources to solve that use case. I'm going to put those three data sources in the data lake. I'm going to go through the entire curation process of making sure the data has been transformed and cleansed and aligned and enriched, and all the other governance, all that kind of stuff that goes on. But I'm going to do that use case by use case, 'cause a use case can tell me which data sources are most important for that given situation." And I can build up my data lake, and I can build up my analytics, one use case at a time. And there is a huge impact, a huge impact, when I build out use case by use case, that you don't get otherwise. Let me throw in something that's not really covered in the paper, but is very much covered in my new book that I'm working on, which is: in knowledge-based industries, the economies of learning are more powerful than the economies of scale. Now think about that for a second.

>> Say that again, say that again.

>> Yeah, the economies of learning are more powerful than the economies of scale. And what that means is, what I learned on the first use case that I build out, I can apply to the second use case, to the third use case, to the fourth use case. So when I put my data into my data lake for my first use case, and the paper covers this, well, once it's in my data lake, the cost of reusing that data in the second, third and fourth use cases is, you know, a marginal cost of basically zero. So I get this ability to learn about which datasets are most important and to reapply that learning across the organization. So this learning concept: I learn use case by use case. I don't have to do a big economies-of-scale approach and start with 25 datasets, of which only three or four might be useful, while incurring the overhead for all those other unimportant datasets, because I didn't take the time to go through and figure out what my most important use cases are and what data I need to support them.
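A minimal sketch of that build-the-lake-one-use-case-at-a-time idea, with hypothetical source names: once a source has been curated into the lake, later use cases reuse it at essentially no cost, so the onboarding work per use case shrinks toward zero.

```python
# Minimal sketch (hypothetical sources): building the data lake one use case
# at a time. Each use case names the sources it needs; once a source is
# curated into the lake, reusing it for later use cases is essentially free.

use_cases = {
    "increase avg ticket": {"pos_transactions", "loyalty", "menu"},
    "increase visit freq": {"pos_transactions", "loyalty", "campaigns"},
    "optimize promotions": {"pos_transactions", "campaigns", "weather"},
}

lake = set()
for name, sources in use_cases.items():
    new = sources - lake  # only the sources not already curated into the lake
    lake |= new
    print(f"{name}: {len(new)} new source(s) to onboard, {len(sources - new)} reused")

# Output shows the marginal onboarding work shrinking use case by use case:
# increase avg ticket: 3 new source(s) to onboard, 0 reused
# increase visit freq: 1 new source(s) to onboard, 2 reused
# optimize promotions: 1 new source(s) to onboard, 2 reused
```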
>> I mean, should people even think of the data per se, or should they really readjust their thinking around the application of the data? Because the data in and of itself means nothing, right? 55: is that fast or slow? Is that old or young? Well, it depends on a whole lot of things. Am I walking or am I in a brand new Corvette? So it's funny to me that the data in and of itself really doesn't have any value, and doesn't really provide any direction into a decision or a higher-order predictive analytic, until you start to manipulate it. So is it even the wrong discussion? Is data the right discussion? Or should we really be talking about the capabilities to do stuff with it, and get people focused on that?

>> So Jeff, there are so many points to hit on there. The application of data is where the value is, and theCUBE, you guys used to be famous for saying, "Separating noise from the signal."

>> Signal from the noise.

>> Signal from the noise, right. Well, how do you know in your dataset what's signal and what's noise? The use case will tell you. If you don't know the use case, you have no way of figuring out what's important. One of the things I still rail against, and it happens still: somebody will walk up to my data science team and say, "Here's some data, tell me what's interesting in it." Well, how do you separate signal from noise if you don't know the use case? So I think you're spot on, Jeff. The way to think about this is: don't become data-driven, become value-driven. And value is driven from the use case, or the application, or the use of the data to solve that particular use case. So organizations get fixated on being data-driven. I hate the term data-driven. It's as if there's some sort of frigging magic from having data. No, data has no value. It's how you use it to derive customer, product, and operational insights that drive value.

>> Right. So there's an interesting step function, and we talk about it all the time. You're out in the weeds, working with Chipotle, let's say, to increase their average ticket by 1.2X; we talk more here kind of conceptually. And one of the great conceptual holy grails within a data-driven economy is working up this step function: from descriptive, to diagnostic, to predictive, and then the holy grail, prescriptive, where you're way ahead of the curve. This comes up in tons of stuff around unscheduled maintenance, and there are a lot of specific applications. But do you think we spend too much time shooting for the fourth order of greatness impact, instead of focusing on the small wins?

>> Well, you certainly have to build your way there. I don't think you can get to prescriptive without doing predictive, and you can't do predictive without doing descriptive and such. But let me throw a really interesting one at you, Jeff. I think there's even one beyond prescriptive, one we're talking more and more about: autonomous analytics, right? And one of the things that paper talked about that didn't click with me at the time was this idea of orphaned analytics. You and I kind of talked about this before the call here. One thing we noticed in the research was that a lot of these very mature organizations, who had advanced from the retrospective analytics of BI, to the descriptive, to the predictive, to the prescriptive, were building one-off analytics to solve a problem and getting value from them, but never reusing those analytics over and over again. They were done one-off and then thrown away, and these organizations were so good at data science and analytics that it was easier for them to just build from scratch than to dig around and try to find something that was never actually built to be reused. And so I have this whole idea of orphaned analytics, right? It didn't really occur to me; it didn't make any sense to me until I read this quote from Elon Musk. Elon Musk made this statement. He says, "I believe that when you buy a Tesla, you're buying an asset that appreciates in value, not depreciates, through usage." I was thinking, "Wait a second, what does that mean?" He didn't actually say "through usage"; he said he believes you're buying an asset that appreciates, not depreciates, in value. And of course the first response I had was, "Oh, it's like a 1964-and-a-half Mustang. It's rare, so everybody is going to want these things. So buy one, stick it in your garage, and 20 years later you bring it out and it's worth more money." No, no. There are 600,000 of these things roaming around the streets; they're not rare. What he meant is that he is building an autonomous asset: the more it's used, the more valuable it's getting; the more reliable, the more efficient, the more predictive, the more safe this asset is getting.
So there is this level beyond prescriptive, where we can think about how we leverage artificial intelligence, reinforcement learning, deep learning, to build these assets that get smarter the more they are used. That's beyond prescriptive. That's an environment where these things are learning, in many cases with minimal or no human intervention. That's the real aha moment. That's what I missed with orphaned analytics, and why it's important to build analytics that can be reused over and over again. Because every time you use these analytics in a different use case, they get smarter, they get more valuable, they get more predictive. To me that's the aha moment that blew my mind. I realized I had missed that in the paper entirely, and it took me basically two years to realize: d'oh, I missed the most important part of the paper.

>> Right. Well, it's an interesting take, really, on why that valuation, I would argue, is reflected in Tesla, and it's a function of the data. And there's a phenomenal video, if you've never seen it, from their autonomous vehicle day; it might be a year or so old. He's got his number one engineers from, I think, the Microprocessor Group, the Computer Vision Group, as well as the autonomous driving group. And there are a couple of really great concepts I want to follow up on from what you said. One is that they have this thing called The Fleet. To your point, there are hundreds of thousands of these things, if they haven't hit a million, that are calling home, reporting home every day, as to exactly how everyone took the northbound 101 on-ramp off of University Avenue. How fast did they go? What line did they take? What g-forces did they take? And every one of those cars feeds into the system, so that when they do the autonomous update, not only are they using all the regular things they would use to map out that 101 northbound entry, but they've got all the data from all the cars that have been doing it. And you know, when that other autonomous car hit the pedestrian a couple of years ago, I think in Phoenix, which was not good: sad, killed a person, dark, tough situation. But we were doing an autonomous vehicle show, and a guy made a really interesting point, right? When something like that happens, typically, if I'm in a car wreck or you're in a car wreck, hopefully not, I learn, the person we hit learns, and maybe a couple of witnesses learn, maybe the inspector.

>> But nobody else learns.

>> But nobody else learns. But now with the autonomy, every single person can learn from every single experience, with every vehicle contributing data within that fleet. To your point, it's just an order of magnitude different way to think about things.

>> Think about a 1% improvement compounded 365 times: it equals, I think, a 38X improvement. That's the power of 1% improvements over these 600,000-plus cars that are learning. By the way, even when the autonomous FSD, the full self-driving module, isn't turned on, it runs in shadow mode. So it's learning from the human drivers, the human overlords; it's constantly learning.
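That figure is right; a one-line check confirms it.

```python
# Quick check of the "1% better, compounded 365 times" figure.
improvement = (1 + 0.01) ** 365
print(f"{improvement:.1f}x")  # -> 37.8x, roughly the 38X quoted
```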
And by the way, not only are they collecting all this data; I did a little research, I pulled out some of their job ads, and they've built a giant simulator, right? They're basically simulating billions and billions of additional driven miles every night because of that simulator. So he's going to have a simulator not only for driving, but think about all the data he's capturing as these cars are riding down the road. By the way, they don't use lidar, they use video, right? So he's driving by malls; he knows how many cars are in the mall. He's driving down roads; he knows how old the cars are and which ones should be replaced. He's sitting on this incredible wealth of data. If anybody could simulate what's going on in the world and figure out how to get out of this COVID problem, it's probably Elon Musk and the data he's captured, courtesy of all those cars.

>> Yeah, yeah, it's really interesting, and we're seeing it now. There's a new autonomous drone out, the Skydio, and they just announced their commercial product. And again, it completely changes the way you think about how you use that tool, because you've just eliminated the complexity of driving it. I don't want to drive that, I want to tell it what to do. And so you're seeing this whole application space: companies doing things like measuring piles of coal and measuring these huge assets that are volumetrically measured, that these things can go and map out, and farming, et cetera, et cetera. So the autonomy piece, that's really insightful. I want to shift gears a little bit, Bill, and talk about some theories you had in here about thinking of data as an asset, data as a currency, data as monetization. How should people think of it? 'Cause I don't think currency is very good; it's really not an exchange of value that we're doing with this kind of classic asset. I think data as oil is horrible, right? To your point, it doesn't get burned up once and become unusable. It can be used over and over and over. It's basically like feedstock for all kinds of stuff, but the feedstock never goes away. So is that even the right way to think about it? Do we really need to shift the conversation, get past the idea of data, and get much more into the idea of information, actionable information, useful information, that, oh, by the way, happens to be powered by data under the covers?

>> Yeah, good question, Jeff. Data is an asset in the same way that a human is an asset. But just having humans in your company doesn't drive value; it's how you use those humans. And so it's really, again, the application of the data around the use cases. So I still think data is an asset, but I'm not fixated on putting it on my balance sheet. When we talk about putting it on a balance sheet, I immediately put the blinders on; it inhibits what I can do. I want to think about this as an asset that I can use to drive value, value to my customers. So I'm trying to learn more about my customers' tendencies and propensities and interests and passions, and trying to learn the same thing about my cars' behaviors and tendencies, and my operations' tendencies. And so I do think data is an asset, but it's a latent asset, in the sense that it has potential value but actually has no value per se from putting it on a balance sheet. So I think it's an asset; I worry about the accounting concept immediately hijacking what we can do with it. To me, the value of data becomes how it interacts with other assets. So maybe data itself is not so much an asset as it is fuel for driving the value of assets. You know, it fuels my use cases. It fuels my ability to retain and get more out of my customers.
It fuels my ability to predict when my products are going to break down, and even to have products that self-monitor, self-diagnose and self-heal. So data is an asset, but it's only a latent asset, in the sense that it sits there and doesn't have any value until you actually apply it to something and shock it into action.

>> So let's shift gears a little bit, away from the data, and talk about the human factors. 'Cause you said one of the challenges is people trying to bite off more than they can chew. And we have the role of chief data officer now, and to your point, maybe that mucks things up more than it helps. But in all the customer cases that you've worked on, is there a consistent pattern of behavior, personality, or type of project that enables some people to grab those resources, apply them to their data, and have successful projects? Because to your point, there's too much data and there are too many projects, and you talk a lot about prioritization. But there are a lot of assumptions in the prioritization model: that you know a whole lot of things, especially if you're comparing project A over in group A with project B in group B, and the two may not really know the economics across that. But for an individual person who sees the potential, what advice do you give them? What kind of characteristics do you see, whether in the type of project, the type of boss, or the type of individual, that really lend themselves to a higher probability of a successful outcome?

>> So first off, you need to find somebody who has a vision for how they want to use the data, not just collect it: a vision for how they're going to try to change the fortunes of the organization. So it always takes a visionary. It may not be the CEO; it might be the head of marketing or the head of logistics, or it could be a CIO or a chief data officer as well. But you've got to find somebody who says, "We have this latent asset we could be doing more with, and we have a series of organizational problems and challenges against which I could apply this asset, and I need to be the matchmaker that brings these together." Now, the most powerful tool I've found for marrying the latent capabilities of data with all the revenue-generating opportunities on the application side, because there's a countless number of them, is design thinking. The reason why I think design thinking is so important is that one of the things it does a great job of is giving everybody a voice in the process of identifying, validating, valuing, and prioritizing the use cases you're going to go after. Let me say that again: the challenge organizations have is identifying, validating, valuing, and prioritizing the use cases they want to go after. Design thinking is a marvelous tool for driving organizational alignment around where we're going to start, what's going to be next, why we're going to start there, and how we're going to bring everybody together. Big data and data science projects don't die because of technology failure. Most of them die because of passive-aggressive behaviors in the organization, because you didn't bring everybody into the process. Everybody's voice didn't get a chance to be heard. And that one person whose voice didn't get a chance to be heard, they're going to get you. They may own a certain piece of data.
They may own something, and they're just lying there, waiting for their chance to come up and snag it. So what you've got to do is proactively bring these people together. This is part of our value engineering process. We have a value engineering process around envisioning, where we bring all these people together. We help them to understand how data in itself is a latent asset, but how, from an economics perspective, it can be used to drive all that value. We get them all fired up on how it can solve any one of these use cases. But you've got to start with one, and you've got to embrace this idea that I can build out my data and analytic capabilities one use case at a time. And the first use case I go after and solve makes my second one easier, makes my third one easier, right? When you start going use case by use case, two really magical things happen. Number one, your marginal costs flatten. That is, because you're building out your data lake one use case at a time, and you're bringing all the important data into that data lake one use case at a time, at some point you've got most of the important data you need, and you don't need to add another data source. You've got what you need, so your marginal costs start to flatten. And by the way, if you build your analytics as composable, reusable, continuously learning analytic assets, not as orphaned analytics, pretty soon you have all the analytics you need as well. So your marginal costs flatten. But effect number two is that, because you have the data and the analytics, you can accelerate time to value, and you can de-risk projects, as you go use case by use case. And so then the biggest challenge becomes not the data and the analytics; it's getting all the business stakeholders to agree on a roadmap we're going to go after: this one's first, and it's going first because it helps to drive the value of the second and the third one. And then that one drives the next, and you create a whole roadmap of how the data and analytics ripple through and drive value across all these use cases, at a marginal cost approaching zero.
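A minimal sketch of the prioritization step of that value engineering process, with hypothetical use cases and scores; real envisioning sessions use richer criteria, but ranking use cases on business value versus implementation feasibility is the core idea.

```python
# Minimal sketch (hypothetical scores): prioritizing use cases on the two
# axes a value engineering session typically debates: business value and
# implementation feasibility (each scored 1-10 by the assembled stakeholders).

use_cases = [
    # (name, business value, implementation feasibility)
    ("increase average ticket",      8, 7),
    ("reduce customer churn",        9, 4),
    ("optimize store labor",         6, 8),
    ("new data product for vendors", 7, 2),
]

# Rank by combined score; ties broken by feasibility, so the roadmap starts
# with wins that also build out reusable data and analytics for later use cases.
ranked = sorted(use_cases, key=lambda uc: (uc[1] + uc[2], uc[2]), reverse=True)
for name, value, feasibility in ranked:
    print(f"{name}: value={value}, feasibility={feasibility}")
```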
>> So should we have chief design thinking officers instead of chief data officers, to really move the data process along? I mean, I first heard about design thinking years ago, actually interviewing Dan Gordon from Gordon Biersch. He had just hired a couple of Stanford grads, I think that's where they pioneered it, and they were doing some work around introducing, I think it was a new apple-based alcoholic beverage, an apple cider, and they talked a lot about it. And it's pretty interesting. But are you seeing design thinking proliferate into the organizations that you work with, either formally as design thinking, or as some derivation of it that pulls in some of those attributes that you highlighted as so key to success?

>> So I think we're seeing the birth of this new role that's marrying the capabilities of design thinking with the capabilities of data and analytics. And they're calling this dude or dudette the chief innovation officer. Surprise.

>> Title for someone we know.

>> And I've got to tell a little story. So I have a very experienced design thinker on my team. Every one of our data science projects has a design thinker on it, because the nature of how you build and successfully execute a data science project models almost exactly how design thinking works. I've written several papers on it; design thinking and data science are different sides of the same coin. But my respect for design thinking took a major shot in the arm, a major boost, when the design thinker on my team, whose name is John Morley, introduced me to a senior data scientist at Google. I bought him coffee, this was back before I even joined Hitachi Vantara, and I said, "So tell me the secret to Google's data science success? You guys are marvelous, you're doing things that no one else was even contemplating. What's your key to success?" And he giggles and laughs and goes, "Design thinking." I go, "What the hell is that? Design thinking? I've never even heard of the stupid thing before." He goes, "I'll make a deal with you. Friday afternoon, let's pop over to Stanford's B-school and I'll teach you about design thinking." So I went with him on a Friday to the d.school, the design school over at Stanford, and I was blown away, not just by how design thinking was used to ideate and explore, but by how powerful the concept is when you marry it with data science. What is data science, in its simplest sense? Data science is about identifying the variables and metrics that might be better predictors of performance. It's that "might" phrase that's the real key. And who are the people who have the best insights into which variables or metrics or KPIs you might want to test? It ain't the data scientists; it's the subject matter experts on the business side. And when you use design thinking to bring those subject matter experts together with the data scientists, all kinds of magical stuff happens. It's unbelievable how well it works. All of our projects leverage design thinking. Our whole value engineering process is built around marrying design thinking with data science, around this prioritization, around these concepts of: all ideas are worthy of consideration, and all voices need to be heard. And the idea of how you embrace ambiguity and diversity of perspectives to drive innovation, it's marvelous. But I feel like I'm a lone voice out in the wilderness, crying out, "Yeah, Tesla gets it, Google gets it, Apple gets it, Facebook gets it." But most other organizations in the world don't think like that. They think design thinking is this woo-woo thing. "Oh yeah, you're going to bring people together and sing Kumbaya." It's like, "No, I'm not singing Kumbaya. I'm picking their brains, because they're going to make the data science team much more effective in knowing what problems we're going to go after and how we're going to measure success and progress."
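That division of labor, where subject matter experts propose the variables that might matter and data science tests them, can be sketched minimally. The synthetic data and the correlation screen below are illustrative stand-ins for a real modeling and validation workflow, and the KPI names are hypothetical.

```python
# Minimal sketch (synthetic data): subject matter experts propose candidate
# metrics, and a simple screen ranks which ones *might* be better predictors
# of the outcome. A real project would use proper models and validation.

import numpy as np

rng = np.random.default_rng(0)
n = 500
outcome = rng.normal(size=n)  # the performance measure we want to predict

# Candidate KPIs proposed in a design thinking session (names hypothetical):
candidates = {
    "avg_order_size":  outcome * 0.8 + rng.normal(size=n) * 0.5,  # strong signal
    "visit_frequency": outcome * 0.3 + rng.normal(size=n),        # weak signal
    "day_of_week":     rng.normal(size=n),                        # pure noise
}

# Rank candidates by absolute correlation with the outcome.
ranked = sorted(candidates.items(),
                key=lambda kv: abs(np.corrcoef(kv[1], outcome)[0, 1]),
                reverse=True)
for name, values in ranked:
    print(f"{name}: |r| = {abs(np.corrcoef(values, outcome)[0, 1]):.2f}")
```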
>> Maybe that's the next Dean for the next 10 years: the Dean of Design Thinking instead of the Dean of Big Data, and who knew they're one and the same? Well, Bill, that's super insightful. I mean, it's validated and supported by the trends that we see all over the place, just in terms of democratization, right? Democratization of the tools, more people having access to data, more opinions, more perspectives, more people with the ability to manipulate the data and basically experiment, does drive better business outcomes. And it's so consistent.

>> If I could add one thing, Jeff. I think what's really powerful about design thinking is, when I think about what's happening with artificial intelligence, there are all these conversations about, "Oh, AI is going to wipe out all these jobs, it's going to take all these jobs away." And what we're actually finding is that if we think about machine learning driven by AI, and human empowerment driven by design thinking, we're seeing the opportunity to exploit these economies of learning at the front lines, where every customer engagement, every operational execution is an opportunity to gather not only more data but more learnings; to empower the humans at the front lines of the organization to constantly be seeking, to try different things, to explore and to learn from each of these engagements. AI to me is incredibly powerful, and I think about it as a source of driving more learning: a continuously learning and continuously adapting organization, where it's not just the machines that are learning, but the humans who've been empowered to do that. And chapter nine in my new book, Jeff, is all about team empowerment, because nothing you do with AI is going to matter squat if you don't have empowered teams who know how to take and leverage that continuous learning opportunity at the front lines of customer and operational engagement.

>> Bill, I couldn't say it better. I think we'll leave it there; that's a great close. When is the next book coming out?

>> So today I do my second-to-last final review. Then it goes back to the editor, he does a review, and we start looking at formatting. So I think we're probably four to six weeks out.

>> Okay, well, thank you so much, and congratulations on all the success. I just love how the Dean is really the Dean now, teaching all over the world, sharing the knowledge, and attacking some of these big problems. And like all great economics problems, often the answer is not economics at all; you really twist the lens and don't think of it in that construct.

>> Exactly.

>> All right, Bill. Thanks again, and have a great week.

>> Thanks, Jeff.

>> All right. He's Bill Schmarzo, I'm Jeff Frick. You're watching theCUBE. Thanks for watching, we'll see you next time. (gentle music)
StrongByScience Podcast | Bill Schmarzo Part One
>> Announcer: Produced from theCUBE studios, this is StrongByScience: in-depth conversations about science-based training, sports performance, and all things health and wellness. Here's your host, Max Schmarzo. [Music] [Applause]

>> All right, thank you guys for tuning in. Today I have the one and only Dean of Big Data, the man, the myth, the legend, Bill Schmarzo, who is also my dad. He is the CTO of Hitachi Vantara in IoT and analytics. He has a very interesting background, because he is, well, he's known as the Dean of Big Data, but also the king of the court and all things basketball-related when it comes to our household. And unlike most people in the data world, and I want to say "most" as an umbrella term, Bill has an illustrious sports career, having played at Coe College, the Harvard of the Midwest, my alma mater as well. I think having that background, where it's not just computer science but multiple disciplines involved, your jazz career, your basketball career, and obviously the career you're in now, all plays a huge role in being able to interpret and take multiple domains and put them into one. So thank you for being here, Dad.

>> Yeah, thanks, Max, that's a great introduction. I appreciate that.

>> No, it's wonderful to have you. And for our listeners who are not aware, Bill, or as I'll refer to him, because calling my dad Bill the whole time is going to drive me crazy, has a mind that thinks not like most. He sees things and thinks about them not just in terms of the single trajectory that could be taken, but the multiple domains they can go to, both vertically and horizontally. And when we talk about data: data is something so commonly brought up in sports, so commonly brought up in performance and athletic development. Big data is probably one of the biggest catchphrases or hot words or sayings that people have nowadays, but it doesn't always have a lot of meaning to it, because a lot of times we get the words "big data" and then we don't get action out of big data. And Bill's specialty is not just big data; it's getting action out of big data. With that, going forward, I think a lot of this talk will be about how to utilize big data, how you use data in general, how to organize it, and how to put yourself in a situation to get actionable insights. So just to start off, can you talk a little bit about your background, some of the things you've done, and how you developed the insights that you have?

>> Thanks, Max. I have kind of a, let's say, deep background; I've been doing data analytics a long time. And I was very fortunate, one of those, you know, Forrest Gump moments in life, where in the late 1980s I was involved in a project at Procter & Gamble. I ran the project where we brought in Walmart's point-of-sale data for the first time into what we would now call a data warehouse, and for many, this became the launching point of the data warehouse BI marketplace. We can trace the origins of many of the BI players to that project at Procter & Gamble in '87 and '88. And I spent a big chunk of my life as a big believer in business intelligence and data warehousing, trying to amass data together and use that data to report on what's going on and derive insights. I did that for 20, 25 years of my life, until, as you probably remember, Max, I was recruited out of Business Objects, where I was the vice president of analytic applications, by Yahoo. And Yahoo had a very interesting problem, which was that they needed to build analytics for their advertisers, to help those advertisers optimize their spend across the Yahoo ad network. And what I learned there, in fact what I unlearned there, was that everything I had learned about BI and data warehousing, how you constructed data warehouses, how you were so schema-centric, how everything revolved around tabular data: at Yahoo there was an entirely different approach. It was my first introduction to Hadoop and the concept of a data lake. That was my first real introduction to data science and how to do predictive analytics and prescriptive analytics. In fact, it was such a huge change for me that I was asked to come back to TDWI, The Data Warehousing Institute, where I had been teaching for many years, and do a keynote after being at Yahoo for a year or so, to share my observations and what I had learned. And I remember I stood up there in front of about 600 people and started my presentation by saying, "Everything I've taught you the past 20 years is wrong." Well, I didn't get invited back for 10 years, so that probably tells you something. But it was really about unlearning a lot of what I had learned before. And probably, Max, one of the aha moments for me was this: BI was very focused on understanding the questions people were trying to ask and answer, while data science is about understanding the decisions they're trying to take action on. Questions by their very nature are informative, but decisions are actionable. And so what we did at Yahoo, in order to help our advertisers optimize their spend across the Yahoo ad network, was focus on identifying the decisions the media planners and buyers and the campaign managers had to make around running a campaign: how much money to allocate to which sites, how many conversions do I want, how many impressions do I want. All of those decisions, we built predictive analytics around, so that we could deliver prescriptive actions to these two classes of stakeholders, the media planners and buyers and the campaign managers, who had no aspirations about being analysts. They were trying to be the best digital marketing executives they could possibly be; they didn't want to be analysts. And that sort of leads me to where I am today. My teaching, my books, my blogs, everything I do is very much about how we take data and analytics and help organizations become more effective. Everything I've done since then, the books I've written, the teaching I do at the University of San Francisco, and next week at the National University of Ireland in Galway, and all the clients I work with, is really about how we take data and analytics and help organizations become more effective at driving the decisions that optimize their business and operational models. It's really about decisions, and how we leverage data and analytics to drive those decisions.

>> So how would you define the difference between a question that someone's trying to answer and a decision that they're trying to be better informed on?

>> Here's how I'd put it. I call it the SAM test, S-A-M, and that is: is it strategic, is it actionable, is it material? You can ask questions that are provocative, but you might not be asking questions that are strategic to the problems you're trying to solve. You may not be able to ask questions that are actionable, in the sense that you know what to do with the answer. And you don't necessarily ask questions that are material, in the sense that the value of answering the question is greater than the cost of answering it.
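As a rough sketch, the SAM test can be read as a screening filter over candidate questions or decisions. Everything below, the names, scores and values, is a hypothetical illustration, not part of Schmarzo's formal method.

```python
# Hypothetical sketch: screening candidate decisions/use cases with the SAM test.
# Strategic and actionable are treated as yes/no judgments from the business;
# material means the expected value exceeds the cost of answering.

def passes_sam(strategic: bool, actionable: bool, value: float, cost: float) -> bool:
    material = value > cost
    return strategic and actionable and material

candidates = [
    ("Which program best develops each player?", True,  True,  500_000, 120_000),
    ("Provocative trivia about last season",     False, False,  10_000,  50_000),
]

for question, strategic, actionable, value, cost in candidates:
    verdict = "pursue" if passes_sam(strategic, actionable, value, cost) else "skip"
    print(f"{verdict}: {question}")
```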
Right. And so if I think about the SAM test, when I apply it to data science and decisions: when I start mining the data, I know what decisions are most important, because I've gone through a process to identify, validate, value, and prioritize those decisions. I understand which decisions are most important. Now, when I start to dig through the data, all this structured and unstructured data across a number of different data sources, I'm trying to codify patterns and relationships buried in that data, and I'm applying the SAM test against those insights: is it strategic to the problem I'm trying to solve, can I actually act on it, and is it material, in the sense that it's more valuable to act on it than it costs to create the action around it? So to me, that's the big difference. By their very nature, decisions mean I'm actually trying to make a choice and take an action. Questions by their nature are informative, interesting; they can be very provocative. Questions have an important role, but ultimately questions do not necessarily lead to actions.

>> So if I'm a sports coach, say I'm running a professional basketball team, some of the decisions I'm trying to make are: deciding what program best develops my players, or what metrics will help me decide who the best prospect is. Is that the right way of looking at it?

>> Yeah. So we did an exercise at USF where we had the students work through: what decisions does Steve Kerr need to make over the next two games he's playing? And we go through an exercise of identifying, especially, the in-game decisions. How often are you going to play somebody? How long are they going to play? What are the right combinations? What kinds of offensive plays are you going to try to run? So there are a bunch of decisions that Steve Kerr, as coach of the Warriors, for example, needs to make in a game, to not only try to win the game but also minimize wear and tear on his players. And by the way, that's a really good point to think about: good decisions are always a conflict of other ideas, right? Win the game while minimizing wear and tear on my players. All the important decisions in life have two, three, or four different variables that may not be exactly aligned, and this is where data science comes in. Data science is going to look across those three or four other metrics, against what you're going to measure success on, and try to figure out the right balance of those given the situation you're in. So, going back to the decision about playing time: think about all the data you might want to look at in order to optimize that. When's the next game? How far into the season are we? Where do we currently sit ranking-wise? How many minutes per game has player X been playing? Looking over the past few years, what's their maximum? So there are actually not a lot of decisions that people are trying to make, and by the way, the beauty of decisions is that they really haven't changed in years. What's changed is not the decisions, it's the answers. And the answers have changed because we have this great abundance of data available to us: in-game performance data, health data, you know, even DNA data, all kinds of other data. And then we have all these great advanced analytic techniques now, neural networks, unsupervised and supervised machine learning, all this great technology that can help us uncover the relationships and patterns buried in the data, which we can use to help individualize those decisions. One last point there. When people talk about big data, they get fixated on the "big" part, the volume part. It's not the volume of big data that I'm going to monetize, it's the granularity. And what I mean by that is, I now have the ability to build very detailed profiles. Going back to our basketball example: I can build a very detailed performance profile on every one of my players. So for every one of the players on the Warriors team, I can build a very detailed profile that details out, you know, what's their optimal playing time, how much time should they spend on the court before a break, what are the right combinations of players to generate the most offense or the best defense. I can build these very detailed individual profiles, and then I can start mixing them together to find the right combinations. So when we talk about "big," it's not the volume, it's the granularity.

>> Gotcha. And what's interesting from my world is this. When you're dealing with marketing and business, whether it's a company trying to find out more about its customers or a startup trying to learn what product it should develop, there are tons of unknowns, and a lot of big data, from my understanding, can help you better understand some patterns within customers and how to market. You know, in your book you talk about, "We need to increase sales at Chipotle because we understand X, Y and Z about the environment around us." Now, in the sports science world we have our friend called science, and science has helped us already identify certain metrics that are very important and correlated to different physiological outcomes. So it almost gives us a shortcut, because in the big data world, especially with the data you guys are dealing with, trying to understand customer decisions, each customer is an individual, and you're trying to compile them all together to find patterns. No one's doing science on that, right? It's not like lab work, where someone is studying muscle protein synthesis and the amount of nutrients you need to recover from it. So in my position, I have all these pillars that maybe exist already, where I can begin my search, but there are still a bunch of unknowns. With that kind of environment, do you take a different approach, or do you still go with the, I guess, large, all-encompassing "collect everything you can and siphon through it after"? Maybe I'm totally wrong; I'll let you take it away.

>> No, it's a good question. And what's interesting about that, Max, is that the human body is governed by a series of laws, we'll say, in kinesiology, and the things you've talked about in physics: they have laws. Humans as buyers, you know, shoppers, travelers, we have propensities; we don't have laws. I have a propensity to try to fly United because I get easier upgrades, but I might fly, you know, Southwest because of schedule or convenience. I have propensities; I don't have laws. So you have laws that work to your advantage. What's interesting about laws is where they start going in the world of IoT, with this concept called digital twins, which are governed by laws of physics. I have a compressor or a chiller or an engine, and it's got a bunch of components in it that have been engineered together, and I can actually apply the laws. I can run simulations against my digital twins to understand exactly when something is likely to break, what the remaining useful life in that product is, and how severe the maintenance I need to do on it will be.
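A minimal sketch of the kind of simulation a digital twin enables. The linear wear law and all parameters below are illustrative assumptions; real twins encode the actual engineered physics of the component.

```python
# Minimal sketch (illustrative wear model, not a real twin): simulate a
# component's health forward under an assumed degradation law to estimate
# remaining useful life (RUL).

def remaining_useful_life(health: float, load: float, wear_rate: float = 0.002,
                          failure_threshold: float = 0.2,
                          max_hours: int = 100_000) -> int:
    """Hours until health falls below the failure threshold under constant load."""
    hours = 0
    while health > failure_threshold and hours < max_hours:
        health -= wear_rate * load  # assumed linear wear; real twins use physics models
        hours += 1
    return hours

# Same compressor, two operating profiles: heavier load shortens the RUL.
print(remaining_useful_life(health=1.0, load=1.0))  # nominal load
print(remaining_useful_life(health=1.0, load=1.5))  # 50% overload wears ~1.5x faster
```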
>> Gotcha. And what's interesting from my world is, when you're dealing with marketing and business, whether it's a company trying to find out more about its customers or a startup trying to learn what product it should develop, there are tons of unknowns. And a lot of big data, from my understanding, can help you better understand patterns within customers and how to market. You know, in your book you talk about, we need to increase sales at Chipotle because we understand X, Y, and Z about the current environment around us. Now, in the sports science world we have our friend called science, and science has helped us identify early on certain metrics that are very important and correlated to different physiological outcomes. So it almost gives us a shortcut, because in the big data world, especially when you're dealing with the data that you guys are dealing with, trying to understand customer decisions, each customer is an individual, and you're trying to compile them all together to find patterns. No one's doing science on that, right? It's not like lab work, where someone is studying muscle protein synthesis and the amount of nutrients you need to recover from it. So in my position, I have all these pillars that maybe exist already, where I can begin my search. There are still a bunch of unknowns in that kind of environment. Do you take a different approach, or do you still go with the, I guess, all-encompassing approach, collect everything you can and siphon through it after? Maybe I'm totally wrong, I'll let you take it away. >> No, that's a good question. And what's interesting about that, Max, is that the human body is governed by a series of laws, we'll say, in kinesiology, and the things you've talked about, physics, they have laws. Humans, as buyers, shoppers, travelers, we have propensities, we don't have laws, right? I have a propensity that I'm going to try to fly United because I get easier upgrades, but I might fly Southwest because of schedule or convenience, right? I have propensities, I don't have laws. So you have laws that work to your advantage. And what's interesting about laws is that when you start going into the world of IoT and this concept called digital twins, those are governed by the laws of physics. I have a compressor or a chiller or an engine, and it's got a bunch of components in it that have been engineered together, and I can actually apply the laws. I can actually run simulations against my digital twins to understand exactly when something is likely to break, what the remaining useful life in that product is, and what the severity of the maintenance I need to do on it is. So the human body, unlike the human psyche, is governed by laws. Human behaviors are really hard, right? I mean, Las Vegas is built on the fact that human behaviors are so flawed. But body physics, like the physics that run these devices, you can actually build models and run simulations to figure out exactly what the wear and tear is, and how far you can extend what you're operating.
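The digital-twin idea above, a component governed by a known wear law that can be simulated forward to estimate remaining useful life, can be sketched in a few lines. The wear law and all constants are invented for illustration:

```python
# Toy digital twin: simulate a physics-style wear law forward in time to
# estimate remaining useful life (RUL). All constants are invented.
import random

WEAR_PER_HOUR = 0.0012   # nominal fraction of life consumed per hour
LOAD_FACTOR = 1.4        # running above rated load accelerates wear
FAILURE_WEAR = 1.0       # the component fails at cumulative wear 1.0

def estimate_rul(current_wear, trials=5_000):
    """Monte Carlo estimate of expected hours to failure from current state."""
    totals = []
    for _ in range(trials):
        wear, hours = current_wear, 0
        while wear < FAILURE_WEAR:
            # small random variation around the deterministic wear law
            wear += WEAR_PER_HOUR * LOAD_FACTOR * random.uniform(0.8, 1.2)
            hours += 1
        totals.append(hours)
    return sum(totals) / len(totals)

print(f"Estimated RUL: {estimate_rul(current_wear=0.35):.0f} hours")
```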
>> Gotcha. Yeah, so from our world, you start looking at subsystems, and you say, OK, this is your muscular system, this is your autonomic nervous system, this is your central nervous system; these are ways that we can begin to measure it. We wrote a blog on this, a stress response model, where you understand these systems and their inferences, for the most part, and then you apply a stress and you see how the body responds. And you determine, OK, if I know the body, it can only respond in a certain number of ways: it's either compensatory, it's returning to baseline, or it's maladaptation. There are only so many ways, when you look at a cell at the individual level, that that cell can actually respond, and it's the aggregation of all those cellular responses that ends up manifesting as a change in a subsystem, and that subsystem can be measured inferentially through certain technology that we have. But I also think that, at the same time, we make a huge leap, and that leap is the word inference, right? We're making an assumption, and sometimes those assumptions are very dangerous, because if that assumption is unknown and we're wrong about it, then we sway and miss a little bit on our whole projection. So I like the idea of looking at patterns and looking at the probabilistic nature of it, and I've actually recently changed my view a little bit. When I first talked about this, I was much more hardwired on laws, but now I think it's a law, but maybe a law with some level of variation or standard deviation, so that we have guardrails instead. That's kind of how I think about it personally. Is that something you'd say is on the right track, or how would you approach it? >> Yeah, actually, there are a lot of similarities, Max. So, your description of the human body as made up of subsystems: when we talk to organizations about things like smart cities or smart malls or smart hospitals, a smart city is comprised of a series of subsystems, right? I've got subsystems regarding water and wastewater, traffic, safety, local development, things like this. There's a bunch of subsystems that make a city work, and each of those subsystems is comprised of a series of decisions, or clusters of decisions we call use cases, around what you're trying to optimize. So if I'm trying to improve traffic flow, if one of my subsystems is traffic flow, there are a bunch of use cases there: where do I do maintenance, where do I expand the roads, where do I put HOV lanes, right? And so you start taking apart the smart city into the subsystems, and then the subsystems are comprised of use cases, and that puts you into a really good position. Now, here's something we did recently with a client who is trying to think about building the theme park of the future, and how do we make certain that we really have a holistic view of the use cases we need to go after. It's really easy to identify the use cases within your own four walls, but digital transformation in particular happens outside the four walls of an organization. And so what we're doing is a process where we build journey maps for all their key stakeholders. So you've got a journey map for a customer, you have a journey map for operations, you have a journey map for partners, and such. You build these journey maps and you start thinking about, for example: I'm a theme park, and at some point in time my guest, or customer, is going to have an epiphany; they want to go do something, they want to go on vacation. At that point in time, that theme park is competing against not only all the other theme parks, it's competing against Major League Baseball, which has its own attractions, it's competing against going to the beach on Sanibel Island, or just hanging around, right? They're competing at that point, and if they only start engaging the customer when the customer has actually contacted them, they've missed a huge part of the market; they've missed a huge chance to influence that person's agenda. And so one of the things to think about, and I don't know how this applies to your space, Max, but as we started thinking about smart entities, we use design thinking and customer journey maps as a way to make certain that we're not fooling ourselves by only looking within the four walls of our organization, that we're knocking those walls down, making them very porous, and we're looking at what happens before somebody engages with us, and even afterwards. So again, going back to the theme park example: once they leave the theme park, they're probably posting on social media about what kind of fun they had, or the fun they didn't have; they're probably making plans for next year; they're talking to friends, and other things. So there's a bunch of stuff, we're going to call it afterglow, that happens after the event, and you want to make certain that you're part of influencing that. So again, I don't know what combining the data science of use cases and decisions with the design thinking of journey maps might mean for your business, but for us, in thinking about smart cities, it's opened up all kinds of possibilities, and most importantly, for our customers, it's opened up all kinds of new areas where they can create new sources of value. >> Anyone listening to this needs to understand that when the word client or customer is used, it can be substituted for athlete. And what I think is really important, when we hear you talk about the amount of infrastructure you build for an idea when you approach a situation, is that it's something that sports science, in my opinion, especially across multiple domains, is truly lacking. What happens is, we get a piece of technology and someone says, go do science, while you're taking the approach of, let's actually think out what we're doing beforehand: let's determine our key performance indicators, let's understand the journey that this piece of technology is going to take with the athlete, or how the athlete is going to interact with this piece of technology throughout their four years. If you're in the private sector, right, that afterglow effect might be something you refer to as client retention: their ability to come back over and over and spread the word for you. If you're in the sector with student athletes, maybe it's those athletes talking highly about your program to help with recruiting, and understanding that developing athletes is going to help make that college, or that program, or that organization, more enticing to go to.
But what really stood out was the fact that you have this infrastructure built beforehand. And the example I give, having spoken with a good number of organizations and teams about data utilization, is that if you were all of a sudden dropped in the middle of the woods and someone said, go build a cabin, well, it's a giant forest, I could use as much wood as I want. I could just keep chopping down trees until I had something that would work as a shelter of some sort, right? Even I could probably do that. But if someone said, you know what, you have three trees to cut down to make a cabin, you'd become very efficient, and you'd think about each chop, and each piece of wood and how it's going to be used, and your interaction with that wood, in conjunction with that wood's interaction with you. And so when we start looking at athlete development, or we're looking at client retention, or we're looking at general health and wellness, it's not just, oh, this is a great idea, right? We want to make the world's greatest theme park, or we want to make the world's greatest training facility; but what infrastructure and steps do you need to take? And you said stakeholders: what individuals am I working with? Am I talking with the physical therapist, am I talking with the athletic trainer, am I talking with the skill coach? How does the skill coach want the data presented to them? Maybe that's different from how the athletic trainer wants the data presented. Maybe the sport coach doesn't want to see the data unless a red flag comes up. So now you have all these different entities, just like how you're talking about developing this customer journey throughout the theme park, making sure that they have an experience that's memorable, that causes an afterglow, and that really gives the experience meaning. How can we now take data and apply it in the same way, so we get the most value, like you said, from the granular aspect of data, and really turn that into something valuable? >> Max, you said something really important. Let me share one of many horror stories that come up in my daily life, which is somebody walking up to me and saying, hey, I've got a client, here's their data, go do some science on it. Well, what the heck, right? So we created this thing called the hypothesis development canvas. Our sales teams hate it, but our data science teams love it, because we do all this pre-work: we make sure we understand the problem we're going after, the decision they're trying to make, the KPIs against which you're going to measure success and progress, the operational and financial business benefits, and the data sources we want to consider. And here's something, by the way, that's important, that maybe I wish Boeing had thought more about, which is: what are the costs of false positives and false negatives? Do you really understand where your risk points are? The reason false positives and false negatives are really important in data science is that data science is making predictions, and by virtue of making predictions, we are never 100% certain that we're right. Predictions are built on "good enough." Well, when is good enough good enough? And a lot of the determination as to when good enough is good enough is really around the cost of false positives and false negatives. Think about a professional athlete: the ramifications of overtraining a professional athlete like a Kevin Durant or a Steph Curry, so that they're out for the playoffs, has huge financial implications for them personally and for the organization.
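The false-positive/false-negative point can be made concrete: given asymmetric costs, the "good enough" threshold falls out of expected cost rather than raw accuracy. A minimal sketch, with invented costs, error rates, and thresholds:

```python
# Illustrative: choose a model's alert threshold by expected cost, not raw
# accuracy. A false negative (missing real overtraining risk before the
# playoffs) is assumed far costlier than a false positive (resting a
# healthy player). All costs, rates, and thresholds are invented.
COST_FN = 500_000   # star player lost for the playoffs
COST_FP = 20_000    # unnecessary rest game

# (threshold, false_positive_rate, false_negative_rate) from validation data
candidates = [(0.2, 0.30, 0.02), (0.5, 0.12, 0.10), (0.8, 0.03, 0.35)]

def expected_cost(fpr, fnr, base_rate=0.1, n_cases=100):
    false_negatives = fnr * base_rate * n_cases        # missed true risks
    false_positives = fpr * (1 - base_rate) * n_cases  # false alarms
    return false_negatives * COST_FN + false_positives * COST_FP

best = min(candidates, key=lambda c: expected_cost(c[1], c[2]))
print(f"Lowest expected cost at threshold {best[0]}")
# With these costs the asymmetry favors the low threshold: alert early.
```

Change the two cost constants and the "right" threshold moves with them, which is the point of putting them on the canvas before any modeling starts.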
So you really need to make sure you understand exactly what the cost of being wrong is. And so, with this hypothesis development canvas, we do a lot of this work before we ever put science to the data. >> Yeah, it's something that's lacking across not just sports science but many fields. And what I mean by that is, you referred to the hypothesis canvas: it's a piece of paper that provides a common language, right? You can set it out beforehand. And for listeners who aren't aware, the hypothesis canvas is something Bill has worked on and developed with his team. It's about 13 different squares and boxes, and you can adapt it to your own profession and whatever you're diving into, but essentially it walks through the infrastructure you need to have set up in order for a hypothesis or idea or decision to actually be worth a damn. And what I mean by that is, so many times, and I hate this, but I'm going to go on a little bit of a rant and I apologize, people think, oh, I have an idea, and they think of Thomas Edison, as if he just had an idea and made a light bulb. Thomas Edison is famous for saying, I didn't just make a light bulb, I learned 9,000 ways to not make a light bulb. And what I mean by that is, he set up an environment that allowed for failure and allowed for learning. But what often happens is people think, oh, I have an idea, and they believe the idea comes not just in a flash, because it usually doesn't, it might come from some research, but they also believe that it comes with legs, that it comes with the supporting infrastructure already built around it. That's the same way I see a lot of the data work going in our field: we get an idea, we immediately implement it, and we hope it works, as opposed to setting up a learning environment that allows you to go, OK, here's what I think might happen, here's my hypothesis, here's how I'm going to apply it. And now, if I fail, because I have the infrastructure pre-mapped out, I can look at my infrastructure and say, you know what, that support beam, that individual box, was the weak link, and we made a mistake here, but we can go back and fix it.
Greg Benson, SnapLogic | SnapLogic Innovation Day 2018
>> Narrator: From San Mateo, California, it's theCUBE, covering SnapLogic Innovation Day 2018. Brought to you by SnapLogic. >> Welcome back, Jeff Frick here with theCUBE. We're at the Crossroads, that's 92 and 101 in the Bay Area. If you've been through it, you've had time to take a minute and look at all the buildings, 'cause traffic's usually not so great around here. But there's a lot of great software companies that come through here. It's interesting, I always think back to the Siebel building that went up, and now that's Rakuten, who we all know from the Warrior jerseys, the very popular Japanese retailer. But that's not why we're here. We're here to talk to SnapLogic. They're doing a lot of really interesting things, and they have been in data, and now they're doing a lot of interesting things in integration. And we're excited to have a many-time CUBE alum. He's Greg Benson, let me get that title right, chief scientist at SnapLogic and of course a professor at the University of San Francisco. Greg, great to see you. >> Great to see you, Jeff. >> So I think the last time we saw you was at Fleet Forward. Interesting open-source projects, data on the move. The open-source technologies available for you guys to use just continue to evolve at a crazy breakneck speed. >> Yeah, it is. Open source in general, as you know, has really revolutionized all of computing, starting with Linux and what that's done for the world. And, you know, in one sense it's a boon, but it introduces a challenge, because how do you choose? And then even when you do choose, do you have the expertise to harness it? You know, the early social companies really leveraged off of Hadoop and Hadoop technology to drive their business and their objectives. And now we've seen a lot of that technology be commercialized and have a lot of service around it. And SnapLogic is doing that as well. We help reduce the complexity and make a lot of this open-source technology available to our customers. >> So, I want to talk about a lot of different things. One of the things is Iris. So Iris is your guys' leverage of machine learning and artificial intelligence to help make integration easier. Did I get that right? >> That's correct, yeah. Iris is the umbrella term for everything that we do with machine learning and how we use it to enhance the user experience. And one way to think about it is, when you're interacting with our product, we've made the SnapLogic designer a web-based UI, a drag-and-drop interface to construct these integration pipelines. We connect these things called Snaps. It's like building with Legos to build out these transformations on your data. And when you're doing that, when you're interacting with the designer, we would like to believe that we've made it one of the simplest interfaces to do this type of work, but even with that, there are many times you have to make decisions, like what type of transformation do you do next? How do you configure that transformation if you're talking to an Oracle database? What are your credentials if you talk to SalesForce? If I'm doing a transformation on data, which fields do I need? What kind of operations do I need to apply to those fields? So as you can imagine, there are lots of situations, as you're building out these data integration pipelines, where you have to make decisions. And one way to think about Iris is that Iris is there to help reduce the complexity, help reduce what kind of decision you have to make at any point in time.
So it's contextually aware of what you're doing at that moment in time. Based on mining our thousands of existing pipelines and the scenarios in which SnapLogic has been used, we leverage that to train models that help make recommendations, so that you can speed through whatever task you're trying to do as quickly as possible. >> It's such an important piece of information, because if I'm doing an integration project using the tool, I don't have the experience of the vast thousands and thousands, and actually, you're doing now, what, a trillion document moves last month? I just don't have that expertise. You guys have the expertise, and truth be told, as unique as I think I am, and as unique as I think my business processes are, probably a lot of them are pretty much the same as those of a lot of other people who are hooking up SalesForce to Oracle, or hooking up Marketo to their CRM. So you guys have really taken advantage of that, using the AI and ML to help guide me along, which is probably a pretty high-probability prediction of what my next move's going to be. >> Yeah, absolutely. And you know, back in the day, we used to consider, like, wizards or these sorts of things that would walk you through it. And really, that seemed intelligent, but it wasn't really intelligence or machine learning. It was really just hard-coded facts or heuristics that hopefully would be right for certain situations. The difference today is we're using real data, gigabytes of metadata, that we can use to train our models. The nice thing about that is it's not hard-coded, it's adaptive. It's adaptive both for new customers and for existing customers. We have customers that have hundreds of people who just use SnapLogic to get their business objectives done. And as they're building new pipelines, as they are putting in new expressions, we are learning that for them within their organization. So their coworkers can come in the next day and get the advantage of all that intellectual work: whatever was done to figure something out will be learned and then made available through Iris.
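The conversation doesn't reveal Iris's internals, but the flavor of a next-step recommender trained on historical pipelines can be sketched with simple successor counts; the pipelines and Snap names below are invented:

```python
# Toy next-Snap recommender: count which Snap historically follows which
# in finished pipelines, then suggest the most frequent successors.
# Pipelines and Snap names are invented for illustration.
from collections import Counter, defaultdict

historical_pipelines = [
    ["File Reader", "CSV Parser", "Mapper", "Salesforce Writer"],
    ["File Reader", "CSV Parser", "Filter", "Redshift Writer"],
    ["Salesforce Reader", "Mapper", "Redshift Writer"],
]

successors = defaultdict(Counter)
for pipeline in historical_pipelines:
    for current, nxt in zip(pipeline, pipeline[1:]):
        successors[current][nxt] += 1

def recommend_next(current_snap, k=3):
    """Return up to k most common successors seen in past pipelines."""
    return [snap for snap, _ in successors[current_snap].most_common(k)]

print(recommend_next("CSV Parser"))  # -> ['Mapper', 'Filter']
```

Context beyond the immediately preceding Snap, the whole pipeline so far, the organization, the endpoints involved, is presumably what the production models add on top of a counting baseline like this.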
Like, even if they knew the product really well, it's still actually a little more work to go through our catalog of 400 plus Snaps and pick something out, when, if it's just sitting right there saying, "Hey, here's the next thing you need," you don't even have to think. You just have to click, and it's right there. So it just speeds up the expert user as well. That was an interesting sort of revelation about machine learning and our application of it. In terms of what's changed over the last year, we've done a number of things. Probably the biggest is operationalizing it, so that instead of training off of a snapshot, we're now training on a continuous basis, and we get that adaptive learning I was talking about earlier. The other thing that we have done, and this is kind of getting into the weeds, is that we were using a decision tree model, which is a type of machine learning algorithm, and we've switched to neural nets now. So now we use neural nets to achieve higher accuracy and also a more adaptive learning experience. The neural net allowed us to bring in this organizational information, so that your recommendations would be more tailored to your specific organization. The other thing we're just on the cusp of releasing is, in the integration assistant, we're working on more than the beginning-to-end type of recommendation, where you were kind of working forward. What we found, in talking to people in the field and to our customers who use the product, is that there are all kinds of different ways that people interact with the product. They might know where they want the data to go, and then they might want to work backwards. Or they might know that the most important thing they need this to do is to join some data. So, like when you're solving a puzzle with the family, you either work on the edges or you put some clumps in the middle and work out from there. And that puzzle-solving metaphor is where we're moving the integration assistant, so that you can fill in the pieces that you know, and then we help you work in any direction to make the puzzle complete. That's something that we've been adding to. We recently started recommending, based on your context, the most common sources and destinations you might need, but we're also about to introduce this idea of working backwards and then also working from the inside out.
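The model switch described above, a decision tree replaced by a small neural net, can be sketched on a synthetic stand-in for the next-Snap task; none of the features, classes, or sizes here reflect the real system:

```python
# Sketch of the decision-tree-to-neural-net switch on a synthetic
# stand-in for next-Snap prediction. Features and classes are synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in for contextual features (current snap, org, pipeline so far).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000,
                    random_state=0).fit(X_tr, y_tr)

print("decision tree accuracy:", round(tree.score(X_te, y_te), 3))
print("neural net accuracy:  ", round(net.score(X_te, y_te), 3))
```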
>> We just had Gaurav on, and he was talking about how the next iteration of the vision is to get to autonomous, to get to where the thing not only can guess what you want to do, has a pretty good idea, but actually starts to basically do it for you, and I guess it would flag you if there's some strange thing or it needs assistance, really almost full autonomy in this integration effort. It's a good vision. >> I'm the one who has to make that vision a reality. The way I like to explain it is that customers or users have a concept of what they want to achieve. That concept is a thought in their head, and the goal is how to get that concept or thought into something that is machine executable. What's the pathway to achieve that? Or, if somebody's using SnapLogic for a lot of their organizational operations or for their data integration, we can start looking at what you're doing and make recommendations about other things you should or might be doing. So it's kind of a two-way thing: we can give you some suggestions, but people also know what they want to do conceptually, and how do we make that realizable as something that's executable? So I'm working on a number of research projects that are getting us closer to that vision. And one that I've been very excited about is that we're working a lot with NLP, natural language processing, like many companies and other products are investigating. Our use in particular is in a couple of different ways. To be concrete, we've been working on a research project in which, rather than having to know the name of a Snap... 'Cause right now, you get this thing called a Snap catalog, and like I said, 400 plus Snaps. To go through the whole list, it's pretty long. You can start to type a name, and yeah, it'll limit it, but you still have to know exactly what that Snap is called. What we're doing is applying machine learning in order to allow you to either speak or type the intention of what you're looking for. I want to parse a CSV file. Now, we have a file reader, and we have a CSV parser, but if you just typed "parse a CSV file," it may not find what you're looking for. So we're trying to take the human description and then connect that with the actual Snaps that you might need to complete your task. That's one thing we're working on. I have two more. The second one is a little bit more ambitious, but we have some preliminary work that demonstrates this idea of actually saying or typing what you want an entire pipeline to do. I might say: I want to read data from SalesForce, I want to filter out only records from the last week, and then I want to put those records into Redshift. And if you were to just say or type what I just said, we would give you a pipeline that maybe isn't entirely complete, but working, and it allows you to evolve it from there. So you don't have to go through all the steps of finding each individual Snap and connecting them together. This is still very early on, but we have some exciting results. And then the last thing we're working on with NLP is, in SnapLogic, we have a nice UI, and it's really good. A lot of the heavy lifting in building these pipelines, though, is in the actual manipulation of the data. And to actually manipulate the data, you need to construct expressions. For expressions in SnapLogic, we have a JavaScript expression language, so you have to write these expressions to do operations, right? One of our next goals is to use natural language to help you describe what you want those expressions to do, and then generate those expressions for you. To get at that vision, we have to chisel away. We have to break down the barriers on each one of these, and then collectively, this will get us closer to that vision of truly autonomous integration.
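The first NLP idea, matching a free-text intent like "parse a CSV file" against the catalog by similarity rather than exact name, can be sketched with TF-IDF; the catalog entries here are invented, and the real catalog has 400 plus:

```python
# Toy intent-to-Snap search: match a free-text request against Snap
# descriptions with TF-IDF cosine similarity. Catalog text is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

catalog = {
    "File Reader":    "read a file from a filesystem or a url",
    "CSV Parser":     "parse comma separated values csv text into records",
    "JSON Formatter": "format records as json documents",
}

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(catalog.values())

def find_snaps(query, k=2):
    """Rank catalog entries by similarity to the spoken or typed intent."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    ranked = sorted(zip(catalog, scores), key=lambda pair: -pair[1])
    return [name for name, _ in ranked[:k]]

print(find_snaps("parse a CSV file"))  # -> ['CSV Parser', 'File Reader']
```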
>> What's so cool about it, and again, you say autonomous and I can't help but think of autonomous vehicles. We had a great interview where the guest said, if you have an accident in your car, you learn, the person you had the accident with learns a little bit, and maybe the insurance adjuster learns a little bit. But when you have an accident in an autonomous vehicle, everybody learns, the whole system learns. That learning is shared at orders of magnitude greater scale, to the greater benefit of the whole. And that's really where you guys are sitting in this cloud situation. You've got all this integration going on with customers, you have all this translation and movement of data. Everybody benefits from the learning that's gained by everybody's participation. That's what is so exciting, and why it's such a great accelerator compared to how things used to be done, before, by yourself, in your little company, coding away trying to solve your problems. It's a very, very different kind of paradigm, to leverage all that information about actual use cases, about what's actually happening with the platform. So it puts you guys in a pretty good situation. >> I completely agree. Another analogy is, look, we're not going to get rid of programmers anytime soon. However, programming is a complex human endeavor. But Snap pipelines are kind of like programs, and what we're doing in our domain, our space, is trying to achieve automated programming, so that, as you said, learning from the experience of others, learning from the crowd, learning from mistakes, and capturing that knowledge in a way that, when somebody is presented with a new task, we can either make it very quick for them to achieve it or actually provide them with exactly what they need. So yeah, it's very exciting. >> So we're running out of time. Before I let you go, I wanted to tie it back to your professor job. How do you leverage that? How does that benefit what's going on here at SnapLogic? 'Cause you've obviously been doing that for a long time; it's important to you. Bill Schmarzo, great fan of theCUBE, I deemed him the dean of big data a couple of years ago, and he's now starting to teach. So there's a lot of benefits to being involved in academe. What are you doing there in academe, and how does it tie back to what you're doing here at SnapLogic?
On the flip side, my experience with SnapLogic has allowed me to bring sort of this industry experience back to the classroom, both in terms of explaining to students and understanding what their expectations will be when they get out into industry, but also being able to make the examples more real and relevant in the classroom. For me, it's been a great relationship that's benefited both those roles. >> Well, it's such a big and important driver to what goes on in the Bay Area. USF doesn't get enough credit. Clearly Stanford and Cal get a lot, they bring in a lot of smart people every year. They don't leave, they love the weather. It is really a significant driver. Not to mention all the innovation that happens and cool startups that come out. Well, Greg thanks for taking a few minutes out of your busy day to sit down with us. >> Thank you, Jeff. >> All right, he's Greg, I'm Jeff. You're watching theCUBE from SnapLogic in San Mateo, California. Thanks for watching.
Greg Benson, SnapLogic - AWS Summit SF 2017 - #AWSSummit - #theCUBE
>> Voiceover: Live from San Francisco, it's theCUBE. Covering AWS Summit 2017. Brought to you by Amazon Web Services. (upbeat music) >> Hey, welcome back to theCUBE, live at the Moscone Center at the Amazon Web Services Summit San Francisco. Very excited to be here with my co-host Jeff Frick. We're now talking to the Chief Scientist and professor at the University of San Francisco, Greg Benson of SnapLogic. Greg, welcome to theCUBE, this is your first time here, we're excited to have you. >> Thanks for having me. >> Lisa: So talk to us about what SnapLogic is, what you do, and what you announced recently, today, with Amazon Web Services. >> Greg: Sure, so SnapLogic is a data integration company. We deliver a cloud-native product that allows companies to easily connect their different data sources and cloud applications to enrich their business processes and really make some of their business processes a lot easier. We have a very easy-to-use, what we call self-service, interface. So previously, a lot of what people would have to do is hire programmers and do lots of manual programming to achieve some of the same things that they can do with our product. And we have a nice drag-and-drop, what we call a digital programming interface, to achieve this. And along those lines, I've been working for the last two years on ways to make that experience even easier than it already is. And because we're cloud-based, we have access to all of the types of problems that our customers run into, and the solutions that they solve with our product, and we can now leverage that and use it to harness machine learning. We call this technology Iris. And so we've built out this entire metadata framework that allows us to do data science on all of our metadata in a very iterative and rapid fashion. We look for patterns, we look for historical data that we can learn from, and then what we do is we use that to train machine-learning algorithms in order to improve the customer experience in some way when they're trying to achieve a task. Specifically, the first product feature that is based on the Iris technology is called the Integration Assistant. The Integration Assistant is a very practical tool that is involved in the process of actually building out these pipelines. When you build a pipeline, it consists of these things called Snaps, right? Snaps encapsulate functionality, and then you can connect these Snaps together. Now, it's often challenging when you have a problem to figure out, OK, it's like a puzzle: which Snaps do I put together, and when do I put them together? Well, now that we've been doing this for a little while, and we have quite a few customers with quite a few pipelines, we have a lot of knowledge about how people have solved those puzzles in the past. So what we've done with Iris is we've learned from all of those past solutions, and now we give you automatic suggestions on where you might want to head next. And we're getting pretty good accuracy for what we're predicting. So this Integration Assistant is basically a recommendation engine for connecting Snaps into your pipelines as they're developing. So it's a real-time assistant.
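Greg doesn't describe Iris's actual algorithms here, but the behavior he outlines, suggesting the next Snap based on how past pipelines were assembled, can be pictured with a simple frequency model. Everything below is an invented sketch of that general idea: the Snap names, the training data, and the transition-counting approach are all assumptions for illustration, not SnapLogic's implementation.

```python
from collections import Counter, defaultdict

# Hypothetical training data: each historical pipeline is an ordered list
# of Snap names. In the scenario Greg describes, these would come from the
# accumulated pipelines of many customers; the names here are invented.
historical_pipelines = [
    ["SalesforceRead", "Filter", "Mapper", "RedshiftWrite"],
    ["SalesforceRead", "Mapper", "RedshiftWrite"],
    ["FileReader", "CSVParser", "Mapper", "S3Write"],
]

# Count how often each Snap historically followed each predecessor.
transitions = defaultdict(Counter)
for pipeline in historical_pipelines:
    for prev_snap, next_snap in zip(pipeline, pipeline[1:]):
        transitions[prev_snap][next_snap] += 1

def suggest_next(current_snap: str, k: int = 3) -> list:
    """Return the k Snaps that most often followed current_snap."""
    return [snap for snap, _ in transitions[current_snap].most_common(k)]

print(suggest_next("Mapper"))  # -> ['RedshiftWrite', 'S3Write']
```

The real system would presumably condition on much more than the immediately preceding Snap, but even a first-order model like this captures the "recommendation engine for connecting Snaps" idea: every solved puzzle makes the next suggestion a little better.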
>> Jeff: So if I'm getting this right, it's really the intelligence of the crowd, and the fact that you have so many customers executing many of the same, similar processes, that you use as the basis to start to build the machine learning, to learn the best practices, to make suggestions as people are going through this on their own. >> Greg: That's absolutely right. And furthermore, not only can we generalize from all of our customers to help new customers take advantage of this past knowledge, but what we can also do is tailor the suggestions for specific companies. So as you, as a company, start to build out more solutions that are specific to your problems, your different integration problems... >> Jeff: Right. >> The algorithms can now learn from those specific things. So we both generalize, and then we also make the work that you're doing easier within your company. >> And what's the specific impact? Are there any examples, stories you can share, of what is the result of this type of activity? >> Greg: We're just releasing it in May. >> Jeff: Oh, OK. >> So it's going to be generally available to customers. >> Couple weeks still. >> Greg: Yeah. So we've done internal tests, both on the data science side, the experimentation to feed it and get feedback on how accurately it works, and we've also done user studies. Not only does the science show it, but the user studies show that it can improve the time to completion of these pipelines as you're building them. >> Lisa: So talk to us a little bit about who your target audience is. We're at AWS, as we said. They really started 10 years ago in the startup space and have grown tremendously, getting into the enterprise. Who is the target audience for SnapLogic that you're going after, to help them really significantly improve their infrastructure, get to the cloud, and beyond? >> Greg: So basically, we work largely with IT organizations within enterprises, larger companies tasked with having sort of a common fabric for connecting what, in an organization, is lots of different databases for different purposes, ERP systems, and now, increasingly, lots of cloud applications. And that's part of where our target is. We work with a lot of companies that still have policies where, of course, their data must be behind their firewall, and maybe even on their premises. So while our technology is hosted and run in the cloud, and we get the advantage of a SaaS platform, we also have the ability to run behind a firewall and execute these data pipelines in the security domains of the customers themselves. So they get the advantage of SaaS, they get the advantage of things like Iris and the Integration Assistant, right, because we can leverage all of the knowledge, but they get to adhere to any regulatory or security policies that they have. And we don't have to see their data or touch their data. >> Lisa: So, helping a customer that was, you know, using a service-oriented architecture or ETL, modernize their infrastructure? >> Greg: Oh, it's completely about modernization. Yeah, I mean, our CEO, Gaurav Dhillon, has been in the space for a while. He was formerly the CEO of Informatica, so he has a lot of experience. And when he set out to start SnapLogic, he wanted to embrace the technologies of the time, right? So we're web-focused, right?
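Continuing the toy model from the previous sketch, the generalize-plus-tailor behavior Greg describes could be pictured as blending two frequency tables: one built from all customers and one from the individual company. The alpha weight, Snap names, and numbers here are invented for illustration; how SnapLogic actually combines the two signals isn't stated in the conversation.

```python
from collections import Counter

def rank_suggestions(global_counts: Counter, tenant_counts: Counter,
                     alpha: float = 0.7, k: int = 3) -> list:
    """Rank candidate next Snaps by blending tenant and global frequencies.

    alpha is an invented tuning knob: higher values favor what this one
    company has done before over the crowd-wide pattern.
    """
    candidates = set(global_counts) | set(tenant_counts)
    g_total = sum(global_counts.values()) or 1
    t_total = sum(tenant_counts.values()) or 1

    def score(snap: str) -> float:
        return (alpha * tenant_counts[snap] / t_total
                + (1 - alpha) * global_counts[snap] / g_total)

    return sorted(candidates, key=score, reverse=True)[:k]

# Globally, "Mapper" is the usual follow-on, but this particular tenant
# almost always reaches for "ScriptSnap" instead.
global_counts = Counter({"Mapper": 900, "Filter": 300, "ScriptSnap": 50})
tenant_counts = Counter({"ScriptSnap": 12, "Mapper": 2})
print(rank_suggestions(global_counts, tenant_counts))
# -> ['ScriptSnap', 'Mapper', 'Filter']: tenant history outweighs the crowd
```

The design point this illustrates is the one Greg makes: a new customer with no history still gets useful crowd-derived suggestions, while an established customer's own patterns gradually dominate.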
We're HTTP and REST and JSON data. And we've centered the core technologies around these modern principles. So that makes us work very well with all the modern applications that you see today. >> Jeff: Look, Greg, I want to shift gears a little bit. >> Greg: Yeah. >> You're also a professor. >> Greg: Correct. >> At the University of San Francisco, and UC Davis. I'd just love to get your perspective from the academic side of the house on what's happening at schools around this new opportunity with big data, machine learning, and AI, and how that world is kind of changing. And then you're sitting in this great position where you kind of cross over both... How does that really benefit, you know, to have some of that fresh, young blood and learning, and then really take that back over into the other side of the house? >> Greg: Yeah, so a couple of things. I've been a professor at the University of San Francisco for 19 years, and I did my PhD at UC Davis in computer science. My background is research in operating systems, parallel and distributed computing, and in recent years, big data frameworks and big data processing. At the University of San Francisco itself, we have what we call the Senior and Master's Project Programs. We've been doing this ever since I've been at USF: we partner groups of students with outside sponsors who are looking for opportunities to explore a research area. Maybe one that they can't justify allocating funds for, because it's a little bit outside of the main product, right? And it's a great win, 'cause our students get experience with a San Francisco, Silicon Valley company, right? So it helps their resume. It enhances their university experience, right? And, you know, a lot of research happens in academia in computer science, but a lot of research is also happening in industry, which is a really fascinating thing if you look at what has come out of some of the bigger companies around here. And we feel like we're doing the same thing at SnapLogic and at the University of San Francisco. So just to kind of close that loop, students are great because they're not constrained, like maybe some of us who have been in the industry for a while, by notions of what is possible and what's not so possible. And it's great to have somebody come and look at a problem and say, "You know, I think we could approach this differently." And, in fact, the impetus for the Integration Assistant came out of one of these projects, where I pitched to our students, and I said, "OK, we're going to explore SnapLogic metadata, and we're going to look at ways we can leverage machine learning in the product on this data." But I left it kind of vague, kind of open. This fantastic student of mine from Thailand, his name is Jump, spent some time looking at the data, and he said, "You know, I'm seeing some patterns here. I'm seeing that we've got this great repository of these," like I described, "of these solved puzzles. And I think we could use that to train some algorithms." And so, in the project phase, as part of his coursework, he worked on this technology. Then we demoed it at the company, and the company said, "Wow, this is great technology. Let's put this into production." And then there was this transition from a more academic, experimental project into working with engineers and making it a real feature.
>> Lisa: What a great opportunity, though, not just for the student to get more real-world applicability, like you're saying, taking it from that very experimental, investigational, academic approach and seeing all of the components within a business; that student probably gets so much more out of it than just an experiment. But your other point is very valid, of having that younger talent that maybe doesn't have a lot of the biases and preconceived notions of those of us who have been in the industry for a while. That's a great pipeline, no pun intended... >> Greg: Sure. >> For SnapLogic. Is that something that you helped bring into the company by nature of being a professor? Just sort of a nice by-product? >> Well, so a couple of things there. One is that, like I said, at the University of San Francisco we had been running this project class for a while. I had been at USF for a long time before I got involved with SnapLogic; I was introduced to Gaurav, and there was this opportunity. And initially, I was looking to apply some of my research to their product and their technology. But then it became clear that, hey, we have this infrastructure in place at the university. Our students go through very rigorous academic training, and back to your point about what they're exposed to, we're very modern around big data and machine learning, plus all of the core computer science that you would expect from a program. And so, yeah, it's been a great, mutually beneficial relationship with SnapLogic and the students. But many other companies also come and pitch projects, and those students do similar types of projects at other companies. I would like to say that I started it at USF, but I didn't. It was in existence, but I helped carry it forward. >> Jeff: That's great. >> Lisa: That is fantastic. >> And even before we got started, I mean, you said your kind of attitude was to be the iPhone in this space. >> Greg: Of integration, yeah. >> Jeff: So again, taking a very different approach, a really modern approach, to the expected behavior of things. And you know, the consumerization of IT, in terms of the expected behavior of how we interact with stuff, has been such a powerful driver in the development of all these different applications. It's pretty amazing. >> Greg: And I think, you know, just like now you couldn't imagine most consumer-facing products not having a mobile application of some sort, increasingly what you're seeing is that applications will require machine learning, right, will require some amount of augmented intelligence. And I would go as far as to say that the technology we're building at SnapLogic with self-service integration is also going to be a requirement. You just can't think of self-service integration without having it powered by a machine-learning framework helping you, right? It's almost like, in a few years, we won't imagine it any other way. >> Lisa: And I like the analogy, Jeff, that you just brought up: Greg, being the iPhone of data integration. The simplicity message, something that was very prevalent today at the keynote, about making things simpler, faster, enabling more. And it sounds like that's what you're leveraging computer science to do. So, Greg Benson, Chief Scientist at SnapLogic, thank you so much for being on theCUBE. You're now CUBE alumni, so that's fantastic. >> Alright.
Lisa: We appreciate you being here, and we appreciate you watching. For my co-host Jeff Frick, I'm Lisa Martin. Again, we are live from the AWS Summit in San Francisco. Stick around, we'll be right back. (upbeat music)