Breaking Analysis: We Have the Data…What Private Tech Companies Don't Tell You About Their Business
>> From The Cube Studios in Palo Alto and Boston, bringing you data-driven insights from The Cube and ETR. This is "Breaking Analysis" with Dave Vellante. >> The reverse momentum in tech stocks caused by rising interest rates, less attractive discounted cash flow models, and more tepid forward guidance can be easily measured by public market valuations. And while there's lots of discussion about the impact on private companies and cash runway and 409A valuations, measuring the performance of non-public companies isn't as easy. IPOs have dried up, and public statements by private companies, of course, accentuate the good and kind of hide the bad. Real data, unless you're an insider, is hard to find. Hello and welcome to this week's "Wikibon Cube Insights" powered by ETR. In this "Breaking Analysis", we unlock some of the secrets that non-public, emerging tech companies may or may not be sharing. And we do this by introducing you to a capability from ETR that we've not exposed you to over the past couple of years: it's called the Emerging Technologies Survey, and it is packed with sentiment data and performance data based on surveys of more than a thousand CIOs and IT buyers covering more than 400 companies. And we've invited back our colleague, Erik Bradley of ETR, to help explain the survey and the data that we're going to cover today. Erik, this survey is something that I've not personally spent much time on, but I'm blown away by the data. It's really unique and detailed. First of all, welcome. Good to see you again. >> Great to see you too, Dave, and I'm really happy to be talking about the ETS, or the Emerging Technology Survey. Even our own clients and constituents probably don't spend as much time in here as they should. >> Yeah, because there's so much in the mainstream, but let's pull up a slide to bring out the survey composition. Tell us about the study. How often do you run it? What's the background and the methodology? >> Yeah, you were just spot on in the way you were talking about the private tech companies out there. So what we did is we decided to take all the vendors that we track that are not yet public and move 'em over to the ETS. And there isn't a lot of information out there. If you're not in Silicon (indistinct), you're not going to get this stuff. So PitchBook and TechCrunch are two out there that give some data on these guys. But what we really wanted to do was go out to our community. We have 6,000 ITDMs in our community. We wanted to ask them, "Are you aware of these companies? And if so, are you allocating any resources to them? Are you planning to evaluate them?" and really just kind of figure out what we can do. So this particular survey, as you can see: 1,000-plus responses, over 450 vendors that we track. And essentially what we're trying to do here is talk about your evaluation and awareness of these companies and also your utilization. And also, if you're not utilizing 'em, then we can also figure out your sales conversion or churn. So this is interesting not only for the ITDMs themselves, to figure out what their peers are evaluating and what they should put in POCs against the big guys when contracts come up, but it's also really interesting for the tech vendors themselves, to see how they're performing. >> And you can see 2/3 of the respondents are director level or above. You've got 28% C-suite. There is, of course, a North America bias; 70 to 75% is North America. But these smaller companies, you know, that's where they start doing business. So, okay.
We're going to do a couple of things here today. First, we're going to give you the big picture across the sectors that ETR covers within the ETS survey. And then we're going to look at the high and low sentiment for the larger private companies. And then we're going to do the same for the smaller private companies, the ones that don't have as much mindshare. And then I'm going to put those two groups together and we're going to look at two dimensions, actually three dimensions: first, which companies are being evaluated the most; second, which companies are getting the most usage and adoption of their offerings; and then third, which companies are seeing the highest churn rates, which of course is a silent killer of companies. And then finally, we're going to look at the sentiment and mindshare for two key areas that we like to cover often here on "Breaking Analysis", security and data. And data comprises database, including data warehousing; then big data analytics, the second part of data; and then machine learning and AI, the third section within data that we're going to look at. Now, one other thing before we get into it. ETR very often will include open source offerings in the mix, even though they're not companies, like TensorFlow or Kubernetes, for example. And we'll call that out during this discussion. The reason this is done is for context, because everyone is using open source. It is the heart of innovation, and many business models are superglued to an open source offering. Take MariaDB, for example: there's the foundation with the open source code, and then there's, of course, the company that sells services around the offering. Okay, so let's first look at the highest and lowest sentiment among these private firms, the ones that have the highest mindshare. So they're naturally going to be somewhat larger. And we do this on two dimensions, sentiment on the vertical axis and mindshare on the horizontal axis, and note the open source tools: Kubernetes, Postgres, Kafka, TensorFlow, Jenkins, Grafana, et cetera. So Erik, please explain what we're looking at here, how it's derived and what the data tells us. >> Certainly. So there is a lot here, so we're going to break it down, first of all, by explaining just what mindshare and net sentiment are. You explained the axes. We have so many evaluation metrics, but we need to aggregate them into one so that we can rank against each other. Net sentiment is really the aggregation of all the positives, subtracting out the negatives. So net sentiment is a very quick way of looking at where these companies stand versus their peers in their sectors and subsectors. Mindshare is basically the awareness of them, which is good for very early stage companies. And you'll see some names on here that have obviously been around for a very long time, and they're clearly the bigger ones on the axis, on the outside. Kubernetes, for instance, as you mentioned, is open source. It's the de facto standard for all container orchestration, and it should be that far up and to the right, because that's what everyone's using. In fact, the open source leaders are so prevalent in the Emerging Technology Survey that we break them out later in our analysis, 'cause it's really not fair to include them and compare them to the actual companies that are providing the support and the security around that open source technology. But no survey, no analysis, no research would be complete without including these open source technologies.
So what we're looking at here, if I can just get away from the open source names, we see other things like Databricks and OneTrust. They're repeating as top net sentiment performers here. And then also the design vendors. People don't spend a lot of time on 'em, but Miro and Figma, this is their third survey in a row where they're just dominating that sentiment overall. And Adobe should probably take note of that, because they're really coming after them. But Databricks, we all know, probably would've been a public company by now if the market hadn't turned, but you can see just how dominant they are in a survey of nothing but private companies. And we'll see that again when we talk about database later. >> And I'll just add, so you see Automation Anywhere on there, the big UiPath competitor, a company that was not able to get to the public markets. They've been trying. Snyk, Peter McKay's company, they've raised a bunch of money, big security player. They're doing some really interesting things in developer security, helping developers secure the data flow. H2O.ai and Dataiku, AI companies. We saw them at the Snowflake Summit. Redis Labs, Netskope in security. So a lot of names that we know that ultimately, we think, are probably going to be hitting the public market. Okay, here's the same view for private companies with less mindshare, Erik. Take us through this one. >> On the previous slide too, real quickly, I wanted to point out SecurityScorecard, and we'll get back into it. But this is a newcomer, and I couldn't believe how strong their data was, but we'll bring that up in a second. Now, when we go to the ones with lower mindshare, it's interesting to talk about open source, right? Kubernetes was all the way on the top right. Everyone uses containers. Here we see Istio up there. Not everyone is using service mesh as much, and that's why Istio is in the smaller breakout. But still, when you talk about net sentiment, it's the leader, it's the highest one there is. So really interesting to point out. Then we see other names like Collibra on the data side really performing well. And again, as always, security is very well represented here. We have Aqua, Wiz, Armis, which is a standout in this survey this time around. They do IoT security. I hadn't even heard of them until I started digging into the data here, and I couldn't believe how well they were doing. And then of course you have Anyscale, which is doing second best in this, and the best name in the survey, Hugging Face, which is a machine learning AI tool, also doing really well on net sentiment, but they're not as far along on that axis of mindshare just yet. So these are, again, emerging companies that might not be as well represented in the enterprise as they will be in a couple of years. >> Hugging Face sounds like something you do with your two-year-old. Like you said, you see high performers. Anyscale does machine learning, and you mentioned them. They came out of Berkeley. Collibra does governance. InfluxData is on there; InfluxDB's a time series database. And yeah, of course, Alex, if you bring that back up, you get a big group of red dots, right? That's the bad zone, I guess. Sisense does vis, Yellowbrick Data is an MPP database. How should we interpret the red dots, Erik? I mean, is it necessarily a bad thing? Could it be misinterpreted? What's your take on that? >> Sure, well, let me just explain the definition of it first from a data science perspective, right? We're a data company first.
So the gray dots that you're seeing that aren't named, that's the mean, that's the average. So in order for you to be on this chart, you have to be at least one standard deviation above or below that average. So that gray is where we're saying, "Hey, this is where the lump of average comes in. This is where everyone normally stands." So you either have to be an outperformer or an underperformer to even show up in this analysis. So by definition, yes, the red dots are bad. You're at least one standard deviation below the average of your peers. It's not where you want to be. And if you're on the lower left, not only are you not performing well from a utilization or an actual usage rate, but people don't even know who you are. So that's a problem, obviously. And the VCs and the PEs out there that are backing these companies, they're the ones who mostly are interested in this data. >> Yeah, that's a great explanation. Thank you for that. Nice benchmarking there, and yeah, you don't want to be in the red. All right, let's get into the next segment here. We're going to look at evaluation rates, adoption and the all-important churn. First, new evaluations. Let's bring up that slide. And Erik, take us through this. >> So essentially, I just want to explain what evaluation means: people will cite that they either plan to evaluate the company or they're currently evaluating. So that means we're aware of 'em and we are choosing to do a POC of them. And then we'll see later how that turns into utilization, which is what a company wants to see: awareness, evaluation, and then actually utilizing them. That's sort of the life cycle for these emerging companies. So what we're seeing here, again, are very high evaluation rates. H2O, we mentioned. SecurityScorecard jumped up again. Chargebee, Snyk, Salt Security, Armis. A lot of security names are up here. Aqua, Netskope, which, God, has been around forever. I still can't believe it's in an Emerging Technology Survey. But so many of these names fall in data and security again, which is why we decided to pick those out, Dave. And on the lower side, Vena, Acton, those unfortunately took the dubious award of the lowest evaluations in our survey, but I prefer to focus on the positive. So SecurityScorecard, again, a real standout in this one. They're in the security assessment space, basically. They'll come in and assess your security hygiene for you. And it's an area of real interest right now amongst our ITDM community. >> Yeah, I mean, I think those, and then Arctic Wolf is up there too. They're doing managed services. You had mentioned Netskope. Yeah, okay. All right, let's look now at adoption. These are the companies whose offerings are being used the most and are above that standard deviation in the green. Take us through this, Erik. >> Sure. Yet again, what we're looking at is, okay, we went from awareness, we went to evaluation. Now it's about utilization, which means a survey respondent's going to state, "Yes, we evaluated and we plan to utilize it," or "It's already in our enterprise and we're actually allocating further resources to it." Not surprising, again, a lot of open source. The reason why? It's free. So it's really easy to grow your utilization on something that's free. But as you and I both know, as Red Hat proved, there's a lot of money to be made once the open source is adopted, right? You need the governance, you need the security, you need the support wrapped around it.
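To make the chart mechanics Erik just walked through concrete (net sentiment as positives minus negatives, and the one-standard-deviation screen that decides whether a vendor even appears as a named dot), here is a minimal sketch of that kind of calculation. The vendor names and citation counts are invented for illustration, and ETR's actual weighting of its evaluation metrics isn't public, so treat this as an assumption about the general shape of the math, not ETR's formula.

```python
# Hypothetical illustration of a net-sentiment screen (not ETR's actual formula).
from statistics import mean, stdev

# Made-up survey citations: (positive, negative, total responses) per vendor.
citations = {
    "VendorA": (120, 30, 300),
    "VendorB": (45, 60, 200),
    "VendorC": (200, 20, 500),
    "VendorD": (70, 65, 250),
    "VendorE": (15, 40, 150),
}

# Net sentiment: positives minus negatives, as a share of total responses.
net_sentiment = {
    v: (pos - neg) / total for v, (pos, neg, total) in citations.items()
}

mu = mean(net_sentiment.values())
sigma = stdev(net_sentiment.values())

# Only vendors at least one standard deviation from the mean get named on the chart.
for vendor, ns in sorted(net_sentiment.items(), key=lambda kv: -kv[1]):
    if ns >= mu + sigma:
        flag = "outperformer (green)"
    elif ns <= mu - sigma:
        flag = "underperformer (red)"
    else:
        flag = "unnamed gray dot"
    print(f"{vendor}: net sentiment {ns:+.2f} -> {flag}")
```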
So here we're seeing Kubernetes, Postgres, Apache Kafka, Jenkins, Grafana. These are all open source based names. But if we're looking at names that are not open source, we're going to see Databricks, Automation Anywhere and Rubrik all have the highest mindshare. So these are, not surprisingly, all names that probably should have been public by now. Everyone's expecting an IPO imminently. These are the names that have the highest mindshare. If we talk about the highest utilization rates, again, Miro and Figma pop up, and I know they're not household names, but they are just dominant in this survey. These are applications that are meant for design software, and, again, they're going after an Autodesk or a CAD or Adobe type of thing. It is just dominant how high the utilization rates are here, which again is something Adobe should be paying attention to. And then you'll see a little bit lower, but also interesting, we see Collibra again, we see Hugging Face again. And these are names that are obviously on the data governance, ML, AI side. So we're seeing a ton of data, a ton of security, and Rubrik was interesting in this one too: high utilization and high mindshare. We know how pervasive they are in the enterprise already. >> Erik, Alex, keep that up for a second, if you would. So yeah, you mentioned Rubrik. Cohesity's not on there. They're sort of the big one. We're going to talk about them in a moment. Puppet is interesting to me, because you remember the early days of that sort of space: you had Puppet and Chef, and then you had Ansible. Red Hat bought Ansible, and then Ansible really took off. So it's interesting to see Puppet on there as well. Okay. So now let's look at the churn, because this one is where you don't want to be. It's, of course, all red, 'cause churn is bad. Take us through this, Erik. >> Yeah, you definitely don't want to be here, and I don't love to dwell on the negative, so we won't spend as much time. But to your point, there's one thing I want to point out that I think is important. So you see Rubrik in the same spot, but Rubrik has so many citations in our survey that it actually would make sense that they're high on both utilization and churn, just because they're so well represented. They have such a high overall representation in our survey. And the reason I call that out is Cohesity. Cohesity has an extremely high churn rate here, about 17%, and unlike Rubrik, they were not on the utilization side. So Rubrik is seeing both; Cohesity is not. It's not being utilized, but it's seeing a high churn. So that's the way you can look at this data and say, "Hm." Same thing with Puppet. You noticed that it was on the other slide. It's also on this one. So basically what it means is a lot of people are giving Puppet a shot, but it's starting to churn, which means it's not as sticky as we would like. One that was surprising on here for me was Tanium. It's kind of jumbled in there. It's hard to see in the middle. But Tanium, I was very surprised to see with as high of a churn, because what I do hear from our end user community is that people that use it, like it. It really kind of spreads into not only vulnerability management, but also that endpoint detection and response side. So I was surprised by that one, mostly, to see Tanium in here. Mural, again, was another one of those application design softwares that's seeing a very high churn as well. >> So you're saying if you're in both... Alex, bring that back up if you would.
So if you're in both, like MariaDB is, for example... I think, yeah, they're in both. They're green in the previous one and red here. That's not as bad. You mentioned Rubrik is going to be in both. Cohesity is a bit of a concern. Cohesity just brought on Sanjay Poonen. So this could be a go-to-market issue, right? I mean, 'cause Cohesity has got a great product and they've got really happy customers. So they're just maybe having to figure out, okay, what's the right ideal customer profile, and Sanjay Poonen, I guarantee, is going to have that company cranking. I mean, they had been doing very well on the surveys and had fallen off a bit. The other interesting thing: in the previous survey I saw Cvent, which is an event platform. The only reason I pay attention to that is 'cause we actually have an event platform. We don't sell it separately. We bundle it as part of our offerings. And you see Hopin on here. Hopin raised a billion dollars during the pandemic, and we were like, "Wow, that's going to blow up." And so you see Hopin on the churn, and you didn't see 'em in the previous chart, but that's sort of interesting. Like you said, let's not dwell on the negative, but you really don't... You know, churn is a real big concern. Okay, now we're going to drill down into two sectors, security and data, where data comprises three areas: database and data warehousing, machine learning and AI, and big data analytics. So first let's take a look at the security sector. Now, this is interesting because not only is it a sector drill down, it also gives an indicator of how much money the firm has raised, which is the size of that bubble, and tells us if a company is punching above its weight and efficiently using its venture capital. Erik, take us through this slide. Explain the dots, the size of the dots. Set this up, please. >> Yeah. So again, the axes are still the same, net sentiment and mindshare, but what we've done this time is we've taken publicly available information on how much capital a company has raised, and that'll be the size of the circle you see around the name. And then whether it's green or red is basically saying, relative to the amount of money they've raised, how are they doing in our data? So when you see a Netskope, which has been around forever and raised a lot of money, that's why you're going to see them leaning more towards red, 'cause it's just been around forever, and you kind of would expect it. Versus a name like SecurityScorecard, which has only raised a little bit of money and is actually performing just as well, if not better, than a name like a Netskope. OneTrust is doing absolutely incredible right now. BeyondTrust. We've seen the issues with Okta, right? So those are two names that play in that space that obviously are probably getting some looks about what's going on right now. Wiz, we've all heard about, right? So they raised a ton of money. It's doing well on net sentiment, but the mindshare isn't as high as you'd want, which is why you're going to see a little bit of that red, versus a name like Aqua, which is doing container and application security and hasn't raised as much money, but is really neck and neck with a name like Wiz. So that is why, on a relative basis, you'll see that more green. As we all know, information security is never going away. But as we'll get to later in the program, Dave, I'm not sure, in this current market environment, if people are as willing to do POCs and switch away from their security provider, right?
There's a little bit of tepidness out there, a little trepidation. So right now we're seeing overall a slight pause, a slight cooling in overall evaluations on the security side versus historical levels a year ago. >> Now let's stay on here for a second. So a couple things I want to point out. So it's interesting. Now, Snyk has raised over, I think, $800 million, but you can see them, they're high on the vertical and the horizontal. But now compare that to Lacework. It's hard to see, but they're kind of buried in the middle there. That's the biggest dot in this whole thing. I think I'm interpreting this correctly: they've raised over a billion dollars. It's a Mike Speiser company. He was the founding investor in Snowflake. So people watch that very closely, but that's an example of where they're not punching above their weight. They recently had a layoff, and they've got to fine-tune things, but I'm still confident they're going to do well, 'cause they're approaching security as a data problem, and people are probably having trouble getting their arms around that. And then again, I see Arctic Wolf. They're not red, they're not green, but they've raised a fair amount of money, and they're showing up to the right at a decent level there. And a couple of the other ones that you mentioned... Netskope. Yeah, they've raised a lot of money, but they're actually performing where you want. What you don't want is where Lacework is, right? They've got some work to do to really take advantage of the money that they raised last November and prior to that. >> Yeah, if you're seeing that more neutral color, like you're calling out with an Arctic Wolf, that means relative to their peers, this is where they should be. It's when you're seeing that red on a Lacework where we all know, wow, you raised a ton of money and your mindshare isn't where it should be, your net sentiment is not where it should be comparatively. And then you see these great standouts, like Salt Security and SecurityScorecard and Abnormal. You know they haven't raised that much money yet, but their net sentiment's higher and their mindshare's doing well. So basically, in a nutshell, if you're a PE or a VC and you see a small green circle, then you're doing well; it means you made a good investment. >> Some of these guys, I don't know, but you see these small green circles. Those are the ones you want to start digging into and maybe help them catch a wave. Okay, let's get into the data discussion. And again, three areas: database slash data warehousing, big data analytics, and ML/AI. First, we're going to look at the database sector. So Alex, thank you for bringing that up. Alright, take us through this, Erik. Actually, let me just call out PostgreSQL. I've got to ask you about this. It shows some funding, but that actually could be a mix of EDB, the company that commercializes Postgres, and Postgres the open source database, which is a transaction system and kind of an open source Oracle. You see MariaDB, a database, but an open source database. But the company, they've raised over $200 million and they filed an S-4. So Erik, it looks like this might be a little bit of a mashup of companies and open source products. Help us understand this. >> Yeah, it's tough when you start dealing with the open source side, and I'll be honest with you, there is a little bit of a mashup here. There are certain names here that are a hundred percent for-profit companies.
And then there are others that are obviously open source based: like, Redis is open source, but Redis Labs is the one trying to monetize the support around it. So you're a hundred percent accurate on this slide. I think one of the things here that's important to note, though, is just how important open source is to data. If you're going to be going into any of these areas, it's going to be open source based to begin with. And Neo4j is one I want to call out here. It's not one everyone's familiar with, but it's basically a graph database, and it's a name that we're seeing on the net sentiment side actually really, really high. When you think about it, it's the third overall net sentiment for a niche database play. It's not as big on the mindshare 'cause its use cases aren't as common, but it's the third biggest play on net sentiment. I found that really interesting on this slide. >> And again, so MariaDB, as I said, they filed an S-4, I think $50 million in revenue, and that might even be ARR. So they're not huge, but they're getting there. And by the way, MariaDB, if you don't know, was the company that was formed the day that Oracle bought Sun, in which they got MySQL, and MariaDB has done a really good job of replacing a lot of MySQL instances. Oracle has responded with MySQL HeatWave, which was kind of the Oracle version of MySQL. So there's some interesting battles going on there. If you think about the LAMP stack, the M in the LAMP stack was MySQL, and so now it's MariaDB replacing that MySQL for a large part. And then you see, again, the red... you know, you've got to have some concerns about that. Aerospike's been around for a long time. SingleStore changed their name a couple of years ago, last year. Yellowbrick Data, Firebolt was kind of going after Snowflake for a while, but yeah, you want to get out of that red zone. So they've got some work to do. >> And Dave, real quick, for the people that aren't aware, I just want to let them know that we can cut this data with the public company data as well. So we can cross over this with that, because some of these names are competing with the larger public company names as well. So we can go ahead and cross-reference, like, a MariaDB with a Mongo, for instance, or something of that nature. So it's not in this slide, but at another point we can certainly explain on a relative basis how these private names are doing compared to the other ones as well. >> All right, let's take a quick look at analytics. Alex, bring that up if you would. Go ahead, Erik. >> Yeah, I mean, essentially here... I can't see it on my screen, my apologies. I just kind of went blank on that. So give me one second to catch up. >> So I can set it up while you're doing that. You've got Grafana up and to the right. I mean, this is huge, right? >> Got it, thank you. I lost my screen there for a second. Yep. Again, open source name Grafana, absolutely up and to the right. But as we know, Grafana Labs is actually picking up a lot of speed based on Grafana, of course. And I think we might actually hear some noise from them coming this year. The names that are actually a little bit more disappointing that I want to call out are names like ThoughtSpot. It's been around forever. Their mindshare, of course, is second best here, but based on the amount of time they've been around and the amount of money they've raised, it's not actually outperforming the way it should be. We're seeing Moogsoft obviously make some waves. That's very high net sentiment for that company.
It's, you know, what, third, fourth position overall in this entire area. Another name, like Fivetran... Matillion is doing well. Fivetran, even though it's got a high net sentiment, again, has raised so much money that we would've expected a little bit more at this point. I know you know this space extremely well, but basically what we're looking at here, and to the bottom left, you're going to see some names with a lot of red, large circles that really just aren't performing that well. InfluxData, however, second highest net sentiment. And it's really pretty early on in this stage, and the feedback we're getting on this name is the use cases are great, the efficacy's great. And I think it's one to watch out for. >> InfluxData, time series database. The other interesting thing I just noticed here: you've got Tamr on here, which is that little small green one. Those are the ones we were saying before, look for those guys. They might be some of the interesting companies out there. And then Observe, Jeremy Burton's company. They do observability on top of Snowflake. Not green, but kind of in that gray. So that's kind of cool. Monte Carlo is another one; they're sort of slightly green. They are doing some really interesting things in data and data mesh. So yeah, okay. So I could spend all day on this stuff, Erik, phenomenal data. I've got to get back and really dig in. Let's end with machine learning and AI. Now, this chart is similar in its dimensions, of course, except for the money raised. We're not showing that size of the bubble, but AI is so hot, we wanted to cover that here. Erik, explain this, please. Why is TensorFlow highlighted, and walk us through this chart. >> Yeah, it's funny, yet again, right? Another open source name, TensorFlow, being up there. And I just want to explain: we do break out machine learning/AI as its own sector. A lot of this, of course, really is intertwined with the data side, but it is its own area. And one of the things I think that's most important here to break out is Databricks. We started to cover Databricks in machine learning/AI. That company has grown into much, much more than that. So I do want to state to you, Dave, and also the audience out there, that moving forward we're going to be moving Databricks out of only ML/AI into other sectors, so we can kind of value them against their peers a little bit better. But in this instance, you can just see how dominant they are in this area. And one thing that's not here, but I do want to point out, is that we have the ability to break this down by industry vertical and organization size. And when I break this down into the Fortune 500 and Fortune 1000, both Databricks and TensorFlow do even better than you see here. So it's quite interesting to see that the names that are succeeding are also succeeding with the largest organizations in the world. And as we know, large organizations mean large budgets. So this is one area that I just thought was really interesting to point out: as we break down the data by vertical, these two names still are the outstanding players. >> I just also want to call out H2O.ai. They're getting a lot of buzz in the marketplace, and I'm seeing them a lot more. Anaconda, another one. Dataiku consistently popping up. DataRobot is also interesting, because of all the kerfuffle that's going on there. The Cube guy, Cube alum, Chris Lynch stepped down as executive chairman.
All this stuff came out about how the executives were taking money off the table and didn't allow the employees to participate in that money-raising deal. So that's pissed a lot of people off. And so they're now going through some kind of uncomfortable things, which is unfortunate, because DataRobot, I noticed... we haven't covered them that much in "Breaking Analysis", but I've noticed them oftentimes, Erik, in the surveys doing really well. So you would think that company has a lot of potential. But yeah, it's an important space that we're going to continue to watch. Let me ask you, Erik, can you contextualize this from a time series standpoint? I mean, how has this changed over time? >> Yeah, again, not shown here, but in the data... I'm sorry, go ahead. >> No, I'm sorry. What I meant... I should have interjected. In other words, you would think in a downturn that these emerging companies would be less interesting to buyers 'cause they're more risky. What have you seen? >> Yeah, and it was interesting. Before we went live, you and I were having this conversation about "Is the downturn stopping people from evaluating these private companies or not," right? In a larger sense, that's really what we're doing here. How are these private companies doing when it comes down to the actual practitioners? The people with the budget, the people with the decision making. And so what I did is, we have historical data, as you know, and I went back to the Emerging Technology Survey we did in November of '21, right at the crest, right before the market started to really fall and everything kind of started to fall apart there. And what I noticed is, on the security side very much so, we're seeing fewer evaluations than we were in November '21. So I broke it down. In cloud security, net sentiment went from 21% to 16% from November '21. That's a pretty big drop. And again, that net sentiment is our one aggregate metric for overall positivity, meaning utilization and actual evaluation of the name. Again, in database, we saw it drop a little bit, from 19% to 13%. However, in analytics we actually saw it stay steady. So it's pretty interesting that, yes, cloud security and security in general is always going to be important, but right now we're seeing less overall net sentiment in that space. But within analytics, we're seeing steady sentiment with growing mindshare. And also, to your point earlier, in machine learning/AI, we're seeing steady net sentiment, and mindshare has grown a whopping 25% to 30%. So despite the downturn, we're seeing more awareness of these companies in analytics and machine learning, and a steady, actual utilization of them. I can't say the same in security and database. They're actually shrinking a little bit since the end of last year. >> You know, it's interesting. We were on a round table (Erik does these round tables with CISOs and CIOs), and I remember one time you had asked the question, "How do you think about some of these emerging tech companies?" And one of the executives said, "I always include somebody in the bottom left of the Gartner Magic Quadrant in my RFPs." I think he said, "That's how I found..." I don't know, it was Zscaler or something like that, years before anybody ever knew of them, "Because they're going to help me get to the next level." So it's interesting to see, Erik, in these sectors, how they're holding up in many cases. >> Yeah. It's a very important part for the actual IT practitioners themselves. There's always contracts coming up, and you always have to worry about your next round of negotiations.
And that's one of the roles these guys play. You have to do a POC when contracts come up, but it's also their job to stay on top of the new technology. You can't fall behind. Like, everyone's a software company now. Everyone's a tech company, no matter what you're doing. So these guys have to stay on top of it. And that's what this ETS can do. You can go in here and look and say, "All right, I'm going to evaluate their technology," and it could be twofold. It might be that you're ready to upgrade your technology and they're actually pushing the envelope, or it simply might be, "I'm using them as a negotiation ploy," so when I go back to the big guy who I have full intentions of writing that contract to, at least I have some negotiation leverage. >> Erik, we've got to leave it there. I could spend all day. I'm going to definitely dig into this on my own time. Thank you for introducing this. Really appreciate your time today. >> I always enjoy it, Dave, and I hope everyone out there has a great holiday weekend. Enjoy the rest of the summer. And, you know, I love to talk data. So anytime you want, just point the camera on me and I'll start talking data. >> You got it. I also want to thank the team at ETR, not only Erik, but Darren Bramen, who's a data scientist who really helped prepare this data, and the entire team over at ETR. I cannot tell you how much additional data there is. We are just scratching the surface in this "Breaking Analysis". So great job, guys. I want to thank Alex Myerson, who's on production and manages the podcast. Ken Shifman as well, who's just coming back from VMware Explore. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE. He does some great editing for us. Thank you, all of you guys. Remember, these episodes are all available as podcasts wherever you listen. All you've got to do is search "Breaking Analysis" podcast. I publish each week on wikibon.com and siliconangle.com. Or you can email me to get in touch at david.vellante@siliconangle.com. You can DM me at @dvellante or comment on my LinkedIn posts. And please do check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for Erik Bradley and The Cube Insights powered by ETR. Thanks for watching. Be well, and we'll see you next time on "Breaking Analysis". (upbeat music)
Jonathan Seckler, Dell & Cal Al-Dhubaib, Pandata | VMware Explore 2022
(gentle music) >> Welcome back to theCUBE's virtual program covering VMware Explore 2022, the first time since 2019 that the VMware ecosystem has gathered in person. But in the post-isolation economy, hybrid is the new format, Cube plus digital, we call it. And so we're really happy to welcome Cal Al-Dhubaib, who's the founder, CEO and AI strategist of Pandata, and Jonathan Seckler, back in theCUBE, the senior director of product marketing at Dell Technologies. Guys, great to see you, thanks for coming on. >> Yeah, thanks a lot for having us. >> Yeah, thank you. >> Cal, Pandata, cool name, what's it all about? >> Thanks for asking. Really excited to share our story. I'm a data scientist by training and I'm based here in Cleveland, Ohio. And Pandata is a company that helps organizations design and develop machine learning and AI technology. And when I started this here in Cleveland six years ago, I had people react to me with, "What?" So we help demystify AI and make it practical. And we specifically focus on trustworthy AI. So we work a lot in regulated industries like healthcare, and we help organizations navigate the complexities of building machine learning and AI technology when data's hard to work with, when there's risk in the potential outcomes, or high cost in the consequences. And that's what we do every day. >> Yeah, yeah, timing is great given all the focus on privacy and what you're seeing with big tech and public policy, so we're going to get into that. Jonathan, I understand you guys got some hard news. What's your story around AI and AutoML? Share that with us. >> Yeah, thanks. So having the opportunity to speak with Cal today is really important, because one of the hardest things that we find that our customers have is making that transition from experimenting with AI to making it really useful in real life. >> What is the tech underneath that? Are we talking VxRail here? Are you talking servers? What do you got? >> Yeah, absolutely. So the Dell Validated Design for AI is a reference framework that is based on an optimized set of hardware for a given outcome. That includes, it could be, VxRail, VMware vSphere, and Nvidia GPUs and Nvidia software to make all of that happen. And for today, what we're working with is H2O.ai's solution to deliver automated machine learning. So take just that one more step to make it easier for customers to bring AI into production. >> Cool. >> So it's a full stack of software that includes automated machine learning, it includes Nvidia's AI Enterprise for deployment and development, and it's all built on an engineering-validated set of hardware, including servers and storage and whatever else you need. >> AI out of the box; I don't have to worry about cobbling it all together. >> Exactly. >> Cal, I want to come back to this trusted AI notion. A lot of people don't trust AI just by the very nature of it. I think about, okay, well, how does it know it's a cat? And then you can never explain it; it's a black box. And so I'm like, what are they doing with my data? And you mentioned healthcare, financial services, the government; they know everything about me. I just had to get a Real ID in Massachusetts; I had to give all my data away. I don't trust it. So what is trusted AI? >> Well, so let me take a step back and talk about sobering statistics. There's a lot of different sources that report on this, but anywhere you look, you'll hear that somewhere between 80 to 90% of AI projects fail to yield a return. That's pretty scary. That's a disappointing industry.
And why is that? AI is hard. Versus traditional software, where you're programming rules hard and fast (if I click this button, I expect A, B, C to happen), here we're talking about recognizing and reacting to patterns. It's not, "Will it be wrong?" It's, "When it's wrong, how wrong will it be, and what are the costs I'll accept related to that?" So zooming back in on this lens of trustworthy AI: much of the last 10 years, the development in AI has looked like this. Let's get the data, let's race to build the warehouses. Okay, we did that, no problem. Next was the race to build the algorithms. Can we build more sophisticated models? Can we work with things like documents and images? And it used to be the exclusive domain of deep tech companies. You'd have to have teams of teams building the software, building the infrastructure, working on very specific components in this pipeline. And now we have this explosion of technologies, very much like what Jonathan was talking about with validated designs. So it removes the complexities of the infrastructure, it removes the complexities of being able to access the right data, and we have a ton of modeling capabilities and tools out there, so we can build a lot of things. Now, this is when we start to encounter risk in machine learning and AI. If you think about the models that are being used to replicate or learn from language, like GPT-3, to create new content, the training data set is everything that's on the internet. And if you haven't been on the internet recently, it's not all good. So how do you go about building technology that recognizes specific patterns, picks up patterns that are desirable, and avoids unintended consequences? And no one's immune to this. So the discipline of trustworthy AI is building models that are easier to interrogate, that are useful for humans, and that minimize the risk of unintended consequences. >> I would add, too, one of the good things about the Pandata solution is how it tries to enforce fairness and transparency in the models. We've done some studies recently with IDC, where we've tried to compare leaders in AI technology versus those who are just getting started. And I have to say, one of the biggest differences between a leader in AI and the rest of us is often that the leaders have a policy in place to deal with the risks and the ethics of using data through some kind of machine-oriented model. And it's a really important part of making AI usable for the masses. >> You certainly hear a lot about... AI, ultimately, there are algorithms which are built by humans. Although, of course, there are algorithms to build algorithms, we know that today. >> Right, exactly. >> But humans are biased, there's inherent bias, and so this is a big problem. Obviously, Dell, you have a giant observation space in terms of customers. But I wonder, Cal, if you can share with us how you're working with your customers at Pandata. What kind of customers are you working with? What are they asking? What problems are they asking you to solve? And how does it manifest itself? >> So when I like to talk about AI and where it's useful, it usually has to do with taking a repetitive task that humans are tasked with, but where they're starting to act more like machines than humans. There's not much creativity in the process, it's handling something that's fairly routine, and it ends up being a bottleneck to scaling. And just a year ago, even, we'd have to approach our clients with conversations around trustworthy AI, and now they're starting to approach us.
A real example (this actually just happened earlier today): we're partnering with one of our clients that basically scans medical claims from insurance providers. And what they're trying to do is identify members that qualify for certain government subsidies. And this isn't as straightforward as it seems, because there's a lot of complexity in how the rules are implemented, how judges look at these cases. Long story short, we helped them build machine learning to identify these patients that qualify. And a question that comes up, and that we're starting to hear from the insurance companies they serve, is how do you go about making sure that your decisions are fair and you're not selecting certain groups of individuals over others to get this assistance? And so clients are starting to wise up to that and ask questions. Other things that we've done include identifying potential private health information that's contained in medical images, so that you can create curated research data sets. We've helped organizations identify anomalies in cybersecurity logs and go from an exploration space of billions of eventual events to, what are the top 100 that I should look at today? And so it's all about, how do you find these routine processes that humans are bottlenecked by, where we're starting to act more like machines, and insert a little bit of pattern recognition intelligence to get them to spend more time on the creative side. >> Can you talk a little bit more about how? A lot of people talk about augmented AI. AI is amazing. My daughter the other day was... I'm sure, as an AI expert, you've seen it, where the machine actually creates standup comedy, which is so hilarious, because it is and it isn't. Some of the jokes are actually really funny. Some of them are so funny 'cause they're not funny, and they're weird. So it really underscored the gap. And so how do you do it? Is it augmented? Is it that you're focusing on the mundane things that you want to take humans out of the loop on? Explain how. >> So there's this great Wall Street Journal article by Jennifer Strong that she published, I think, four years ago now. And she says, "For AI to become more useful, it needs to become more boring." And I really truly believe in that. So you hear about these cutting edge use cases, and there's certainly some room for these generative AI applications inspiring new designs, inspiring new approaches. But the reality is, most successful use cases that we encounter in our business have to do with augmenting human decisions. How do you make arriving at a decision easier? How do you prioritize from millions of options, hundreds of thousands of options, down to three or four that a human can then take the last stretch and really consider or think about? So a really cool story: I've been playing around with DALL-E 2. And for those of you who haven't heard, it's this algorithm that can create images from prompts. And there's just this painting I really wish I had bought when I was in Paris a few years ago. And I gave it a description: skyline of the Sacre-Coeur Church in Montmartre with pink and white hues. And it came up with a handful of examples that I can now go take to an artist and say, "Paint me this." So at the end of the day... automation, yes, there are certain applications where you really are truly getting to that automated AI in action. But in my experience, most of the use cases have to do with using AI to make humans more effective, more creative, more valuable.
>> I'd also add, I think, Cal, that the opportunity to make AI real here is to automate these things and simplify the language, so that we can get what we call citizen data scientists out there. Ordinary employees, or people who are at the front line of making these decisions, working with the data directly. We've done this with customers on farms, where the growers are able to use AI to monitor and to manage the yield of crops. I think some of the other examples that you had mentioned just recently, Cal, are great. The other example is where you can make this technology available to anyone. And maybe that's part of the message of making it boring: it's making it so simple that any of us can use it. >> I love that. John Furrier likes to say that traditionally in IT, we solve complexity with more complexity. So anything that simplifies things is goodness. So how do you use automated machine learning at Pandata? Where does that fit in here? >> So I'm really excited about the connection here through H2O that Jonathan mentioned earlier. So H2O.ai is one of the leading AutoML platforms. And what's really cool is, if you think about the traditional way you would approach machine learning, you need to have data scientists. These patterns might exist in documents or images or boring old spreadsheets. And the way you'd approach this is, okay, get these expensive data scientists, and 80% of what they do is clean up the data. And I've yet to encounter a situation where there isn't cleaning of the data. Now, once you get through the cleaning-up-the-data step, you actually have to consider, all right, am I working with language? Am I working with financial forecasts? What are the statistical modeling approaches I want to use? And there's a lot of creativity involved in that. And you have to set up a whole experiment, and that takes a lot of time and effort. And then you might test one, two or three models, because you know to use those, or those are the go-tos for this type of problem. And you see which one performs best and you iterate from there. The AutoML framework basically allows you to cut through all of that. It can reduce the amount of time you're spending on those steps to a tenth of the time. You're able to very quickly profile data, understand anomalies, understand what data you want to work with and what data you don't want to work with. And then, when it comes to the modeling steps, instead of iterating through three or four, AutoML is throwing the whole kitchen sink at it. Anything that's appropriate to the task... maybe you're trying to predict a category or label something, maybe you're trying to predict a value like a financial forecast, or even generate text. It tests all of the models that it has at its disposal that are appropriate to the task and says, "Here are the top 10." You can use features like, let me make this more explainable, or let me make the model more accurate. I don't necessarily care about interrogating the results, because the risk here is low; I want a model that predicts things with a higher accuracy. So you can use these dials instead of having to approach it from a development perspective. You can approach it from more of an experimental mindset. So you still need that expertise, you still need to understand what you're looking at, but it makes it really quick. And so you're not spending all that expensive data science time cleaning up data. >> Makes sense.
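To give a feel for the experimental workflow Cal describes, here is a minimal sketch using H2O's open source Python AutoML API. The dataset path, column names and parameter choices are illustrative assumptions, not anything from Pandata's or Dell's actual deployments.

```python
# Minimal H2O AutoML sketch; dataset path and column names are hypothetical.
import h2o
from h2o.automl import H2OAutoML

h2o.init()  # starts (or connects to) a local H2O cluster

# Load a hypothetical claims table; 'qualifies' is the label we want to predict.
claims = h2o.import_file("claims_sample.csv")
claims["qualifies"] = claims["qualifies"].asfactor()  # classification target

train, test = claims.split_frame(ratios=[0.8], seed=42)

# "Throw the whole kitchen sink at it": AutoML trains and cross-validates
# many model families (GBMs, GLMs, deep learning, stacked ensembles, ...).
aml = H2OAutoML(
    max_models=20,          # cap the experiment size
    max_runtime_secs=600,   # or cap total training time
    seed=42,
)
aml.train(y="qualifies", training_frame=train)

# Ranked leaderboard of everything it tried; the analyst picks from the top.
print(aml.leaderboard.head(rows=10))
print(aml.leader.model_performance(test).auc())
```

The leaderboard step is exactly the "here are the top 10" Cal mentions: the practitioner turns dials like `max_models` or runtime budget rather than hand-building each candidate model.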
Last question. So Cal, obviously you guys go deep into AI, and Jonathan, Dell works with every customer on the planet, all sizes, all industries. So what are you hearing and doing with customers that are best practices that you can share for people that want to get into it, that are concerned about AI, that want to simplify it? What would you tell them? Go ahead, Cal. >> Okay, you go first, Cal. >> And Jonathan, you're going to bring us home. >> Sure. >> This sounds good. So as far as where people get scared, I see two sides of it. One, "Our data's not clean enough, there's not enough quality, I'm going to stay away from this." I combat that with: you've got to experiment, you've got to iterate, and that's the only way your data's going to improve. Two, there are organizations that worry too much about managing the risk: "We don't have the data science expertise that can help us uncover potential biases we have." We are now entering a new stage of AI development and machine learning development. (And I use those terms interchangeably anymore. I know some folks will differentiate between them, but machine learning is the discipline driving most of the advances.) The toolkits that we have at our disposal to quickly profile, manage, and mitigate against the risk that data can bring to the table are really giving organizations more comfort, and should give organizations more comfort, to start to build mission-critical applications. The thing that I would encourage organizations to look for is organizations that put trustworthy AI, ethical AI, first as a consideration, not as an afterthought and not as a "we're going to sweep this under the carpet." When you're intentional with that, when you bring that up front and you make it a part of your design, it sets you up for success. And we saw this when GDPR changed the IT world a few years ago. For organizations that built for privacy first to begin with, adapting to GDPR was relatively straightforward. For organizations that made that an afterthought, it was a huge lift, a huge cost to adapt and adjust to those changes. >> Great example. All right, John, I said bring us home. Put a bow on this. >> Last bit. So I think, beyond the mechanics of how to make AI better and more workable, one of the big challenges with AI is this concern that you're going to isolate and spend too much effort and dollars on the infrastructure itself. And that's one of the benefits that Dell brings to the table here with validated designs: our AI validated design is built on a VMware vSphere architecture. So your backup, your migration, all of the management and the operational tools that IT is most comfortable with can be used to maintain and develop and deploy artificial intelligence projects without having to create unique infrastructure, unique stacks of hardware, which potentially isolates the data and potentially makes things unavailable to the rest of the organization. So when you run it all in a VMware environment, that means you can put it in the cloud, you can put it in your data center. It just really makes it easier for IT to build AI into their everyday process. >> Silo busting. All right, guys, thanks Cal, John. I really appreciate you guys coming on theCUBE. >> Yeah, it's been a great time, thanks. >> All right. And thank you for watching theCUBE's coverage of VMware Explore 2022. Keep it right there for more action from the show floor with myself, Dave Vellante, John Furrier, Lisa Martin and David Nicholson. Keep it right there. (gentle music)
Robert Picciano & Shay Sabhikhi | CUBE Conversation, October 2021
>> Machine intelligence is everywhere. AI is being embedded into our everyday lives through applications, process automation, social media, ad tech, and it's permeating virtually every industry and touching everyone. Now, a major issue with machine learning and deep learning is trust in the outcome. That is the black box problem. What is that? Well, the black box issue arises when we can see the input and the output of the data, but we don't know what happens in the middle. Take a simple example of a picture of a cat, or a hotdog for you "Silicon Valley" fans. The machine analyzes the picture and determines it's a cat, but we really don't know exactly how the machine determined that. Why is it a problem? Well, if it's a cat on social media, maybe it isn't so onerous. But what if it's a medical diagnosis facilitated by a machine? And what if that diagnosis is wrong? Or what if the machine is using deep learning to qualify an individual for a home loan, and the person applying for the loan gets rejected? Was that decision based on bias? If the technology used to produce that result is opaque, well, you get the point. There are serious implications to not understanding how decisions are made with AI. So we're going to dig into the issue and the topic of how to make AI explainable and how to operationalize AI. And with me are two guests today: Shay Sabhikhi, who's the co-founder and COO of CognitiveScale and a longtime friend of theCube, and the newly minted CEO of CognitiveScale, Bob Picciano. Gents, welcome to theCube. Bob, good to see you again. Welcome back on. >> Thanks for having us. >> Shay, let me start with you. Why did you start the company? I think you started the company in 2013. Give us a little history and the why behind CognitiveScale. >> Sure, David. So look, I spent some time through multiple startups, but I ended up at IBM, which is where I met Bob. And one of the things that we did was the commercialization of IBM Watson initially. And that led to thinking about how do you operationalize this, because a lot of people were thinking about data science and machine learning in isolation, building models, trying to come up with better ways to deliver some kind of a prediction. But if you truly want to operationalize it, you need to think about the scale that enterprises need. So, you know, we were in the early days enamored by Waze, I'm still enamored by Waze, the application that takes me from point A to point B. And our view is, look, as you go from point A to point B, if you happen to be, let's say, a patient or a financial services customer, imagine if you could have a Waze-like application giving you all the insights that you needed, telling you at the right moment what was needed, with the right explanation, so that it could guide you through the journey. So that was really the thesis behind CognitiveScale: how do you apply AI to solve problems like that in regulated industries like healthcare and financial services, but do it in a way that it's done at scale, where you can bring the output of the data scientists and application developers, and those insights can then be powered into end applications, like CRM systems, mobile applications, web applications, so that consumers like us, whether it be in a healthcare setting or a financial services setting, can get the benefit of those insights, but with the appropriate sort of evidence and transparency behind it. So that was the thesis for the company.
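Dave's cat-or-hotdog framing of the black box has a standard, model-agnostic probe worth sketching here: permutation importance. This is a generic illustration of the idea, not CognitiveScale's method; shuffle one feature at a time, and the accuracy drop estimates how much the model leans on that feature.

```python
# A generic black-box probe: permutation importance. Shuffling a feature
# breaks its relationship to the labels; the resulting accuracy drop is a
# rough measure of how much the model depends on that feature.
import numpy as np

def permutation_importance(predict, X, y, rng=None, n_repeats=5):
    rng = rng or np.random.default_rng(0)
    baseline = np.mean(predict(X) == y)           # accuracy on intact data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j's signal
            drops.append(baseline - np.mean(predict(Xp) == y))
        importances[j] = np.mean(drops)           # mean accuracy drop
    return importances

if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(200, 3))
    y = (X[:, 0] > 0).astype(int)                    # only feature 0 matters
    black_box = lambda A: (A[:, 0] > 0).astype(int)  # stand-in for any model
    print(permutation_importance(black_box, X, y))   # feature 0 dominates
```

It doesn't open the box, but it tells you which inputs the opaque middle is actually using, which is the first question a loan applicant or a regulator would ask.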
Got it. Thank you for that. Now, Bob, I've got to ask you: I knew you couldn't stay on the sidelines, my friend. So what was it that you saw in the marketplace that lured you back in, to take on the CEO role? >> Yeah, so, David, it's an exciting space, and you're right, I couldn't stay on the sidelines. So look, I always felt that enterprise AI had a promise to keep. And I don't think that many enterprises would say, based on their experience, that "yeah, we're getting the value that we wanted out of it, we're getting the scale that we wanted out of it, and we're really satisfied with what it's delivered to us so far." So I felt there was a gap in keeping that promise, and I saw CognitiveScale as an important company for being able to fill that gap. And the reason that gap exists is that enterprise AI, unlike AI that relates to one particular conversational service or one narrow domain application, is really a team sport. It involves all sorts of roles and all sorts of aspects of a working enterprise that's already scaled, with systems of engagement and systems of record. And we show up with the ability to actually help put all of that together. It's a brownfield, so to speak, not a greenfield. And where Shay and Matt and Manoj and the team really focused was on the important last-mile problems that an enterprise needs to address, that aren't necessarily addressed with any one tool that might serve some members of that team. Because there are a lot of great tools out there in the space of AI or machine learning or deep learning, but they don't necessarily come together to deliver the outcomes that an enterprise wants. So what are those important aspects? And then also, where do we apply AI inside of our platform and our capabilities to take that operationalization to the next level, with very specific insights, and to take that journey and make it highly personalized while also making it more transparent and explainable? >> So what's the ICP, the ideal customer profile? Is it highly regulated industries? Is it developers? Maybe you could parse that a little bit. >> Yeah. So we do focus in healthcare and in financial services, and part of the reason for that is the problem is very difficult for them. You're working in a space where you have rules and regulations about when and how you need to engage with that client. So the bar for trust is very, very high, and everything that we do is around trusted AI. That means thinking about using the data platforms and the model platforms in a way that creates marketplaces, where being able to utilize that data is something that's provisioned and permissioned before we go out and do that assembly. So the target customer really is somebody who's driving digital transformation in those regulated industries. It might be a chief digital officer, it might be a chief client officer or chief customer officer, somebody who's really trying to understand: I have a very fragmented view of my member, or my patient, or my client, and I want to be able to utilize AI to help that client get better outcomes, or to make sure they're not lost in the system, by understanding them more holistically, in a more personalized way, but while always maintaining that chain of trust. >> Got it.
So can we get into the product a little bit more, what the product is? And maybe, Shay, you can give us a sense of where you started and the evolution of the portfolio. >> Look, where we started is with the application of AI, right? So the product and the platform were all being developed, but our biggest view from the start had been: how do you get into the trenches and apply this to solve problems? And as Bob pointed out, one of the areas we picked was healthcare, because it is a tough industry. There's a lot of data, but there's a lot of regulation, and it's truly where you need the notion of being able to explain your decision at a really granular level, because those decisions have some serious consequences. So we started building the platform out, and our core product is called Cortex. It's a software platform on top of which these applications are built. Through our engagements over the last six, seven years, working with customers in healthcare and in financial services, some of the largest banks, the largest healthcare organizations, we have developed a software product to essentially help you scale enterprise AI. But it starts with: how do you build these systems? Building these systems requires us to provide tooling that can help developers take models and data that exist within the enterprise, bring them together, rapidly assemble and orchestrate these different components, stand up these systems, and deploy these systems, again, in a very complex environment that includes on-prem systems as well as the cloud, and then be able to turn on APIs that can plug into an application. So we had to essentially think of this entire problem end to end, and that's what Cortex does. An extremely important part of Cortex didn't start off there initially: we certainly had the makings of trusted AI when we founded the company, but the industry wasn't quite ready. Over time we've developed capabilities around explainability and being able to detect bias. So not only are you building these end-to-end systems, assembling and deploying them, you have, as a first-class citizen built into the product, the notion of being able to understand bias, and being able to detect whether there's the appropriate level of explainability to make a decision. All of that's embedded within the Cortex platform. So that's what the platform does, and it's now in its sixth generation, as we speak. >> Yeah. So, Dave, if you think about the platform, it really has three primary components. One is this application development, or assembly, platform that sits between existing AI tools, models, and data, and the systems of engagement, and that allows those AI developers to rapidly visualize and orchestrate those aspects. And in that regard we're tremendous partners with people like IBM, Microsoft, H2O, people that provide aspects that help develop the data platform, the data fabric, things like the data science tools, to then feed this platform. And then on the front end, really helping transform those systems of engagement into things that are more personalized, with better recommendations, in a more targeted space, with explainable decisions. So that's one element, and that's called Cortex Fabric. There's another component called Cortex Certify, and that capability is largely around model intelligence and model introspection.
It works across things that are of course model-driven, but also things that are based on deterministic algorithms, as well as rule-based algorithms, to provide that explainability of decisions that are made upstream, before they get to the black box model. Because organizations are discovering that many times the data has aspects and dimensions to it, and biases to it, before it ever gets to the model. So they want to understand that entire chain of decisioning before it gets there. And then there's the notion of a few pre-curated applications and blueprints to rapidly deliver outcomes in some key repeating areas, like customer experience or lead generation. Those are the elements where almost every customer we engage with who is thinking about digital transformation wants to start: by providing a better client experience. They want to reduce costs, they want to have operational savings, while driving up things like NPS and improving the outcomes for the people they're serving. So we have those sets of applications that we've built over time. Imagine that being the first-use application, that starter set, that also trains the customer on how to utilize this operational platform, and then they're off to the races building out those next use cases. So what we see is one typical insertion play that returns value, and then they're scaling rapidly. Now I want to cover some secret sauce inside of the platform. >> Yeah. So before you do, I just want to clarify, the Cortex Fabric, 'cause that's really where I wanted to go next. The Cortex Fabric, it seems like that's the way in which you're helping people operationalize, using familiar tooling, it sounds like. Am I correct? And the Cortex Certify is where you're kind of peeling the onion of that complicated piece, whether it's deep learning or neural networks, which is where that black box exists. Maybe you could tell us, is that where the secret sauce lives? If not, where is it? >> It actually is in all places, right? So there are some really important introductions of capabilities here, because, like I mentioned, many times these regulated industries have been developed in highly fragmented pillars. Just think about the insurance companies, between property casualty and personal lines. Many times they have grown through acquisition, so they have these systems of record that are really delivering the operational aspects of the company's products, but the customers are sometimes lost in the seams. And so they've built master data management capabilities and data warehouse capabilities to try to serve that. But they find that when they then go to apply AI across some of those curated data environments, it's still not sufficient. So we developed an element of being able to rapidly assemble what we call a profile of one. It's a very, very intimate profile around declared data sources that relate to a key business entity.
And one of the places where we applied our AI knowledge is by being able to extract key information out of these declared systems and then start to make longitudinal observations about those systems and to learn about them. And then line those up with prediction engines that both we supply as well as third parties and the customers themselves supply them. So in this theme of operationalization, they're constantly coming up with new innovations or a new model that they might want to interject into that engagement application. Our platform with this profile of one allows them to align that model directly into that profile, get the benefits of what we've already done, but then also continue to enhance, differentiate and provide even greater, uh, greater value to that client. IBM is providing aspects of those models that we can plug in. And many of our clients are that's really >>Well. That's interesting. So that profile of one is kind of the instantiation of that secret sauce, but you mentioned like master data management data warehouse, and, you know, as well as I do Bob we've we've we've decades of failures trying to get a 360 degree view for example of the customer. Uh, it's just, just not real time. It's not as current as we would want it to be. The quality is not necessarily there. It's a very asynchronous process. Things have changed the processing power. You and I have talked about this a lot. We have much more data now. So it's that, that, that profile one. So, but also you mentioned curated apps, customer experience, and lead gen. You mentioned those two, uh, and you've also talked about digital transformation. So it sounds like you're supporting, and maybe this is not necessarily the case, but I'm curious as to what's going on here, maybe supporting more revenue generation in the early phases than say privacy or compliance, or is it actually, do you have use cases for both? >>It's all, it's all of it. Um, and, and shake and, you know, really talk passionately about some of the things we've helped clients do, like for instance, uh, J money. Why don't you talk about the, the hospital, um, uh, uh, you know, discharge processes. >>Absolutely. So, so, you know, just to make this a bit more real, they, you know, when you talk about a profile on one, it's about understanding of patient, as I said earlier, but it's trying to bring this notion of not just the things that you know about the patient you call that declared information. You can find the system in, you can find this information in traditional EMR systems, right? But imagine bringing in, uh, observed information, things that you observed an interaction with the patient, uh, and then bring in inferences that you can then start drawing on top of that. So to bring this to a live example, imagine at the point of care, knowing when all the conditions are right for the patient to be discharged after surgery. And oftentimes as you know, those, if all the different evidence of the different elements that don't come together, you can make some really serious mistakes in terms of patient discharge, bad things can happen. >>Patient could be readmitted or even worse. That could be a serious outcome. Now, how do you bring that information at the point of care for the person making a decision, but not just looking at the information, you know, but also understanding not just the clinical information, but the social, the socioeconomic information, and then making sure that that decision has the appropriate evidence behind it. 
So then, when you do make that decision, you have the appropriate guidance behind it, for audit reasons, but also for ensuring that you don't have a bad outcome. That's the example Bob's talking about, where we have applied this in real settings, in healthcare, but also in financial services and other industries, where you can make these decisions based on the machine telling you, with a lot of detail behind it, whether this is the right decision to be made. We call this explainability and the evidence that's needed. >> You know, that's interesting. I'm imagining a use case in my mind where, after a patient leaves, so often there's just a complete disconnect with the patient, unless that patient has problems and goes back. But that patient might have some problems and forget them, because it's too much of a pain in the neck to go back. But the system can now track this, and we could get much more accurate information, and that could help in future diagnoses and decision-making for a patient, in terms of outcomes and probability of success. Question: what do you actually sell? Is it a middleware product? How do I license it? >> It's a software platform. So we sell software, and it is deployed in the customer's cloud environment of choice. Of course, we support complete hybrid cloud capabilities. We support native cloud deployments on top of Microsoft and Amazon and Google, and we support IBM's hybrid cloud initiative with Red Hat OpenShift as well, which also puts us in a position to support those public cloud environments as well as the customer's private cloud environments. It's constructed with Kubernetes in that environment, which helps the customer also realize the value of that operationalization, because they can modify those applications, redeploy them directly into their cloud environment, and start to see the results in those spaces. Now, I want to cover a couple of the other components of the secret sauce, if I could, Dave, to make sure that you've got a couple of other elements where some real breakthroughs are occurring in these spaces. So, Dave, you and I are passionate about the semiconductor industry, and we know what's happening with regard to innovation, and the broadening of the set of people who are now siliconizing their intellectual property. A lot of that's happening because the companies who have figured out how to manufacture or design those semiconductors are operationalizing those platforms for their customers. So you have people like Apple who are able to really break out of the scene and do things by utilizing utilities and macros and their own knowledge about how things need to work. And it's very similar to what we're talking about doing here for enterprise AI: they're operationalizing that construction. But none of those companies would actually start creating the actual devices until they go through simulation and design, correct? Well, when you think about most enterprises and how they develop software, they just immediately start to develop the code, and they're going through A/B testing, but they're all writing code. >> They're developing those assets, they're creating many, many models. You know, some organizations say 90% of the models they create they never use. Some say 50%, and they think that's good.
But when you think about that in terms of the capital that's being deployed, both on the resources as well as the infrastructure, that's potentially a lot of waste. So one of the breakthroughs is the creation of what we call synthetic data and simulations inside of our operational platform. Cortex Fabric allows someone to actually say: look, this is my data pattern, and because it's sensitive data, it might be PII. We can help them by saying, okay, what is the pattern of that data? And then we can create synthetic data off of that pattern, for someone to experiment with how a model might function, or how it might work in the application context, and then run that through a set of simulations. If they want to bring a new model into an application and ask what the outcomes of this model will be before they deploy it into production, we allow them to drive simulations across millions or billions of interactions, to understand how effective that model is going to be. Is it going to make a difference for that individual, or for this application, or for the cost savings goal and outcomes that I'm trying to drive? So just think about what that means: the digital transformation officer having a great idea, being in the C-suite and saying, "I want to do this with my business." Oftentimes they have to turn around to the CIO or the chief data officer and say, "when can you get me that data?" And we all know the answer to that question; they go, "yeah, I've got a couple of other things on the plate, and I'll get to that as soon as I can." Now we're able to liberate that. Now we're able to say: look, what's the concept that you're trying to develop? Let's create the synthetic data off of that environment. We have a corpus of data that we've collected through various client interactions that many times gets that bootstrapped, and then we drive that through simulation. So we're able to go from imagining what the outcome could be to really getting high confidence that this initiative is going to have meaningful value for the enterprise. And then that stimulates the right kind of following and the right kind of endorsement, really driving that change through the enterprise. And that aspect of the simulations, the ability to plan out what that looks like and develop those synthetic aspects, is another important element of the secret sauce inside of Cortex Fabric. >> Back to the semiconductor innovation: I can do that very cheaply. I'm thinking AWS cloud: I could experiment using Graviton, or maybe do a little bit of training with some new processors, and then containerize it, bring it back to my on-premise estate, and apply it. And so, as you say, a much more agile environment. >> Speed, efficiency, and the ability to validate the hypothesis that started the process. >> Guys, think about the TAM, the total available market. Can we have that discussion? How big is that? >> I mean, if you think about the spend across the healthcare space and financial services, we're talking about hundreds of billions, in terms of what the enterprise AI opportunity is in just those spaces. And remember, financial services is a broad spectrum. So one of the things that we're actually starting to roll out today, in fact, is a SaaS service that we developed.
It's based on top of our offerings and it's called TrustStar, at truststar.ai. TrustStar is a set of personalized insights that get delivered directly to the loan officer inside of an institution who's trying to match lending to someone who wants to buy a property. And when you think about many of those organizations: they have very, very high demand, they've got a lot of information, and they've got a lot of regulation they need to adhere to, but many times they're very analytically challenged in terms of the tools they have to serve those needs. What's happening with new listings, what's happening with my competitors, what's happening as people move from high-tax states, which they want to potentially leave, into new, more attractive tax- and opportunity-based environments, where they're not known to the lending institutions they're trying to be married up with? So we've developed a set of insights, as a subscription service, truststar.ai, which goes directly to the loan officer. And then we use our platform behind the scenes, using things like Home Mortgage Disclosure Act data, MLS data, and other data that is typically exogenous to those sources, to provide very customized insights to help that buyer journey. And of course, along the way, we can identify things like: are some of the decisions more difficult to explain? Are there potential biases that might be involved in that environment as people are applying for mortgages? And we can really drive growth through inclusion for those lending institutions, because they might just not understand that potential client well enough, and we can identify the kinds of things they can do to know them better.
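Bob's point about surfacing potential bias in mortgage decisions has a well-known screening metric behind it: the adverse-impact ratio, often applied as the "four-fifths rule." The sketch below is illustrative only; TrustStar's actual analytics are not public, and the fields and threshold here are assumptions.

```python
# A common fair-lending screen of the kind Bob alludes to: the adverse-impact
# ratio (the "four-fifths rule"). Groups whose approval rate falls below 80%
# of the best-performing group's rate get flagged for review. Illustrative
# only; not TrustStar's actual implementation.
from collections import defaultdict

def adverse_impact_ratio(decisions):
    """decisions: iterable of (group, approved) pairs; approved is True/False."""
    totals, approvals = defaultdict(int), defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        approvals[group] += bool(approved)
    rates = {g: approvals[g] / totals[g] for g in totals}
    best = max(rates.values())
    # (ratio to best group, flagged?) per group
    return {g: (r / best, r / best < 0.8) for g, r in rates.items()}

flags = adverse_impact_ratio(
    [("x", True), ("x", True), ("x", True), ("y", True), ("y", False), ("y", False)]
)
print(flags)  # group "y" approves at 1/3 vs 3/3, ratio 0.33 -> flagged
```

A screen like this doesn't prove bias, but it tells an institution where an explanation is owed, which is exactly the "drive growth through inclusion" angle described above.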
They can, if they, if they're smart, they can infuse AI and then extract value out of that for their customers. >>And that's why, you know, companies like, uh, like IBM are an investor in a great partner in this space. Anthem is an investor, uh, you know, of the company, but also, you know, someone who can utilize the capabilities, Microsoft, uh, Intel, um, you know, we've been, we've been, uh, you know, really blessed with a great backing Norwest venture partners, um, obviously is, uh, an investor in us as well. So, you know, we've seen the ability to really help those organizations think about, um, you know, where that future lies. But one of the things that is also, you know, one of the gaps in the promises when a C-suite executive like a digital transformation officer, chief digital chief customer officer, they're having their idea, they want to be accountable to that idea. They're having that idea in the boardroom. And they're saying, look, I think I can improve my customer satisfaction and, uh, by 20 points and decrease the cost of my call center by 20 or 30 or 50 points. >>Um, but they need to be able to measure that. So one of the other things that, uh, we've done a cognitive scale is help them understand the progress that they're making across those business goals. Um, now when you think about this people like Andrew Nang, or just really talking about this aspect of goal oriented AI, don't start with the problem, start with what your business goal is, start with, what outcome you're trying to drive, and then think about how AI helps you along that goal. We're delivering this now in our product, our version six product. So while some people are saying, yeah, this is really the right way to potentially do it. We have those capabilities in the product. And what we do is we identify this notion of the campaign, an AI campaign. So when the case that I just gave you where the chief digital officer is saying, I want to drive customer satisfaction up. >>I want to have more explainable decisions, and I want to drive cost down. Maybe I want to drive, call avoidance. Um, you know, and I want to be able to reduce a handling time, um, to drive those costs down, that is a campaign. And then underneath that campaign, there's all sorts of missions that support that campaign. Some of them are very long running. Some of them are very ephemeral. Some of them are cyclical, and we have this notion of the campaign and then admission planner that supports the goals of that campaign, showing that a leader, how they're doing against that goal by measuring the outcomes of every interaction against that mission and all the missions against the campaign. So, you know, we think accountability is an important part of that process as well. And we've never engaged an executive that says, I want to do this, but I don't want to be accountable to the result, but they're having a hard time identifying I'm spending this money. >>How do I ensure that I'm getting the return? And so we've put our, you know, our secret sauce into that space as well. And that includes, you know, the information around the trustworthiness of those, uh, capabilities. Um, and I should mention as well, you know, when we think about that aspect of the responsible AI capabilities, it's really important. The partnerships that we're driving across that space, no one company is going to have the perfect model intelligence tool to be able to address an enterprise's needs. It's much like cybersecurity, right? 
People thought initially, "well, I'll do it myself. I'll just turn up my firewall, I'll make access to my applications much more granular, I'll turn down the permissions on the database, and I'll be safe from cybersecurity threats." And then they realized, no, that's not how it was going to work. And by the way, the threat's already inside, and there's long-term persistent code running, and you have to be able to scan it and have intelligence around it. And there are different capabilities that are specialized for different components of that problem. The same is going to be true around responsible and trustworthy AI. So we're partnered with people like IBM, people like Microsoft, and others, to really understand how we take the best of what they're doing, partner with the best, and make those outcomes better for clients. And then there are also leaders like the Responsible AI Institute, a non-profit, independent organization, who are thinking about new rating systems for the space of responsible and trusted AI, and about things like certifications for professionals that really drive that notion of education, which is an important component of addressing the problem. And we're providing the integration of our tools directly with those assessments and those certifications. So if someone gets started with our platform, they're already using an ecosystem that includes independent thinkers from across the entire industry, the public sector as well as the private sector, to be on the cutting edge of what it's going to take to really step up to the challenge in that space. >> Yeah, you guys have got a lot going on. I mean, you're eight years in now, and you've got an executive to really drive the next scale. You mentioned, Bob, some of your investors: Anthem, IBM, Norwest. It's Crunchbase, right, that says you've raised 40 million? Is that the right number? Where are you in fundraising? What can you tell us? >> They're a little behind where we are, but, you know, we're stage B, and we're looking forward to now really driving that growth. We're past that startup phase, and now we're into the growth phase. And we're seeing the focus that we've applied in the industries really starting to pay off. Initially it would take a couple of months for a customer to start to understand what they could do with our capabilities to address their challenges; now we're seeing that happen in weeks. So now is the right time to drive that scalability. So we'll be looking in the market at how we assemble the necessary capability to grow. Shay and I have worked over the past year, with the board's support, on building out our go-to-market around that space. And in the first hundred days, it's all about alignment, because when you're going to go through that growth phase, you really have to make sure that things are pointed in the right direction, and pointed together in the right direction, simplifying what it is that we're doing for the market, so people can really understand how unique we are in this space, and what they can expect out of an engagement with us. And then really driving that aspect of designing the go-to-market, and then scaling that.
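Bob's campaign-and-missions structure from a moment ago is essentially a KPI rollup, and it is worth sketching as a data structure. All the names and the scoring scheme below are hypothetical; this is an illustration of the idea, not Cortex's actual API.

```python
# A minimal sketch of the campaign-and-missions idea Bob describes: a business
# goal (the campaign) measured as a rollup of the missions that serve it.
# Names and scoring are hypothetical, not CognitiveScale's actual product.
from dataclasses import dataclass, field

@dataclass
class Mission:
    name: str
    target: float    # goal for the KPI, e.g. reduce handling time by 30s
    observed: float  # measured outcome so far

    def attainment(self) -> float:
        # fraction of the goal achieved, capped at 100%
        return min(self.observed / self.target, 1.0) if self.target else 0.0

@dataclass
class Campaign:
    goal: str
    missions: list = field(default_factory=list)

    def progress(self) -> float:
        # campaign progress = average attainment across its missions
        if not self.missions:
            return 0.0
        return sum(m.attainment() for m in self.missions) / len(self.missions)

csat = Campaign("raise customer satisfaction", [
    Mission("reduce call handling time (seconds)", target=30.0, observed=12.0),
    Mission("increase call avoidance (percent)", target=20.0, observed=18.0),
])
print(f"{csat.goal}: {csat.progress():.0%} of goal")  # -> 65% of goal
```

The point of the structure is the accountability Bob emphasizes: every interaction scores a mission, every mission scores the campaign, so the boardroom promise has a number attached.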
>> Yeah, I think, if you're down to days or weeks in terms of the ROI, it sounds like you've got product-market fit nailed. Now it's about the next phase: really driving your go-to-market, and the science behind it, how you dimension your sales productivity, and you can now codify what you've learned in that first phase. I like the approach. A lot of times you see companies, and of course this comes out of the west coast, I'm an east coast guy, but you see the "double, double, triple, triple, grow, grow, grow," and then churn becomes that silent killer of the software company. It sounds like you guys have taken a much more adult-like approach, and now you're ready to really drive that scale. I think it's the new formula, really, for success, for hitting escape velocity. Guys, we've got to go, but thanks so much. Bob, I'll give you the last word. You mentioned some of your hundred-day priorities; maybe you can summarize those, and what should we be looking for? >> I mean, I think our measures of success are our clients' measures of success, and the same for our partners. So we're not doing this alone: we're doing it with system integrator partners, and we're doing it with great technology partners in the market as well. So this is about keeping that promise of enterprise AI. And one of the things that I'll say, just in the last couple of minutes, is that this is not just a company with a great vision and great engineers developing out this great portfolio. It's a company with great values, great commitments to its employees and the marketplace and the communities we serve. So I was attracted to the culture of this company, as well as to the innovation and what they mean to the space. >> And I said I'd give you the last word. Actually, I've got a question for Shay: you're Austin-based, is that correct? >> We have a global presence. Obviously I'm operating out of Austin and other parts of the US, but we have offices in the UK as well as in India. >> You're not moving to tax-free Texas like everybody else? >> I've got an important home and life in Connecticut, so I'll be traveling back and forth between Connecticut and Austin, but keeping my home there. >> Thanks for coming on, and best of luck. We want to follow your progress, and we really appreciate your time today. Good luck. >> Thank you, Dave. >> All right. >> Thank you for watching this Cube Conversation. This is Dave Vellante. We'll see you next time.
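Earlier in the conversation, Bob described creating synthetic data "off the pattern" of sensitive data so teams can experiment without touching PII. One common way to do that, sketched below, is a Gaussian copula: it preserves each column's marginal distribution and the cross-column correlations without copying any real row. This is an assumption about technique, not Cortex's actual implementation.

```python
# Gaussian-copula synthetic data: keep each column's distribution and the
# correlation structure of the real table, while emitting no real rows.
# A sketch of one standard approach, not CognitiveScale's actual method.
import numpy as np
from scipy import stats

def synthesize(real: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    n, d = real.shape
    # rank-transform each column to (0,1), then to standard-normal scores
    u = (stats.rankdata(real, axis=0) - 0.5) / n
    z = stats.norm.ppf(u)
    # sample new normal scores with the same correlation structure
    z_new = rng.multivariate_normal(np.zeros(d), np.cov(z, rowvar=False), n_samples)
    u_new = stats.norm.cdf(z_new)
    # map the uniforms back through each column's empirical quantiles
    return np.column_stack([np.quantile(real[:, j], u_new[:, j]) for j in range(d)])

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    income = rng.lognormal(11, 0.4, 500)
    debt = income * 0.3 + rng.normal(0, 5_000, 500)  # correlated column
    fake = synthesize(np.column_stack([income, debt]), n_samples=1_000)
    print(fake.shape, np.corrcoef(fake.T)[0, 1])     # correlation is preserved
```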
Darrell Jordan Smith, Red Hat | Red Hat Summit 2021 Virtual Experience
(upbeat music) >> And welcome back to theCube's coverage of Red Hat Summit 2021. I'm John Furrier, host of theCube. We've got a great segment here on how Red Hat is working with telcos and the disruption in the telco cloud. We've got a great guest, Cube alumni Darrell Jordan Smith, senior vice president of industries and global accounts at Red Hat. Darrell, great to see you. Thanks for coming back on theCube. >> Oh, it's great to be here, and I'm really excited about having the opportunity to talk to you today. >> Yeah, we're not in person, in real life's coming back soon, although I hear Mobile World Congress might be in person this year, looking like it's good. A lot of people are going to be virtual and activating, I know. A lot to talk about. This is probably one of the most important topics in the industry, because when you talk about the telco industry, you're really talking about the edge. You're talking about 5G, talking about industrial benefits for business, because it's not just edge for connectivity access. We're talking about innovative things from self-driving cars to business benefits. It's not just consumer, it's really bringing that together. You guys are really leading with the cloud-native platform, from RHEL to OpenShift to managed services. Everything about the cloud-native underpinnings, you guys have been successful as a company. But now in your area, telco is being disrupted. You're leading the way. >> Absolutely. >> Give us your take on this, this is super exciting. >> Well, it's actually one of the most exciting times. I've been in the industry for 30 years, I'm probably aging myself now, but in the telecommunications industry, this, for me, is the most exciting. It's where technology is actually going to visibly change the way that everyone interacts with the network, and with the applications that are being developed out there on our platform. And, as you mentioned, IoT, and a number of the other AI and ML innovations that are occurring in the marketplace: we're going to see a new wave of applications and innovation. >> What's the key delivery workload you're seeing with the 5G environment? Obviously it's not just 5G in the sense of thinking about mobile phones or mobile computers as they are now. It's not just that consumer "hey, surf the web, check your email, get an app and download and communicate." It's bigger than that now. Can you tell us where you see the workloads coming in on the 5G environment? >> You hit the nail on the head. The killer application isn't the user or the consumer in the way that we've traditionally known it, because you might be able to download a video and that might take 20 seconds less, but you're not going to pay an awful lot more money for that. The real opportunity around 5G is the industrial applications: things like connected car, autonomous driving, factory floor automation, how you actually interface digitally with your bank, how we're doing all sorts of things more intelligently at the edge of the network, using artificial intelligence and machine learning. So all of those things are going to deliver a new experience for everyone that interacts with the network, and the telcos are at the heart of it. >> You know, I want to get into the real kind of underpinnings of what's going on with the innovations happening.
You just laid out the implications of the use cases and the target application workloads, but there are kind of two big things going on with the edge and 5G. One is, under the hood, networking: what's going on with moving the packets around, the workload throughput, bandwidth, et cetera, and everything that goes on under the hood. And then there's the domain expertise in the data, where AI and machine learning have to weave in. So let's take the first part first. OpenShift is out there, Red Hat's got a lot of products, but you have to nail the networking requirements and cloud-native with containerization, because at large scale it's not just packets; there's all kinds of things going on: security, managing compute at the edge. There are a lot of things under the hood, if you will, from a networking perspective. Could you share what Red Hat's doing in that area? >> Yep. So that's a very good question, in that we've been building on our experience with OpenStack, and the last time I was on theCube, I talked about people virtualizing network applications and network services. We're taking a lot of that knowledge that we've learned from OpenStack, and we're bringing it into the container-based world. So we're looking at how we accelerate packets. We're looking at how we build cloud-native applications on bare metal, in order to drive that level of performance. We're looking at how we do the certification around these applications and services, because they may be sitting in different places across the cloud, and in some instances running on multiple clouds at the same time. So we're building on our experience from OpenStack, and we're bringing all of that into OpenShift, our container-based environment, with all of the tooling necessary to make that effective. >> It's interesting, with all the automation going on, and certainly with the edge developing nicely the way you're describing it, it's certainly disrupting the telco cloud. You have an operator mindset, a cloud-native operator thinking. I mean, it's distributed computing, we know that, but it's hybrid, so it's essentially cloud operations. So there's an operator mindset here that's just different. Could you just share quickly, before we move on to the next segment, what's different about this operating model for these new kinds of operators? As you guys have been saying, the CIO is the new cloud operator; that's the skill set they have to be thinking about. And certainly IT, and anyone else provisioning and managing infrastructure, has to think like an operator. What's your view? >> Exactly. They certainly do need to think like an operator. They need to look at how they automate a lot of these functions, because they're actually deployed in many different places all at the same time, and they have to live independently of each other; that's what cloud-native actually really is. So the whole notion of five-nines, and vertically orientated stacks of five-nines availability, that's kind of going out the window. We're looking at application availability across a hybrid cloud environment, and making sure the application can live and sustain itself. So operators, as part of OpenShift, are one element of that. Operations, in terms of management and orchestration, involve all the tooling that we also provide as Red Hat, but also work in conjunction with a big partner ecosystem, such as companies like Netcracker, for example, or IBM, as another example.
Or Ericsson, bringing their automation tool sets and their orchestration tool sets to that whole equation, to address exactly that problem. >> Yeah, you bring up the ecosystem, and this is really an interesting point. I want to just hit on that real quick, because it reminds me of the days when we had this massive innovation wave in the nineties. During that era, the client-server movement really was about multi-vendor, right? And you start to see that now. And where this ties in here, I think, and I want to get your reaction to this, is that, you know, up until 2015, moving to the cloud was all about "move to the cloud, move to the cloud," cloud-native. Now it's all about not only being agile with better performance, but you're going to have smaller footprints, with more security requirements, more enterprise requirements. This is now more complicated. So you have to kind of make the complication go away, and now you have more people in the ecosystem filling in these white spaces. So you have to be performant and agile: smaller footprint, greater security, enabling other people to participate. That's a requirement. Can you share your reactions to that? >> Well, that's core to what we do at Red Hat. I mean, we take open source community software into a hardened distribution, fit for the telecommunications marketplace. So we're very adept at working with communities and third parties. That ecosystem is really important to us. We're investing hundreds of engineers, literally hundreds of engineers, working with our ecosystem partners to make sure that their applications and services are certified running on our platform. But also, importantly, they're certified to run in conjunction with other cloud-native applications that sit on the same cloud. So that is not trivial to achieve, by any stretch of the imagination, and a lot of IT technology skills come to bear, and, as you mentioned earlier, a lot of networking skills, things that we've learned, and that we build with a lot of these traditional vendors, as we bring that to the marketplace. >> You know, I've been saying on theCube, I think five years ago I started talking about this, and it was kind of a loose formulation. I want to get your reaction, because you brought up ecosystem. You're going to see the big clouds develop, obviously Amazon, and Microsoft came in after, and now Google and others. And then I said there's going to be a huge wave of what I call secondary clouds. And you see companies like Snowflake building on top of Amazon, and so you start to see the power law of new cloud service providers emerging, that can either sit and work with, or across, multiple clouds, either one cloud or others; that's now multi-cloud and hybrid. But this rise of the new, more CSPs, more cloud service providers, is a huge part of your area right now, because some call that telco, telco cloud, edge hits that. What is Red Hat doing in this cloud service provider market specifically? How do you help them? If I'm a cloud service provider, what do I get in working with Red Hat? How do I be successful? Because it's very easy to be a cloud service provider now, more than ever. What do I do? How do you help? How do you help me? >> Well, we offer a platform called OpenShift, which is our container-based platform, but it's not just a container.
It involves huge amounts of tooling associated with operating it, and developing in and around it. So the concept that we have is that you can bring those applications, develop them once on one single platform, and run them on premise. You can run them natively as a service in Microsoft's environment, you can run them natively as a service in Amazon's environment, you can run them natively in IBM's environment. You can build an application once and run it in all of them, depending on what you want to achieve, and who actually provides you the best zoning, the best terms and conditions, the best tooling in terms of other services, such as AI, associated with that. So it's all about developing it once, certifying it once, but deploying it in many, many different locations, leveraging the largest possible developer ecosystem to drive innovation through applications on that common platform. >> So the assumption there is that that's going to drive down costs. Can you tell me why? Talk about the economics. >> Well, yeah, A, it does drive down costs, and that's an important aspect, but more importantly, it drives up agility, so time-to-market advantage is actually attainable for you. For many of the telcos, when they deploy a network service, traditionally it would take them literally maybe a year to roll it all out. Now they have to do it in days. They have to do updates in real time, in day-two operations, in literally minutes. So we are building the fabric necessary to enable those applications and services to occur. And as you move out to the edge of the network, and you look at things like private 5G networks, service providers, or telcos in this instance, will be able to deliver services all the way out to the edge, into that private 5G environment, and operate that in conjunction with those enterprise clients. >> So OpenShift allows me, if I get this right, as the CSP, to run a horizontally scalable organization, okay, from a unification platform standpoint, whether it's 5G and other functions, is that correct? >> Darrell: That's correct. >> Okay. So you've got that. Now I want to come in and bring in the top of the stack, with the other element that's been a big conversation here at Red Hat Summit and in the industry, and that is AI and the use of data. One of the things that's emerging is the ability to have both the horizontal scale as well as the specialism of the data, and to have that domain expertise. You run the industries for Red Hat. This is important, because one industry is going to have different jargon, different language, different data, different KPIs. So you've got to have that domain expertise to enable the ability to write the apps and also enable AI. Can you comment on how that works, and what does Red Hat do in there? >> So we're developing OpenShift, and a number of our other technologies, to be fit for the edge of the network, where a lot of these AI applications will reside, because you want them closest to the client, or to the application itself, where it needs to reside. We're creating that edge fabric, if you like. The next generation of hybrid cloud is really going to be, in my view, at the edge. We're enabling a lot of the service providers to go after that, but we're also igniting it by industry. You mentioned different industries.
So if I look at, for example, manufacturing with MindSphere, which we recently announced with Siemens: they do factory automation at the edge of the network, collecting telemetry and doing real-time data and analytics, looking at materials going through the factory floor in order to get a better quality result with lower levels of imperfection as they run through that system. That's just one industry, and they have their own private and favorite AI platforms and data sets they want to work with, with their own data scientists who understand that ecosystem inherently. You can move that to healthcare, and you can imagine how you actually interface with your healthcare professionals here in North America, but also around the world, how those applications and services work, and what the AI needs to do in terms of understanding x-rays, looking at common errors associated with different x-rays, so a practitioner can make a more specific diagnosis faster, saving money and potentially lives as well. So different vertical markets in this space have different AI and ML requirements and needs, different data sciences and different data models. And what we're seeing is an ecosystem of companies starting up in that space. You know, we have Watson as part of IBM, but you have PerceptiLabs, you have H2O and a number of other very, very important AI-based companies in that ecosystem. >> Yeah. And you've got the horizontal scalability of the control plane in the platform, if you will, that gives us cross-organizational leverage and enables that vertical domain expertise. >> Exactly. And you'd want to build an AI application that might run on a factory floor for certain reasons, its location and what they're actually physically building. You might want to run that on premise. You might actually want to put it in the IBM cloud, or in Azure or into AWS. You develop it once on OpenShift, and you can deploy it in all of those as a service, sitting natively in those environments. >> Darrell, great chat. You got a lot going on. Telco cloud: there is a lot of cloud-native disruption going on. It's a challenge and an opportunity, and some people have to be on the right side of history on this one if they're going to get it right. We'll know, and the scoreboard will be very clear, 'cause this is a shift. So again, you hit all the key points that I wanted to get out, but I want to ask you about two more areas that are hot here at Red Hat Summit 21, as well as in the industry, and get your reaction and thoughts on them. And they are DevSecOps and automation, two areas everyone's talking about. DevOps, which we know is infrastructure as code, programmability under the hood, modern application development, all good. You add security in there, DevSecOps, and it's critical. And automation continues to be one of the benefits of cloud-native. So DevSecOps and automation, what's your take, and how do they impact the telco world and your world? >> You can't operate a network without having security in place. You're talking about very sensitive data. You're talking about applications that could be real-time critical, and this is actually even lifesaving or life-threatening if you don't get them right. So the acquisition that Red Hat recently made around StackRox really helps us make that next level of transition into that space.
And we're looking at how we go about securing containers in a cloud-native environment. As you can imagine, there will be many, many thousands, tens of thousands of containers running. If one is actually misbehaving, for want of a better term, that creates a security risk and a security loophole. We're shoring that up. That's important for the deployment of OpenShift in the telco domain and other domains. In terms of automation, you can't do it at scale otherwise. If you look at 5G, at the radios at the edge of the network, and at how you're going to provision those services, you're talking about hundreds of thousands of nodes, hundreds of thousands. So you have to automate a lot of those processes, otherwise you can't scale to meet the opportunity. You can't physically deploy. >> You know, Darrell, this is a great conversation. As a student of history, and Dave Vellante and I always kind of joke about that, and you've been in and around the industry for a long time, telcos have been balancing this evolution of digital business for many, many decades. And now with cloud-native, it's finally a time where you're starting to see that it's just the same game, now with new infrastructure. Video, voice, text, data, all now happening, all transformed and going digital, all the way, all aspects of it. In your opinion, how should telcos be thinking as they put their plans in place for the next generation? Because the world is now cloud-native. There's a huge surface area of opportunities, different ecosystem relationships, and the power dynamics are shifting. It's really a time where there will be winners and there will be losers. What's your view on how the telco industry needs to cloudify and how to be positioned for success? >> So one of the things I truly believe very deeply is that the telcos need to create a horizontal platform that attracts developers and ecosystems to their platform, because innovation is going to sit elsewhere. There might be a killer application that one telco might create, but in reality, most of those innovations, most of those disruptors, are going to come from outside of that telco company. So you want to create an environment where you're easy to engage with and you've got the maximum set of tools, versatility and agility in order to attract that innovation. If you attract the innovation, you're going to ignite the business opportunity that 5G and 6G and beyond is going to provide you, or enable your business to drive. And you've really got to unlock that innovation. And in our view at Red Hat, you can only unlock innovation if you're open. You follow open standards, you use open systems, and open source is a method or a tool that, if you're a telco, I would argue you need to leverage and harness. >> Yeah, and there's a lot of upside there if you get that right. >> Yes. >> There's plenty of upside: a lot of leverage, a lot of assets, taking advantage of the whole offline-online coming back together. We are living in a hybrid world, certainly with the pandemic; we've seen what that means. It's put a spotlight on critical infrastructure and the critical shifts. If I can pin you down, Darrell, how would you describe the learnings from the pandemic? As folks start to come out of it, there is a light at the end of the tunnel, and as we come out of this pandemic, companies want a growth strategy.
They want to be positioned for success. What's your learning coming out of the pandemic? >> So from my perspective, what in one respect was very admirable, and in another inspires a lot of gratitude, is the fact that the telecommunications companies, because of their carrier-grade capabilities and their operational prowess, were able to keep their networks up and running, even as they had to move significant capacity from major cities to rural areas because everyone was working from home. In many different countries around the world, they did that extremely, extremely well, and their networks held up. I don't know, and maybe someone will correct me and email me, but I don't know of one telco that had a huge network outage through this pandemic. And that kept us connected. It kept us working. What I also learned is that in certain countries, particularly in Latam where they have a very large prepaid market, they were worried that the prepaid market would go down in the pandemic, because they felt that people would have less money to spend and therefore wouldn't top up their phones as much. The opposite occurred: they saw prepaid grow. And that really taught me that connectivity is critical in times of stress, like the one everyone's going through. So I think there were some key learnings there. >> Yeah, I think you're right on the money there. It's like they pulled the curtain back on all the FUD and said, you know, necessity is the mother of invention. And when you look at what happened and what had to happen to survive the pandemic and be functional, you nailed it: the network stability, the resilience, but also the new capabilities that were needed had to be delivered in an agile way. And I think it's pretty much a forcing function for all the projects that are on the table, to know which ones to double down on. So I think you pretty much nailed it. >> Thank you, Darrell Jordan Smith, senior vice president of industries and global accounts for Red Hat, theCube alumni. Thanks for that insight. Thanks for sharing. Great conversation around telcos and telco clouds and all the edge opportunities. Thanks for coming on. >> Thank you, John. >> Okay, it's theCube's coverage of Red Hat Summit 21. I'm John Furrier, your host. Thanks for watching. (upbeat music)
Christian Keynote with Disclaimer
(upbeat music) >> Hi everyone, thank you for joining us at the Data Cloud Summit. The last couple of months have been an exciting time at Snowflake, and yet what's even more compelling to all of us at Snowflake is what's ahead. Today I have the opportunity to share new product developments that will extend the reach and impact of our Data Cloud and improve the experience of Snowflake users. Our product strategy is focused on four major areas. First, Data Cloud content. In the Data Cloud, silos are eliminated, and our vision is to bring the world's data within reach of every organization. You'll hear about new data sets and data services available in our data marketplace and see how previous barriers to sourcing and unifying data are eliminated. Second, extensible data pipelines. As you gain frictionless access to a broader set of data through the Data Cloud, Snowflake's platform brings additional capabilities and extensibility to your data pipelines, simplifying data ingestion and transformation. Third, data governance. The Data Cloud eliminates silos and breaks down barriers, and in a world where data collaboration is the norm, the importance of data governance is amplified and elevated. We'll share new advancements to support how the world's most demanding organizations mobilize data while maintaining high standards of compliance and governance. Finally, our fourth area focuses on platform performance and capabilities. We remain laser focused on continuing to lead with the most performant and capable data platform, and we have some exciting news to share about the core engine of Snowflake. As always, we love showing you Snowflake in action, and we prepared some demos for you. Also, we'll keep coming back to the fact that one of the characteristics of Snowflake that we're proudest of is that we offer a single platform from which you can operate all of your data workloads, across clouds and across regions. Which workloads, you may ask? Specifically: data warehousing, data lake, data science, data engineering, data applications, and data sharing. Snowflake makes it possible to mobilize all your data in service of your business without the cost, complexity and overhead of managing multiple systems, tools and vendors. Let's dive in. As you heard from Frank, the Data Cloud offers a unique capability to connect organizations and create collaboration and innovation across industries, fueled by data. The Snowflake data marketplace is the gateway to the Data Cloud, providing visibility for organizations to browse and discover data that can help them make better decisions. For data providers on the marketplace, there is a new opportunity to reach new customers, create new revenue streams, and radically decrease the effort and time to data delivery. Our marketplace dramatically reduces the friction of sharing and collaborating with data, opening up new possibilities to all participants in the Data Cloud. We introduced the Snowflake data marketplace in 2019, and it is now home to over 100 data providers, with half of them having joined the marketplace in the last four months. Since our most recent product announcements in June, we have continued broadening the availability of the data marketplace across regions and across clouds. Our data marketplace provides the opportunity for data providers to reach consumers across cloud and regional boundaries.
A critical aspect of the Data Cloud is that we envision organizations collaborating not just in terms of data, but also data-powered applications and services. Think of instances where a provider doesn't want to open access to the entirety of a data set, but wants to provide access to business logic that has access to, and leverages, such a data set. That is what we call data services, and we want Snowflake to be the platform of choice for developing, discovering and consuming such rich building blocks. To see how the data marketplace comes to life, and in particular one of these data services, let's jump into a demo. For all of our demos today, we're going to put ourselves in the shoes of a fictional global insurance company we've called Insureco. Insurance is a data-intensive and highly regulated industry, and having the right access control and insight from data is core to every insurance company's success. I'm going to turn it over to Prasanna to show how the Snowflake data marketplace can solve a data discoverability and access problem. >> Let's look at how Insureco can leverage data and data services from the Snowflake data marketplace and use them in conjunction with its own data in the Data Cloud to do three things: better detect fraudulent claims, arm its agents with the right information, and benchmark business health against the competition. Let's start with detecting fraudulent claims. I'm an analyst in the claims department. I have auto claims data in my account. I can see there are 2000 auto claims, many of these submitted by auto body shops. I need to determine if they are valid and legitimate; in particular, could some of these be insurance fraud? By going to the Snowflake data marketplace, where numerous data providers and data service providers can list their offerings, I find the Quantifind data service. It uses a combination of external data sources and predictive risk typology models to inform the risk level of an organization. Quantifind's external sources include sanctions and blacklists, negative news, social media, and real-time search engine results. That's a wealth of data, and models built on that data, which we don't have internally. So I'd like to use Quantifind to determine a fraud risk score for each auto body shop that has submitted a claim. First, the Snowflake data marketplace made it really easy for me to discover a data service like this. Without the data marketplace, finding such a service would be a lengthy ad hoc process of doing web searches and asking around. Second, once I find Quantifind, I can use Quantifind's service against my own data in three simple steps using data sharing. I create a table with the names and addresses of auto body shops that have submitted claims. I then share the table with Quantifind to start the risk assessment. Quantifind does the risk scoring and shares the data back with me. Quantifind uses external functions, which we introduced in June, to get results from their risk prediction models. Without Snowflake data sharing, we would have had to contact Quantifind to understand what format they wanted the data in, then extract this data into a file, FTP the file to Quantifind, wait for the results, then ingest the results back into our systems for them to be usable. Or I would have had to write code to call Quantifind's API. All of that would have taken days. In contrast, with data sharing, I can set this up in minutes.
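As a rough illustration of the external functions step, a provider-side scoring function of this kind might be declared as in the sketch below; this is not Quantifind's actual interface, and the integration name, role ARN, endpoint URL and function signature are all illustrative assumptions.

```python
# Hypothetical sketch only: all names, ARNs and URLs are placeholders,
# not Quantifind's real interface.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***"
)
cur = conn.cursor()

# An API integration tells Snowflake which proxy endpoints it may call.
cur.execute("""
    CREATE OR REPLACE API INTEGRATION risk_api_integration
      API_PROVIDER = aws_api_gateway
      API_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-ext-fn'
      API_ALLOWED_PREFIXES = ('https://example.execute-api.us-east-1.amazonaws.com/prod')
      ENABLED = TRUE
""")

# The external function forwards rows to the remote model and returns scores.
cur.execute("""
    CREATE OR REPLACE EXTERNAL FUNCTION fraud_risk_score(name VARCHAR, address VARCHAR)
      RETURNS VARIANT
      API_INTEGRATION = risk_api_integration
      AS 'https://example.execute-api.us-east-1.amazonaws.com/prod/score'
""")

# Score every auto body shop that has submitted a claim.
cur.execute("""
    SELECT shop_name, fraud_risk_score(shop_name, shop_address) AS risk
    FROM auto_body_shops
""")
for shop, risk in cur.fetchall():
    print(shop, risk)
```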
What's more, now that I have set this up, as new claims are added in the future, they will automatically leverage Quantifind's data service. I view the scores returned by Quantifind and see that two entities in my claims data have a high score for insurance fraud risk. I open up the link returned by Quantifind to read more, and find that this organization has been involved in an insurance crime ring. Looks like that is a claim that we won't be approving. Using the Quantifind data service through the Snowflake data marketplace gives me access to a risk scoring capability that we don't have in house, without having to call custom APIs. For a provider like Quantifind, this drives new leads and monetization opportunities. Now that I have identified potentially fraudulent claims, let's move on to the second part. I would like to share this fraud risk information with the agents who sold the corresponding policies. To do this, I need two things. First, I need to find the agents who sold these policies. Then I need to share with these agents the fraud risk information that we got from Quantifind, but I want to share it such that each agent only sees the fraud risk information corresponding to claims for policies that they wrote. To find the agents who sold these policies, I need to look up our Salesforce data. I can find this easily within Insureco's internal data exchange. I see there's a listing with Salesforce data. Our sales ops team has published this listing, so I know it's our officially blessed data set, and I can immediately access it from my Snowflake account without copying any data or having to set up ETL. I can now join the Salesforce data with my claims to identify the agents for the policies that were flagged as having fraudulent claims. I also have the Snowflake account information for each agent. Next, I create a secure view that joins on an entitlements table, such that each agent can only see the rows corresponding to policies that they have sold. I then share this directly with the agents. This share contains the secure view that I created, with the names of the auto body shops and the fraud risk identified by Quantifind. Finally, let's move on to the third and last part. Now that I have detected potentially fraudulent claims, I'm going to move on to building a dashboard that our executives have been asking for. They want to see how Insureco compares against other auto insurance companies on key metrics, like total claims paid out for the auto insurance line of business nationwide. I go to the Snowflake data marketplace and find SNL U.S. Insurance Statutory Data from S&P. This data is included with Insureco's existing subscription with S&P, so when I request access to it, S&P can immediately share this data with me through Snowflake data sharing. I create a virtual database from the share, and I'm ready to query this data, no ETL needed. And since this is a virtual database pointing to the original data in S&P's Snowflake account, I have access to the latest data as it arrives in S&P's account. I see that the SNL U.S. Insurance Statutory Data from S&P has data on assets, premiums earned and claims paid out by each U.S. insurance company in 2019. This data is broken up by line of business and geography, and in many cases goes beyond the data that would be available from public financial filings. This is exactly the data I need. I identify a subset of comparable insurance companies whose net total assets are within 20% of Insureco's and whose lines of business are similar to ours.
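As a rough illustration of that selection step, the query in the sketch below picks comparable insurers whose net total assets fall within 20% of Insureco's; the table and column names are invented for the sketch and do not reflect S&P's actual schema.

```python
# Hypothetical sketch: table and column names are illustrative, not S&P's schema.
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = conn.cursor()

cur.execute("""
    WITH insureco AS (
        SELECT net_total_assets
        FROM snl_statutory_2019
        WHERE company_name = 'Insureco'
    )
    SELECT s.company_name,
           s.net_earned_premiums,
           s.net_claims_paid
    FROM snl_statutory_2019 s, insureco i
    WHERE s.line_of_business = 'Auto'
      AND s.net_total_assets BETWEEN i.net_total_assets * 0.8
                                 AND i.net_total_assets * 1.2
    ORDER BY s.net_claims_paid
""")
for row in cur.fetchall():
    print(row)
```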
I can now create a Snowsight dashboard that compares Insureco against similar insurance companies on key metrics, like net earned premiums and net claims paid out in 2019 for auto insurance. I can see that while we are below median on net earned premiums, we are doing better than our competition on total claims paid out in 2019, which could be a reflection of our improved claims handling and fraud detection. That's a good insight that I can share with our executives. In summary, the Data Cloud enabled me to do three key things. First, seamlessly find data and data services that I need to do my job, be it an external data service like Quantifind, an external data set from S&P, or internal data from Insureco's data exchange. Second, get immediate, live access to this data. And third, control and manage collaboration around this data. With Snowflake, I can mobilize data and data services across my business ecosystem in just minutes. >> Thank you, Prasanna. Now I want to turn our focus to extensible data pipelines. We believe there are two different and important ways of making Snowflake's platform highly extensible. First, by enabling teams to leverage services or business logic that live outside of Snowflake, interacting with data within Snowflake. We do this through a feature called external functions, a mechanism to conveniently bring data to where the computation is. We announced this feature for calling regional endpoints via AWS API Gateway in June, and it's currently available in public preview. We are also now in public preview supporting Azure API Management, and will soon support Google API Gateway and AWS private endpoints. The second extensibility mechanism does the converse: it brings the computation to Snowflake, to run closer to the data. We will do this by enabling the creation of functions and procedures in SQL, Java, Scala or Python, ultimately providing choice based on the programming language preference for you or your organization. You will see Java, Scala and Python available through private and public previews in the future. The possibilities enabled by these extensibility features are broad and powerful. However, our commitment to being a great platform for data engineers, data scientists and developers goes far beyond programming language. Today, I am delighted to announce Snowpark, a family of libraries that will bring a new experience to programming data in Snowflake. Snowpark enables you to write code directly against Snowflake in a way that is deeply integrated into the languages I mentioned earlier, using familiar concepts like DataFrames. But the most important aspect of Snowpark is that it has been designed and optimized to leverage the Snowflake engine, with its main characteristics and benefits: performance, reliability, and scalability with near-zero maintenance. Think of the power of a declarative SQL statement available through a well-known API in Scala, Java or Python, all of this against data governed in your core data platform. We believe Snowpark will be transformative for data programmability. I'd like to introduce Sri to showcase how our fictitious insurance company Insureco will be able to take advantage of the Snowpark API for data science workloads. >> Thanks, Christian. Hi, everyone, I'm Sri Chintala, a product manager at Snowflake focused on extensible data pipelines, and today I'm very excited to show you a preview of Snowpark. In our first demo, we saw how Insureco could identify potentially fraudulent claims.
Now, for all the valid claims, Insureco wants to ensure they're providing excellent customer service. To do that, they put in place a system to transcribe all of their customer calls so they can look for patterns. A simple thing they'd like to do is detect the sentiment of each call, so they can tell which calls were good and which were problematic. They can then better train their claim agents for challenging calls. Let's take a quick look at the work they've done so far. Insureco's data science team used Snowflake's external functions to quickly and easily train a machine learning model in H2O AI. Snowflake has direct integrations with H2O and many other data science providers, giving Insureco the flexibility to use a wide variety of data science libraries, frameworks or tools to train their model. Now that the team has a custom-trained sentiment model tailored to their specific claims data, let's see how a data engineer at Insureco can use Snowpark to build a data pipeline that scores customer call logs using the model hosted right inside of Snowflake. As you can see, we have the transcribed call logs stored in the customer call logs table inside Snowflake. Now, as a data engineer trained in Scala and used to working with systems like Spark and Pandas, I want to use familiar programming concepts to build my pipeline. Snowpark solves for this by letting me use popular programming languages like Java or Scala. It also provides familiar concepts and APIs, such as the DataFrame abstraction, optimized to leverage and run natively on the Snowflake engine. So here I am in my IDE, where I've written a simple Scala program using the Snowpark libraries. The first step in using the Snowpark API is establishing a session with Snowflake. I use the session builder object and specify the required details to connect. Now, I can create a DataFrame for the data in the transcripts column of the customer call logs table. As you can see, the Snowpark API provides native language constructs for data manipulation. Here, I use the select method provided by the API to specify the column names to return, rather than writing "select transcripts" as a string. By using the native language constructs provided by the API, I benefit from features like IntelliSense and type checking. Here you can see some of the other common methods that the DataFrame class offers, like filter, join and others. Next, I define a get_sentiment user-defined function that will return a sentiment score for an input string by using our pre-trained H2O model. From the UDF, we call the score method that initializes and runs the sentiment model. I've built this helper into a Java file, which, along with the model object and license, is added as a dependency that Snowpark will send to Snowflake for execution. As a developer, this is all programming that I'm familiar with. We can now call our get_sentiment function on the transcripts column of the DataFrame and write back the scored transcripts to a new target table. Let's run this code and switch over to Snowflake to see the scored data and also all the work that Snowpark has done for us on the back end. If I do a select star from scored logs, we can see the sentiment score of each call right alongside the transcript. With Snowpark, all the logic in my program is pushed down into Snowflake. I can see in the query history that Snowpark has created a temporary Java function to host the pre-trained H2O model, and that the model is running right in my Snowflake warehouse.
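The demo itself is written in Scala; as a rough sketch of the same flow using the Snowpark Python API (part of the language roadmap announced here, and shipped later), the snippet below substitutes a stand-in scoring function for the real H2O model, and all connection details and object names are illustrative assumptions.

```python
# Hypothetical sketch of the demo's flow using the Snowpark Python API;
# the actual demo is in Scala and scores with a pre-trained H2O model.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, udf
from snowflake.snowpark.types import FloatType, StringType

session = Session.builder.configs({
    "account": "my_account", "user": "my_user", "password": "***",
    "warehouse": "ml_wh", "database": "claims_db", "schema": "public",
}).create()

# DataFrame over the transcripts column of the call logs table.
df = session.table("customer_call_logs").select(col("transcripts"))

# Register a UDF that Snowpark pushes down to run inside Snowflake.
@udf(name="get_sentiment", input_types=[StringType()],
     return_type=FloatType(), replace=True)
def get_sentiment(transcript: str) -> float:
    # Stand-in scoring logic; the demo loads a trained H2O model here.
    text = transcript.lower()
    positive = sum(w in text for w in ("thanks", "great", "resolved"))
    negative = sum(w in text for w in ("angry", "cancel", "complaint"))
    return float(positive - negative)

# Score each transcript and write the results back to a target table.
scored = df.with_column("sentiment", get_sentiment(col("transcripts")))
scored.write.save_as_table("scored_logs", mode="overwrite")
```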
Snowpark has allowed us to do something completely new in Snowflake. Let's recap what we saw. With Snowpark, Insureco was able to use their preferred programming language, Scala, and use the familiar DataFrame constructs to score data using a machine learning model. With support for Java UDFs, they were able to run a trained model natively within Snowflake. And finally, we saw how Snowpark executed computationally intensive data science workloads right within Snowflake. This simplifies Insureco's data pipeline architecture, as it reduces the number of additional systems they have to manage. We hope that extensibility with Scala, Java and Snowpark will enable our users to work with Snowflake in their preferred way while keeping the architecture simple. We are very excited to see how you use Snowpark to extend your data pipelines. Thank you for watching, and with that, back to you, Christian. >> Thank you, Sri. You saw how Sri could utilize Snowpark to efficiently perform advanced sentiment analysis. But of course, if this use case was important to your business, you would want to fully automate this pipeline and analysis. Imagine being able to do all of the following in Snowflake. Your pipeline could start far upstream of what you saw in the demo, by storing your actual customer care call recordings in Snowflake. You may notice that this is new for Snowflake; we'll come back to the idea of storing unstructured data in Snowflake at the end of my talk today. Once you have the data in Snowflake, you can use our streams and tasks capabilities to call an external function to transcribe these files. To simplify this flow even further, we plan to introduce a serverless execution model for tasks, where Snowflake can automatically size and manage resources for you. After this step, you can use the same serverless tasks to execute sentiment scoring of your transcripts, as shown in the demo, with incremental processing as each transcript is created. Finally, you can surface the sentiment scores either via Snowsight or through any tool you use to share insights throughout your organization. In this example, you see data being transformed from a raw asset into a higher level of information that can drive business action, all fully automated, all in Snowflake. Turning back to Insureco, you know how important data governance is for any major enterprise, but particularly for one in this industry. Insurance companies manage highly sensitive data about their customers and have some of the strictest requirements for storing and tracking such data, as well as managing and governing it. At Snowflake, we think about governance as the ability to know your data, manage your data and collaborate with confidence. As you saw in our first demo, the Data Cloud enables seamless collaboration, control and access to data via the Snowflake data marketplace, and companies may set up their own data exchanges to create similar collaboration and control across their ecosystems. In future releases, we expect to deliver enhancements that create more visibility into who has access to what data and provide usage information on that data. Today, we are announcing a new capability to help Snowflake users better know and organize your data: our new tagging framework. Tagging in Snowflake will allow user-defined metadata to be attached to a variety of objects. We built a broad and robust framework with powerful implications.
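As a hedged sketch of the kind of annotations this framework might enable, the statements below follow the tag syntax Snowflake later documented; the tag names, warehouse and table objects are illustrative assumptions.

```python
# Hypothetical sketch: tag names and objects are illustrative; the syntax
# follows the form Snowflake later documented for the tagging framework.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    database="claims_db", schema="public",
)
cur = conn.cursor()

# A cost-center tag attached to a warehouse for chargeback tracking.
cur.execute("CREATE TAG IF NOT EXISTS cost_center")
cur.execute("ALTER WAREHOUSE analytics_wh SET TAG cost_center = 'claims-analytics'")

# A sensitivity tag attached to a column for classification.
cur.execute("CREATE TAG IF NOT EXISTS sensitivity")
cur.execute("""
    ALTER TABLE customers MODIFY COLUMN email
    SET TAG sensitivity = 'pii'
""")
```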
Think of the ability to annotate warehouses with cost center information for tracking, as in the sketch above, or of annotating tables and columns with sensitivity classifications. Our tagging capability will enable the creation of company-specific business annotations for objects in Snowflake's platform. Another key aspect of data governance in Snowflake is our policy-based framework, where you specify what you want to be true about your data, and Snowflake enforces those policies. We announced one such policy earlier this year, our dynamic data masking capability, which is now available in public preview. Today, we are announcing a great complementary policy to achieve row-level security. To see how row-level security can enhance Insureco's ability to govern and secure data, I'll hand it over to Martin for a demo. >> Hello, I'm Martin Avanes, Director of Product Management for Snowflake. As Christian has already mentioned, the rise of the Data Cloud greatly accelerates the ability to access and share diverse data, leading to greater data collaboration across teams and organizations. Controlling data access with ease and ensuring compliance at the same time is top of mind for users. Today, I'm thrilled to announce our new row access policies, which will allow users to define various rules for accessing data in the Data Cloud. Let's check back in with Insureco to see some of these in action and highlight how they work with other existing policies one can define in Snowflake. Because Insureco is a multinational company, it has to take extra measures to ensure data across geographic boundaries is protected, to meet a wide range of compliance requirements. The Insureco team has been asked to segment what data sales team members have access to based on where they are regionally. In order to make this possible, they will use Snowflake's row access policies to implement row-level security. We are going to apply policies for three of Insureco's sales team members with different roles. Alice, an executive, must be able to view sales data from both North America and Europe. Alex, a North America sales manager, will be limited to sales data from North America only. And Jordan, a Europe sales manager, will be limited to sales data from Europe only. As a first step, the security administrator needs to create a lookup table that will be used to determine which data is accessible based on each role. As you can see, the lookup table has the roles and their associated regions, both of which will be used to apply the policies that we will now create. Row access policies are implemented using standard SQL syntax, to make it easy for administrators to create policies like the one our administrator is looking to implement. And similar to masking policies, row access policies leverage our flexible and expressive policy language. In this demo, our admin is going to create a row access policy that uses the role and region of a user to determine what row-level data they have access to when queries are executed. When user queries are executed against a table protected by such a row access policy, Snowflake's query engine will dynamically generate and apply the corresponding predicate to filter out rows the user is not supposed to see. With the policy now created, let's log in as our sales users and see if it worked. Recall that as a sales executive, Alice should have the ability to see all rows from North America and Europe. Sure enough, when she runs her query, she can see all rows, so we know the policy is working for her.
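A minimal sketch of the pattern just demonstrated, together with a masking policy of the kind shown alongside it, might look like the following; the syntax mirrors Snowflake's documented policy DDL, while the table, role and policy names are illustrative assumptions.

```python
# Hypothetical sketch of the demo's policies; all object, role and policy
# names are illustrative.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    database="sales_db", schema="public",
)
cur = conn.cursor()

# Lookup table mapping roles to the regions they may see.
cur.execute("""
    CREATE OR REPLACE TABLE region_entitlements (role_name VARCHAR, region VARCHAR)
""")
cur.execute("""
    INSERT INTO region_entitlements VALUES
      ('SALES_EXEC', 'NORTH_AMERICA'), ('SALES_EXEC', 'EUROPE'),
      ('SALES_NA',   'NORTH_AMERICA'),
      ('SALES_EU',   'EUROPE')
""")

# The row access policy returns TRUE only for rows the current role is
# entitled to see; Snowflake injects it as a predicate on every query.
cur.execute("""
    CREATE OR REPLACE ROW ACCESS POLICY sales_region_policy
    AS (region VARCHAR) RETURNS BOOLEAN ->
      EXISTS (
        SELECT 1 FROM region_entitlements e
        WHERE e.role_name = CURRENT_ROLE()
          AND e.region = region
      )
""")
cur.execute("ALTER TABLE sales ADD ROW ACCESS POLICY sales_region_policy ON (region)")

# A masking policy combined on the same table: sales roles see a mask,
# a privileged role sees the raw value.
cur.execute("""
    CREATE OR REPLACE MASKING POLICY mask_customer_email
    AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() = 'COMPLIANCE_ADMIN' THEN val
           ELSE '*** MASKED ***' END
""")
cur.execute("""
    ALTER TABLE sales MODIFY COLUMN customer_email
    SET MASKING POLICY mask_customer_email
""")
```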
You may also have noticed that some columns are showing masked data. That's because our administrator is also using our previously announced data masking capabilities to protect these data attributes for everyone in sales. When we look at our other users, we should notice that the same columns are also masked for them. As you see, you can easily combine masking and row access policies on the same data sets. Now let's look at Alex, our North American sales manager. Alex runs the same query as Alice; row access policies leverage the lookup table to dynamically generate the corresponding predicates for this query. The result is that only the data for North America is visible. Notice, too, that the same columns are still masked. Finally, let's try Jordan, our European sales manager. Jordan runs the query, and the result is only the data for Europe, with the same columns also masked. Earlier you were introduced to masking policies; today you saw row access policies in action. And similar to our masking policies, row access policies in Snowflake will be a first-class capability integrated seamlessly across all of Snowflake: everywhere you expect them to work, they do. Whether you're accessing data stored in external tables or semi-structured JSON data, building data pipelines via streams, or planning to leverage Snowflake's data sharing functionality, you will be able to implement complex row access policies for all these diverse use cases and workloads within Snowflake. And with Snowflake's unique replication feature, you can instantly apply these new policies consistently to all of your Snowflake accounts, ensuring governance across regions and even across different clouds. In the future, we plan to demonstrate how to combine our new tagging capabilities with Snowflake's policies, allowing advanced auditing and enforcement of those policies with ease. And with that, let's pass it back over to Christian. >> Thank you, Martin. We look forward to making these new tagging and row-level security capabilities available in private preview in the coming months. One last note on the broad area of data governance. A big aspect of the Data Cloud is the mobilization of data to be used across organizations. At the same time, privacy is an important consideration to ensure the protection of sensitive, personal or potentially identifying information. We're working on a set of product capabilities to simplify compliance with privacy-related regulatory requirements and to simplify the process of collaborating with data while preserving privacy. Earlier this year, Snowflake acquired a company called CryptoNumerics to accelerate our efforts on this front, including the identification and anonymization of sensitive data. We look forward to sharing more details in the future. We've just shown you three demos of new and exciting ways to use Snowflake. However, I also want to remind you that our commitment to the core platform has never been greater. As you move workloads onto Snowflake, we know you expect exceptional price performance and continued delivery of new capabilities that benefit every workload. On price performance, we continue to drive performance improvements throughout the platform. Let me give you an example comparing an identical set of customer-submitted queries that ran both in August of 2019 and August of 2020. If I look at the set of queries that took more than one second to compile, 72% of those improved by at least 50%. When we make these improvements, execution time goes down.
And by implication, the required compute time is also reduced. Based on our pricing model, where you pay for what you use, performance improvements not only deliver faster insights but also translate into cost savings for you. In addition, we have two new major announcements on performance to share today. First, we announced our search optimization service during our June event. This service, currently in public preview, can be enabled on a table-by-table basis and is able to dramatically accelerate lookup queries on any column, particularly those not used as clustering columns. We initially supported equality comparisons only, and today we're announcing expanded support for searches in values, such as pattern matching within strings. This will unlock a number of additional use cases, such as analytics on log data for performance or security purposes. This expanded support is currently being validated by a few customers in private preview and will be broadly available in the future. Second, I'd like to introduce a new service that will be in private preview in a future release: the query acceleration service. This new feature will automatically identify and scale out parts of a query that could benefit from additional resources and parallelization. This means that you will be able to realize dramatic improvements in performance, which is especially impactful for data science and other scan-intensive workloads. Using this feature is pretty simple: you define a maximum amount of additional resources that can be recruited by a warehouse for acceleration, and the service decides when it would be beneficial to use them. Given enough resources, a query over a massive data set can see orders-of-magnitude performance improvement compared to the same query without acceleration enabled. In our own usage of Snowflake, we saw a common query go 15 times faster without changing the warehouse size. All of these performance enhancements are extremely exciting, and you will see continued improvements in the future. We love to innovate and continuously raise the bar on what's possible. More important, we love seeing our customers adopt and benefit from our new capabilities. In June, we announced a number of previews, and we continue to roll those features out and see tremendous adoption, even before reaching general availability. Two of those announcements were the introduction of our geospatial support and our policies for dynamic data masking. Both of these features are currently in use by hundreds of customers. The number of tables using our new geography data type recently crossed the hundred thousand mark, and the number of columns with masking policies also recently crossed the same hundred thousand mark. This momentum and level of adoption since our announcements in June is phenomenal. I have one last announcement to highlight today. In 2014, Snowflake transformed the world of data management and analytics by providing a single platform with first-class support for both structured and semi-structured data. Today, we are announcing that Snowflake will be adding support for unstructured data on that same platform. Think of the ability to use Snowflake to store, access and share files. As an example, would you like to leverage the power of SQL to reason through a set of image files? We have a few customers as early adopters, and we'll provide additional details in the future. With this, you will be able to leverage Snowflake to mobilize all your data in the Data Cloud.
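Returning to the search optimization and query acceleration services announced above, here is a hedged sketch of how they might be switched on; both were previews at the time, so the statements follow the form Snowflake later documented, and the table, warehouse and scale-factor values are illustrative assumptions.

```python
# Hypothetical sketch: both features were in preview at the time of this
# keynote; the syntax follows what Snowflake later documented. All names
# and values are illustrative.
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = conn.cursor()

# Search optimization is enabled per table and accelerates selective lookups,
# including (per the expanded support announced here) string pattern matching.
cur.execute("ALTER TABLE security_logs ADD SEARCH OPTIMIZATION")

# Query acceleration is configured per warehouse, with a cap on how many
# extra resources the service may recruit.
cur.execute("ALTER WAREHOUSE analytics_wh SET ENABLE_QUERY_ACCELERATION = TRUE")
cur.execute("ALTER WAREHOUSE analytics_wh SET QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8")
```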
Our customers rely on Snowflake as the data platform for every part of their business. However, the vision and potential of Snowflake is actually much bigger than the four walls of any organization. Snowflake has created a Data Cloud, a data-connected network, with a vision where any Snowflake customer can leverage and mobilize the world's data. Whether it's data sets or data services from traditional data providers or SaaS vendors, our marketplace creates opportunities for you and raises the bar in terms of what is possible. As examples, you can unify data across your supply chain to accelerate your time and quality to market. You can build entirely new revenue streams, or collaborate with a consortium on data for good. The possibilities are endless. Every company has the opportunity to gain richer insights, build greater products and deliver better services by reaching beyond the data that it owns. Our vision is to enable every company to leverage the world's data through seamless and governed access. Snowflake is your window into this data network, into this broader opportunity. Welcome to the Data Cloud. (upbeat music)
Abhinav Joshi & Tushar Katarki, Red Hat | KubeCon + CloudNativeCon Europe 2020 – Virtual
>> Announcer: From around the globe, it's theCUBE with coverage of KubeCon + CloudNativeCon Europe 2020 Virtual, brought to you by Red Hat, the Cloud Native Computing Foundation and ecosystem partners. >> Welcome back, I'm Stu Miniman, and this is theCUBE's coverage of KubeCon + CloudNativeCon Europe 2020, the virtual event. Of course, when we talk about cloud native we talk about Kubernetes. There's a lot that's happening to modernize the infrastructure, but a very important thing that we're going to talk about today is also what's happening up the stack, what sits on top of it, and some of the new use cases and applications that are enabled by all of this modern environment. For that, we're going to talk about artificial intelligence and machine learning, or AI and ML as we tend to say in the industry. So, happy to welcome to the program two first-time guests joining us from Red Hat: Abhinav Joshi and Tushar Katarki, both senior managers in the OpenShift group. Abhinav is in product marketing and Tushar is in product management. Abhinav and Tushar, thank you so much for joining us. >> Thanks a lot, Stu, we're glad to be here. >> Thanks, Stu, glad to be here at KubeCon. >> All right, so Abhinav, I mentioned in the intro here that modernization of the infrastructure is awesome, but really it's an enabler. I'm an infrastructure person; the whole reason we have infrastructure is to be able to drive those applications, interact with my data and the like. And of course AI and ML are exciting, with a lot going on there, but they can also be challenging. So Abhinav, if I could start with you, bring us inside your customers that you're talking to. What are the challenges and the opportunities? What are they seeing in this space? Maybe, what's been holding them back from really unlocking the value that is expected? >> Yup, that's a very good question to kick off the conversation. What we are seeing is that organizations typically face a lot of challenges when they're trying to build an AI/ML environment, right? The first one is a talent shortage. There is a limited amount of AI/ML expertise in the market, especially the data scientists who are responsible for building out the machine learning and deep learning models. So it's hard to find them and to be able to retain them, and the same goes for other talent like data engineers or app dev and ops folks, and the lack of talent can actually stall the project. The second key challenge that we see is the lack of readily usable data. Businesses collect a lot of data, but they must find the right data and make it ready for the data scientists to be able to build, test and train the machine learning models. If you don't have the right kind of data, the predictions that your model makes in the real world are only going to be so good. So that becomes a challenge as well: to be able to find and wrangle the right kind of data. And the third key challenge that we see is the lack of rapid availability of the compute infrastructure, the data, and the machine learning and app dev tools for the various personas, like data scientists, data engineers and software developers. That can also slow down the project, right? Because if all your teams are waiting on the infrastructure and the tooling of their choice to be provisioned on a recurring basis, and they don't get it in a timely manner, it can stall the projects.
And then the next one is the lack of collaboration. You have all these kinds of teams that are involved in the AI project, and they have to collaborate with each other because the work one team does has a dependency on another team. Say, for example, the data scientists are responsible for building the machine learning models, and then they have to work with the app dev teams to make sure the models get integrated as part of the app dev processes and ultimately rolled out into production. So if all these teams are operating in silos and there is a lack of collaboration between them, this can stall the projects as well. And finally, what we see is that the data scientists typically start the machine learning modeling on their individual PCs or laptops, and they don't focus on the operational aspects of the solution. What this means is that when the IT teams have to roll all this out into a production deployment, they are challenged to take all the work that has been done by the individuals, make sense out of it, and make sure that it can be seamlessly brought up in a production environment in a consistent way, be it on premises, in the cloud, or at the edge. So these are some of the key challenges that we see organizations facing as they try to take AI projects from pilot to production. >> Well, some of those things seem like repetition of what we've had in the past. Obviously silos have been the bane of IT moving forward, and of course for many years we've been talking about that gap between developers and what's happening on the operations side. So Tushar, help us connect the dots: containers, Kubernetes, the whole DevOps movement. How is this setting us up to actually be successful for solutions like AI and ML? >> Sure, Stu. In fact, you said it right: in the world of software, in the world of microservices, in the world of app modernization, in the world of DevOps in the past 10, 15 years, we have seen this evolution, this revolution, happen with containers and Kubernetes driving more DevOps behavior, driving more agile behavior, and this, in fact, is what we are saying can ease the path to AI/ML as well. The value of containers, Kubernetes, DevOps and OpenShift for software development is directly applicable to AI projects: to make them more agile, to get them into production, to make them more valuable to the organization so that it can realize the full potential of AI. We already touched upon a few personas, so it's useful to think about who the users are. As Abhinav mentioned, there are data scientists: these are the people who obviously do the machine learning itself, the modeling. Then there are data engineers, who do the plumbing and provide the essential data; data is so essential to machine learning and deep learning. And there are app developers, who in some ways will then use the output of what the data scientists have produced in terms of models and incorporate them into services. And of course, none of these boundaries are purely cast in stone; there's a lot of overlap. You could find that data scientists are app developers as well, and you'll see some app developers being data scientists or data engineers.
So it's a continuum rather than strict boundaries, but regardless, what all of these personas need is self-service access to their preferred tools and to compute and storage resources, to be productive. And let's not forget the IT, engineering and operations teams that need to make all this happen in an easy, reliable, available manner, and in a way that is really safe and secure. So containers help here: they help you quickly and easily deploy a broad set of machine learning and data tools across the hybrid cloud, from the data center to the public cloud to the edge, in a very consistent way. Teams can iteratively modify and change shared container images and machine learning models with (indistinct) and track changes. And this is applicable to both the containers as well as the data, by the way, and it's transparent; transparency helps in collaboration, but it can also help with regulatory reasons later on in the process. And then, because of their inherent process isolation, resource control and protection from threats, containers can also be very secure. Now, Kubernetes takes it to the next level. First of all, it forms a cluster of all your compute and data resources, and it helps you run your containerized tools, and whatever you develop on them, in a consistent way, with access to shared, centralized compute, storage and networking resources from the data center, the edge or the public cloud. It provides things like resource management, workload scheduling, multi-tenancy controls so that you can be proper neighbors, if you will, and quota enforcement, right? Now, that's Kubernetes. If you want to level it up further, to enhance what Kubernetes offers, you get into how you write applications, how you actually turn those models into services, and how you lifecycle them. And that's where the power of Helm and, furthermore, Kubernetes operators really comes into the picture. While Helm helps in installing some of this, for a complete lifecycle experience a Kubernetes operator is the way to go, and operators simplify the deployment and lifecycle management of your entire AI/ML tool chain, end to end. So all in all, organizations need to build and refine models rapidly, just like applications; that's how they get value out of AI quickly. There has been a lack of collaboration across teams, as Abhinav pointed out earlier, just as there once was in the world of software. So we're talking about how you bring those best practices to AI/ML: DevOps approaches for machine learning operations, or what many analysts and others have started calling MLOps. So, how do you bring DevOps to machine learning, foster better collaboration between teams, application developers and IT operations, and create this feedback loop, so that time to production shrinks and the ability to take more machine learning and ML-powered applications into production increases significantly? That's where I wanted to shine the light on what you were referring to earlier, Stu.
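To make the multi-tenancy and quota point concrete, here is a minimal sketch using the Kubernetes Python client: a ResourceQuota that caps what one data science team's namespace can consume, GPUs included. The namespace, quota values and GPU resource name are assumptions for illustration.

```python
# Hypothetical sketch: namespace and quota values are illustrative.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster
core = client.CoreV1Api()

# Cap the compute a single data science team's namespace can request,
# including GPUs exposed as an extended resource.
quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="ds-team-quota"),
    spec=client.V1ResourceQuotaSpec(
        hard={
            "requests.cpu": "40",
            "requests.memory": "256Gi",
            "requests.nvidia.com/gpu": "4",
        }
    ),
)
core.create_namespaced_resource_quota(namespace="data-science", body=quota)
```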
>> All right, Abhinav, of course one of the good things about OpenShift is you have quite a lot of customers that have deployed the solution over the years. Bring us inside some of your customers: what are they doing for AI/ML, and help us understand really what differentiates OpenShift in the marketplace for this solution set. >> Yeah, absolutely, that's a very good question as well, and we're seeing a lot of traction across all kinds of industries, right? Be it financial services, healthcare, automotive, insurance, oil and gas, manufacturing and so on, for a wide variety of use cases. And what we are seeing is that, at the end of the day, all these deployments are focused on helping improve the customer experience, automate the business processes, and then help them increase revenue, serve their customers better, and also save costs. If you go to openshift.com/ai-ml, it has a lot of customer stories, but today I will touch on three of the customers we have in different industries. The first one is Royal Bank of Canada. They are a top global financial institution based out of Canada, and they have more than 17 million clients globally. They recently announced that they built out an AI-powered private cloud platform based on OpenShift as well as the NVIDIA DGX AI compute system, and this whole solution is actually helping them transform the customer banking experience by delivering AI-powered intelligent apps, while at the same time improving the operational efficiency of their organization. With this kind of a solution, they're able to run thousands of simulations and analyze millions of data points in a fraction of the time compared to the solution they had before. So, a lot of great work going on there. The next one is HCA Healthcare. HCA is one of the leading healthcare providers in the country, based out of Nashville, Tennessee, and they have more than 184 hospitals as well as more than 2,000 sites of care in the U.S. and the UK. What they did was develop a very innovative machine learning-powered data platform on top of OpenShift to help save lives. The first use case was to help with the early detection of sepsis, which is a life-threatening condition, and more recently they've been able to use OpenShift and the same kind of stack to roll out new applications powered by machine learning and deep learning to help them fight COVID-19. Recently they did a webinar as well that had all the details on the challenges they had, how they went about it in terms of people, process and technology, and what the outcomes were. And we are proud to be a partner in the solution to help with such a noble cause. The third example I want to share here is the BMW Group and our partner DXC Technology. What they've done is develop a very high-performing, data-driven development platform based on OpenShift to analyze the massive amounts of data from their test fleets, at the speed they need, to help accelerate their autonomous driving initiatives. They've also redesigned their ConnectedDrive capability on top of OpenShift, which is helping them deliver various use cases that improve the customer experience; customers are able to leverage a lot of different value-add services directly from within their own cars. Last year at the Red Hat Summit they had a keynote as well, and this year at Summit they were one of the Innovation Award winners.
And we have a lot more stories, but these are the three that I thought were compelling enough to talk about here on theCUBE. >> Yeah, Abhinav, just a quick follow-up for you. One of the things of course we're looking at in 2020 is how the COVID-19 pandemic, with people working from home, has impacted projects. I have to think that AI and ML projects take a little bit longer to deploy; is it something that you see customers accelerating? Are they putting projects on pause, or are new projects kicking off? Anything you can share from customers, what you're hearing right now as to the impact they're seeing this year? >> Yeah, what we are seeing is that customers are now even more keen to roll out their digital (indistinct), and we see a lot of customers now on an accelerated timeline to complete their AI, ML projects. So yeah, it's picking up a lot of momentum, and we talk to a lot of analysts as well, and they are reporting the same thing: interest in AI, ML projects is ramping up across their customer base. So yeah, it's the right time to be looking at innovative services that can help improve the customer experience in the new virtual world that we live in now with COVID-19. >> All right, Tushar, you mentioned that there are a few projects involved, and of course we know at this conference there's a very large ecosystem. Red Hat is a strong contributor to many, many open source projects. Give us a little bit of a view: in the AI, ML space, who's involved, which pieces are important, and how does Red Hat look at this entire ecosystem? >> Thank you, Stu. So as you know, technology partnerships and the power of open are really what is driving the technology world these days, particularly in the AI ecosystem. And that is mainly because machine learning has bootstrapped itself in the past 10 years or so, and a lot of the emerging technology built to take advantage of the emerging data as well as compute power has been built on the Linux ecosystem, with openness, and with popular languages like Python, et cetera. And of course there's tons of technology based in Java, but the point really here is that the ecosystem plays a big role, open plays a big role, and that's kind of Red Hat's best cup of tea, if you will; Red Hat really plays a leadership role in the open ecosystem. So if we take your question and put it into two parts, what we are doing in the community, and then what we are doing in terms of partnerships themselves, commercial and technology partnerships, we'll take it one step at a time. In terms of the community itself, if you step back two to three years, we worked with other vendors and users, including Google and NVIDIA and H2O and Seldon, et cetera, both startups and big companies, to develop the Kubeflow ecosystem. Kubeflow is an upstream community focused on developing MLOps, as we talked about earlier: end-to-end machine learning on top of Kubernetes.
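As a concrete reference for the MLOps idea Tushar describes, here is a hedged sketch of a two-step Kubeflow pipeline using the v1-style `kfp` SDK; the container images and step names are placeholders, not anything from the interview:

```python
import kfp
from kfp import dsl

@dsl.pipeline(name="train-pipeline", description="Toy end-to-end ML flow")
def train_pipeline():
    prep = dsl.ContainerOp(
        name="prepare-data",
        image="registry.example.com/prep:latest",   # hypothetical image
        command=["python", "prep.py"],
    )
    train = dsl.ContainerOp(
        name="train-model",
        image="registry.example.com/train:latest",  # hypothetical image
        command=["python", "train.py"],
    )
    train.after(prep)  # run training only after data preparation succeeds

if __name__ == "__main__":
    # Compile to a workflow spec that Kubeflow executes on the cluster.
    kfp.compiler.Compiler().compile(train_pipeline, "train_pipeline.yaml")
```

Each step runs as its own container on Kubernetes, which gives the feedback loop between data scientists and operations a shared, reproducible substrate.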
So Kubeflow right now is at 1.0, which happened a few months ago, and now it's actually at 1.1; you'll see that at KubeCon here. So that's the Kubeflow community. In addition to that, we are augmenting it with the Open Data Hub community, which extends the capabilities of the Kubeflow community to also add some of the data pipelining and data capabilities that I talked about, and forms a reference architecture for how to run some of this on top of OpenShift. The Open Data Hub community also has a great way of including partners from a technology partnership perspective. Then tie that with something I mentioned earlier, the idea of Kubernetes operators. Now, if you take a step back, as I mentioned earlier, Kubernetes operators help manage the life cycle of the entire containerized application, including not only the configuration on day one, but also day-two activities like updates, backups and restores, whatever the application needs for proper functioning; that is what an operator makes sure of. So anyways, the Kubernetes operators ecosystem is also flourishing, and we have surfaced that with OperatorHub.io, which is a community marketplace if you will; I shouldn't call it a marketplace, rather a community hub, because it's just comprised of community operators. So the Open Data Hub can actually take community operators and show you how to run them on top of OpenShift and manage their life cycle. Now that's the reference architecture. The other aspect of it, as I mentioned earlier, is the commercial aspect: from a customer point of view, how do I get certified, supported software? And to that extent, what we have at the top, from a user experience point of view, is certified operators and certified applications from the AI, ML ISV community in the Red Hat Marketplace. And from the Red Hat Marketplace it becomes easy for end users to deploy these ISVs and manage the complete life cycle, as I said. Some examples of these kinds of ISVs include startups like H2O, although H2O is well known in certain sectors, PerceptiLabs, Cnvrg, Seldon, Starburst, et cetera; and then on the other side we have the big giants in this as well, which includes partnerships with NVIDIA, Cloudera, et cetera, that we have announced, including also SAS, I should mention. So anyways, these create that rich ecosystem for data scientists to take advantage of. At Red Hat Summit back in April, we, along with Cloudera, SAS and Anaconda, showcased a live demo that shows all these things working together on top of OpenShift with this operator idea that I talked about. So I welcome people to go and take a look; the openshift.com/ai-ml page that Abhinav already referenced should have a link to that, and a simple Google search will find it if you need it. And the other part of it is really our work with the hardware OEMs, right? Obviously NVIDIA GPUs are hardware, and that acceleration is really important in this world, but we are also working with other OEM partners like HP and Dell to produce accelerated AI platforms, turnkey solutions to create this open AI platform for the "private cloud" or the data center.
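For readers who want to see what the operator-based lifecycle management Tushar describes looks like in practice, here is a sketch of subscribing to an operator through OLM, the Operator Lifecycle Manager behind OperatorHub; the operator and catalog names are illustrative assumptions:

```python
# Create an OLM Subscription so the cluster installs an operator from a
# catalog and keeps it updated on the chosen channel (a "day two" concern).
from kubernetes import client, config

config.load_kube_config()
crds = client.CustomObjectsApi()

subscription = {
    "apiVersion": "operators.coreos.com/v1alpha1",
    "kind": "Subscription",
    "metadata": {"name": "example-ml-operator",      # hypothetical operator
                 "namespace": "openshift-operators"},
    "spec": {
        "channel": "stable",                 # update channel to track
        "name": "example-ml-operator",       # package name in the catalog
        "source": "community-operators",     # e.g. an OperatorHub catalog
        "sourceNamespace": "openshift-marketplace",
    },
}

crds.create_namespaced_custom_object(
    group="operators.coreos.com",
    version="v1alpha1",
    namespace="openshift-operators",
    plural="subscriptions",
    body=subscription,
)
```

From there the operator, not the user, handles installs, updates and the other day-two activities listed above.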
The other thing obviously is IBM. IBM Cloud Pak for Data is based on OpenShift; it has been around for some time and is seeing very good traction. If you think about a very turnkey solution, IBM Cloud Pak is definitely well ahead in that. And then finally, Red Hat is about driving innovation in the open source community. So, as I said earlier, we are doing the Open Data Hub, which is that reference architecture that showcases a combination of upstream open source projects and all these ISV ecosystems coming together. So I welcome you to take a look at that at opendatahub.io. I think that would be kind of the sum total of how we are not only doing open and community building, but also doing certifications and providing our customers the assurance that they can run these tools in production with the help of a rich certified ecosystem. >> And the customer is always key to us, so that's the other thing: the goal here is to provide our customers with a choice, right? They can go with open source or they can go with a commercial solution as well. So we want to make sure that they get the best cloud experience on top of OpenShift and our broader portfolio as well. >> All right, great note to end on. Abhinav, thank you so much, and Tushar, great to see the maturation in this space, such an important use case. Really appreciate you sharing this with theCUBE and the KubeCon community. >> Thank you, Stu. >> Thank you, Stu. >> Okay, thank you and thanks a lot, and have a great rest of the show. Thanks everyone, stay safe. >> Thank you, and stay with us for a lot more coverage from KubeCon + CloudNativeCon Europe 2020, the virtual edition. I'm Stu Miniman, and thank you as always for watching theCUBE. (soft upbeat music plays)
Sri Satish Ambati, H2O.ai | CUBE Conversation, May 2020
>> connecting with thought leaders all around the world, this is a CUBE Conversation. Hi everybody, this is Dave Vellante of theCUBE, and welcome back to my CXO series. I've been running this really since the start of the COVID-19 crisis, to understand how leaders are dealing with this pandemic. Sri Ambati is here; he's the CEO and founder of H2O.ai. Sri, it's great to see you again, thanks for coming on. >> Thank you for having us. >> Yeah, so this pandemic has obviously given people fits, no question, but it's also given opportunities for companies to kind of reassess where they are. Automation is a huge watchword; flexibility, business resiliency, and people who maybe really hadn't fully leaned into things like the cloud and AI and automation are now realizing, wow, we have no choice, it's about survival. Your thoughts as to what you're seeing in the marketplace? >> Thanks for having us. I think first of all, kudos to the frontline health workers who have been relentlessly saving lives across the country and the world, and what we're really doing is a fraction of what we could have done, or should be doing, to stave off the next big pandemic. But that apart, I usually tend to say BC is before COVID. So if the world was thinking about going digital, after COVID-19 they have been forced to go digital, and as a result you're seeing tremendous transformation across our customers, and a lot of work to go in and reinvent their business models to allow them to scale as effortlessly as they can using digital means. >> So, think about doctors and diagnosis: machines, in some cases, are helping doctors make diagnoses, they're sometimes making even better diagnoses, (mumbles) is informing. There's been a lot of talk about the models, you know how... Yeah, I know you've been working with a lot of healthcare organizations, so you're probably familiar with, you know, the Medium post, The Hammer and the Dance, and people criticize the models, but of course, they're just models, right? And you iterate models, and machine intelligence can help us improve them. So in this, you know, you talk about BC and post-C: how have you seen the data and machine intelligence informing the models and improving what we know about this pandemic? I mean, it's changed literally daily. What are you seeing? >> Yeah, and I think it started with Wuhan, and we saw the best application of AI in trying to trace, literally from Alipay to WeChat, track down the first folks who were spreading it across China and then eventually the rest of the world. I think contact tracing, for example, has become a really interesting problem. Supply chains have been disrupted like never before. We're beginning to see customers trying to reinvent their distribution mechanisms in the second-order effects of COVID, and the prime example is hospital staffing, how many ventilators, in the first few weeks of the COVID crisis as it evolved in the US.
We have been busy working with some of the local healthcare communities to predict how staffing in hospitals will work, how many PPE and ventilators will be needed and so forth, and when the peak surge will be; those were the problems at the beginning. And many of our customers have begun to build these models, iterate and improve them, and educate the community to practice social distancing, and that led to a lot of flattening the curve. And when you're talking about flattening the curve, you're really talking about data science and analytics, in public parlance. That led to kind of the next level: now that we have somewhat brought a semblance of order to the reaction to COVID, I think what we are beginning to figure out is, is there going to be a second surge, and which elective procedures that were postponed will be top of mind for customers. So this is the kind of thing that hospitals are beginning to plan out for the second half of the year. And as businesses try to open up, certain things are highly correlated to the surge in cases, such as cleaning supplies, for example, the obvious one, or pantry buying. So retailers are beginning to see which online stores are doing well: e-commerce, online purchases, electronic goods. Essentially everyone started working from home, and so homes needed to have the same kind of bandwidth that offices and commercial enterprises needed to have. So a lot of interesting effects: on one side you saw airlines go away, on the other side you saw the likes of Zoom and video take off. So you're kind of seeing a real digital divide happening, and AI is here to play a very good role in figuring out how to enhance your profitability as you're looking at planning out the next two years. >> Yeah, you know, and obviously these things get partisan, it gets political. I mean, our job as an industry is to report; your job is to help people understand. I mean, let the data inform and then let public policy, you know, fight it out. So who are some of the people that you're working with as a result of COVID-19? What's some of the work that H2O has done? I want to better understand what role you're playing. >> So one of the things is, we're kind of privileged as a company to come into the crisis with a strong balance sheet and the right kind of momentum behind the company in terms of great talent, and so we have 10% of the world's top data scientists, in the form of Kaggle Grand Masters, in the company. And so we put most of them to work, and they started collecting data sets, curating data sets and making them more qualitative, picking up public data sources. For example, there's a tremendous amount of job loss out there, so figuring out which are the more difficult sectors in the economy. And then we started looking at the exodus from the cities; we're looking at mobility data that's publicly available, and mobility data through the data exchanges. You're able to find which cities, which rural areas: as the New Yorkers left the city, which places did they go to? And the same for Californians: when they left Los Angeles, which are the new places they have settled in?
These are the places which are now busy for the same kind of items that you need to sell if you're a retailer. But if you go one step further, we started engaging with FEMA, we started engaging with the universities, like Imperial College London or Berkeley, and started figuring out how best to improve the models and automate them. The SEIR model, the most popular epidemiological model: we added that into our Driverless AI product as a recipe and made it accessible to our customers in testing, and to customers in healthcare, who are trying to predict where the surge is likely to come. But it's mostly about information, right? The AI at the end of it is all about intelligence and being prepared. Predictive is all about being prepared, and that's kind of what we did in general: lots of blogs, topical blog articles, and working with the largest health organizations, starting to inform them on the most stable models. What we found, not so much to our surprise, is that the simplest, very interpretable models are actually the most widely usable, because historical data is actually no longer as effective. You need to build a model that you can quickly understand, and retrain again in the feedback loop of back-testing that model against what really happened. >> Yeah, so I want to double down on that. So really, two things I want to understand, if you have visibility on it, and it sounds like you do, just in terms of the surge and the comeback, you know, kind of what those models say. We have some advance information coming from the global market, for sure, but it seems like every situation is different. What's the data telling you? Just in terms of, okay, we're coming into the spring and the summer months, maybe it'll come down a little bit. Everybody says... We fully expect it to come back in the fall; go back to college, don't go back to college. What is the data telling you at this point in time, with an understanding that, you know, we're still iterating every day? >> Well, I think, I mean, we're not epidemiologists, but at the same time, the science of it is a highly local response; a very hyper-local response to COVID-19 is what we've seen. Santa Clara, which is just a county over, I mean, is different from San Francisco, right, sort of. So you're beginning to see, like we saw in Brooklyn, it's very different, and the Bronx, very different from Manhattan. So you're seeing a very, very local response to this disease, and I'm talking about the US. You see the likes of Brazil, which we're worried about, has picked up quite a bit of cases now. The silver lining, I would say, is that China is up and running to a large degree; a large number of our user base there is back active, you can see the traffic patterns there. So two months after their last reported cases, the business and economic activity is back and thriving. And so you can kind of estimate from that, that this can be done, that you can actually contain the rise of active cases, and it will take masking of the entire community, masking and a healthy dose of increased testing. One of our offices is in Prague, and the Czech Republic has done an incredible job in trying to contain this; they've essentially masked everybody, and as a result they're back thinking about opening offices and schools later this month.
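For reference, the SEIR compartmental model Sri mentions splits a population into susceptible, exposed, infectious and recovered groups and evolves them with a small system of differential equations. A minimal sketch with scipy, using illustrative parameters rather than anything fitted to real data:

```python
import numpy as np
from scipy.integrate import odeint

def seir(y, t, beta, sigma, gamma, n):
    s, e, i, r = y
    ds = -beta * s * i / n                # susceptible -> exposed
    de = beta * s * i / n - sigma * e     # exposed -> infectious
    di = sigma * e - gamma * i            # infectious -> recovered
    dr = gamma * i
    return ds, de, di, dr

n = 1_000_000                             # population size (illustrative)
y0 = (n - 10, 10, 0, 0)                   # start with 10 exposed people
t = np.linspace(0, 180, 181)              # simulate 180 days
beta, sigma, gamma = 0.5, 1 / 5.2, 1 / 10  # contact, incubation, recovery rates

s, e, i, r = odeint(seir, y0, t, args=(beta, sigma, gamma, n)).T
print(f"peak infectious: {i.max():,.0f} on day {t[i.argmax()]:.0f}")
```

Lowering beta, which is what masking and distancing do, visibly flattens the infectious curve; that is the "flattening the curve" arithmetic in miniature.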
So I think it's a very, very local response, a hyper-local response; no one country and no one community is symmetrical with the others. And I think we have a unique situation where in the United States you have a very, very highly connected world, a highly connected economy, and I think we have quite a problem on our hands in how to safeguard our economy while also safeguarding lives. >> Yeah, so you can't just take Norway and apply it, or South Korea and apply it; every situation is different.
Do then the next level problem or on forbearance and mortgage, that side of the things are coming up at some of these banks as well. So they're looking at which, what's one of the problems that one of our customers Wells Fargo, they have a question which branch to open, right, sort of that itself, it needs a different kind of modeling. So everything has become a very highly good segmented models, and so AI is absolutely not just a good to have, it has become a must have for most of our customers in how to go about their business. (mumbles) >> I want to talk a little bit about your business, you have been on a mission to democratize AI since the beginning, open source. Explain your business model, how you guys make money and then I want to help people understand basic theoretical comparisons and current affairs. >> Yeah, that's great. I think the last time we spoke, probably about at the Spark Summit. I think Dave and we were talking about Sparkling Water and H2O our open source platforms, which are premium platforms for democratizing machine learning and math at scale, and that's been a tremendous brand for us. Over the last couple of years, we have essentially built a platform called Driverless AI, which is a license software and that automates machine learning models, we took the best practices of all these data scientists, and combined them to essentially build recipes that allow people to build the best forecasting models, best fraud prevention models or the best recommendation engines, and so we started augmenting traditional data scientists with this automatic machine learning called AutoML, that essentially allows them to build models without necessarily having the same level of talent as these great Kaggle Grand Masters. And so that has democratized, allowed ordinary companies to start producing models of high caliber and high quality that would otherwise have been the pedigree of Google, Microsoft or Amazon or some of these top tier AI houses like Netflix and others. So what we've done is democratize not just the algorithms at the open source level. Now, we've made it easy for kind of rapid adoption of AI across every branch inside a company, a large organization, also across smaller organizations which don't have the access to the same kind of talent. Now, third level, you know, what we've brought to market, is ability to augment data sets, especially public and private data sets that you can, the alternative data sets that can increase the signal. And that's where we've started working on a new platform called Q, again, more license software, and I mean, to give you an idea there from business models endpoint, now majority of our software sales is coming from closed source software. And sort of so, we've made that transition, we still make our open source widely accessible, we continue to improve it, a large chunk of the teams are improving and participating in building the communities but I think from a business model standpoint as of last year, 51% of our revenues are now coming from closed source software and that change is continuing to grow. >> And this is the point I wanted to get to, so you know, the open source model was you know, Red Hat the one company that, you know, succeeded wildly and it was, put it out there open source, come up with a service, maintain the software, you got to buy the subscription okay, fine. And everybody thought that you know, you were going to do that, they thought that Databricks was going to do and that changed. 
But I want to take two examples, Hortonworks which kind of took the Red Hat model and Cloudera which does IP. And neither really lived up to the expectation, but now there seems to be sort of a new breed I mentioned, you guys, Databricks, there are others, that seem to be working. You with your license software model, Databricks with a managed service and so there's, it's becoming clear that there's got to be some level of IP that can be licensed in order to really thrive in the open source community to be able to fund the committers that you have to put forth to open source. I wonder if you could give me your thoughts on that narrative. >> So on Driverless AI, which is the closest platform I mentioned, we opened up the layers in open source as recipes. So for example, different companies build their zip codes differently, right, the domain specific recipes, we put about 150 of them in open source again, on top of our Driverless AI platform, and the idea there is that, open source is about freedom, right? It is not necessarily about, it's not a philosophy, it's not a business model, it allows freedom for rapid adoption of a platform and complete democratization and commodification of a space. And that allows a small company like ours to compete at the level of an SaaS or a Google or a Microsoft because you have the same level of voice as a very large company and you're focused on using code as a community building exercise as opposed to a business model, right? So that's kind of the heart of open source, is allowing that freedom for our end users and the customers to kind of innovate at the same level of that a Silicon Valley company or one of these large tech giants are building software. So it's really about making, it's a maker culture, as opposed to a consumer culture around software. Now, if you look at how the the Red Hat model, and the others who have tried to replicate that, the difficult part there was, if the product is very good, customers are self sufficient and if it becomes a standard, then customers know how to use it. If the product is crippled or difficult to use, then you put a lot of services and that's where you saw the classic Hadoop companies, get pulled into a lot of services, which is a reasonably difficult business to scale. So I think what we chose was, instead, a great product that builds a fantastic brand, that makes AI, even when other first or second.ai domain, and for us to see thousands of companies which are not AI and AI first, and even more companies adopting AI and talking about AI as a major way that was possible because of open source. If you had chosen close source and many of your peers did, they all vanished. So that's kind of how the open source is really about building the ecosystem and having the patience to build a company that takes 10, 20 years to build. And what we are expecting unfortunately, is a first and fast rise up to become unicorns. In that race, you're essentially sacrifice, building a long ecosystem play, and that's kind of what we chose to do, and that took a little longer. Now, if you think about the, how do you truly monetize open source, it takes a little longer and is much more difficult sales machine to scale, right, sort of. Our open source business actually is reasonably positive EBITDA business because it makes more money than we spend on it. But trying to teach sales teams, how to sell open source, that's a much, that's a rate limiting step. 
And that's why we chose and also explaining to the investors, how open source is being invested in as you go closer to the IPO markets, that's where we chose, let's go into license software model and scale that as a regular business. >> So I've said a few times, it's kind of like ironic that, this pandemic is as we're entering a new decade, you know, we've kind of we're exiting the era, I mean, the many, many decades of Moore's law being the source of innovation and now it's a combination of data, applying machine intelligence and being able to scale and with cloud. Well, my question is, what did we expect out of AI this decade if those are sort of the three, the cocktail of innovation, if you will, what should we expect? Is it really just about, I suggest, is it really about automating, you know, businesses, giving them more agility, flexibility, you know, etc. Or should we should we expect more from AI this decade? >> Well, I mean, if you think about the decade of 2010 2011, that was defined by software is eating the world, right? And now you can say software is the world, right? I mean, pretty much almost all conditions are digital. And AI is eating software, right? (mumbling) A lot of cloud transitions are happening and are now happening much faster rate but cloud and AI are kind of the leading, AI is essentially one of the biggest driver for cloud adoption for many of our customers. So in the enterprise world, you're seeing rebuilding of a lot of data, fast data driven applications that use AI, instead of rule based software, you're beginning to see patterned, mission AI based software, and you're seeing that in spades. And, of course, that is just the tip of the iceberg, AI has been with us for 100 years, and it's going to be ahead of us another hundred years, right, sort of. So as you see the discovery rate at which, it is really a fundamentally a math, math movement and in that math movement at the beginning of every century, it leads to 100 years of phenomenal discovery. So AI is essentially making discoveries faster, AI is producing, entertainment, AI is producing music, AI is producing choreographing, you're seeing AI in every walk of life, AI summarization of Zoom meetings, right, you beginning to see a lot of the AI enabled ETF peaking of stocks, right, sort of. You're beginning to see, we repriced 20,000 bonds every 15 seconds using H2O AI, corporate bonds. And so you and one of our customers is on the fastest growing stock, mostly AI is powering a lot of these insights in a fast changing world which is globally connected. No one of us is able to combine all the multiple dimensions that are changing and AI has that incredible opportunity to be a partner for every... (mumbling) For a hospital looking at how the second half will look like for physicians looking at what is the sentiment of... What is the surge to expect? To kind of what is the market demand looking at the sentiment of the customers. AI is the ultimate money ball in business and then I think it's just showing its depth at this point. >> Yeah, I mean, I think you're right on, I mean, basically AI is going to convert every software, every application, or those tools aren't going to have much use, Sri we got to go but thanks so much for coming to theCUBE and the great work you guys are doing. Really appreciate your insights. stay safe, and best of luck to you guys. >> Likewise, thank you so much. >> Welcome, and thank you for watching everybody, this is Dave Vellante for the CXO series on theCUBE. 
We'll see you next time. All right, we're clear. All right.
SUMMARY :
Sri, it's great to see you Your thought as to what you're and a lot of application and if people criticize the models, and kind of educate the community and then let public policy you know, and starting to kind of inform them What is the data telling you of the entire community, and improve on the models? and the kind of the airlines and then I want to help people understand and I mean, to give you an idea there in the open source community to be able and the customers to kind of innovate and being able to scale and with cloud. What is the surge to expect? and the great work you guys are doing. Welcome, and thank you
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave | PERSON | 0.99+ |
2008 | DATE | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Wells Fargo | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
San Francisco | LOCATION | 0.99+ |
Prague | LOCATION | 0.99+ |
Brooklyn | LOCATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
51% | QUANTITY | 0.99+ |
May 2020 | DATE | 0.99+ |
China | LOCATION | 0.99+ |
United States | LOCATION | 0.99+ |
100 years | QUANTITY | 0.99+ |
Bronx | LOCATION | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
Manhattan | LOCATION | 0.99+ |
US | LOCATION | 0.99+ |
Santa Clara | LOCATION | 0.99+ |
last year | DATE | 0.99+ |
10% | QUANTITY | 0.99+ |
20,000 bonds | QUANTITY | 0.99+ |
Imperial College London | ORGANIZATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
One | QUANTITY | 0.99+ |
COVID-19 | OTHER | 0.99+ |
Los Angeles | LOCATION | 0.99+ |
Netflix | ORGANIZATION | 0.99+ |
H20 | ORGANIZATION | 0.99+ |
Red Hat | ORGANIZATION | 0.99+ |
South Korea | LOCATION | 0.99+ |
Sri Satish Ambati | PERSON | 0.99+ |
thousands | QUANTITY | 0.99+ |
FEMA | ORGANIZATION | 0.99+ |
Brazil | LOCATION | 0.99+ |
second half | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
second surge | QUANTITY | 0.99+ |
two months | QUANTITY | 0.99+ |
one | QUANTITY | 0.98+ |
second bump | QUANTITY | 0.98+ |
two things | QUANTITY | 0.98+ |
H2O | ORGANIZATION | 0.98+ |
both | QUANTITY | 0.98+ |
Czech Republic | LOCATION | 0.98+ |
Silicon Valley | LOCATION | 0.98+ |
TITLE | 0.98+ | |
three | QUANTITY | 0.98+ |
hundred years | QUANTITY | 0.98+ |
once a year | QUANTITY | 0.97+ |
Powell | PERSON | 0.97+ |
Sparkling Water | ORGANIZATION | 0.97+ |
Alipay | TITLE | 0.97+ |
Norway | LOCATION | 0.97+ |
pandemic | EVENT | 0.97+ |
second order | QUANTITY | 0.97+ |
third level | QUANTITY | 0.97+ |
first folks | QUANTITY | 0.97+ |
COVID-19 crisis | EVENT | 0.96+ |
Fed | ORGANIZATION | 0.95+ |
1918 | DATE | 0.95+ |
later this month | DATE | 0.95+ |
one side | QUANTITY | 0.94+ |
Sri Ambati | PERSON | 0.94+ |
two examples | QUANTITY | 0.93+ |
Moore | PERSON | 0.92+ |
Californians | PERSON | 0.92+ |
CXO | TITLE | 0.92+ |
last couple of months | DATE | 0.92+ |
COVID | OTHER | 0.91+ |
Spark Summit | EVENT | 0.91+ |
one step | QUANTITY | 0.91+ |
The Hammer | TITLE | 0.9+ |
COVID crisis | EVENT | 0.87+ |
every 15 seconds | QUANTITY | 0.86+ |
Sri Satish Ambati, H20.ai | CUBE Conversation, May 2020
>> Starting the record, Dave in five, four, three. Hi, everybody this is Dave Vellante, theCUBE, and welcome back to my CXO series. I've been running this through really since the start of the COVID-19 crisis to really understand how leaders are dealing with this pandemic. Sri Ambati is here, he's the CEO and founder of H20. Sri, it's great to see you again, thanks for coming on. >> Thank you for having us. >> Yeah, so this pandemic has obviously given people fits, no question, but it's also given opportunities for companies to kind of reassess where they are. Automation is a huge watchword, flexibility, business resiliency and people who maybe really hadn't fully leaned into things like the cloud and AI and automation are now realizing, wow, we have no choice, it's about survival. Your thought as to what you're seeing in the marketplace. >> Thanks for having us. I think first of all, kudos to the frontline health workers who have been ruthlessly saving lives across the country and the world, and what you're really doing is a fraction of what we could have done or should be doing to stay away the next big pandemic. But that apart I think, I usually tend to say BC is before COVID. So if the world was thinking about going digital after COVID-19, they have been forced to go digital and as a result, you're seeing tremendous transformation across our customers, and a lot of application to kind of go in and reinvent their business models that allow them to scale as effortlessly as they could using the digital means. >> So, think about, doctors and diagnosis machines, in some cases, are helping doctors make diagnoses, they're sometimes making even better diagnosis, (mumbles) is informing. There's been a lot of talk about the models, you know how... Yeah, I know you've been working with a lot of healthcare organizations, you may probably familiar with that, you know, the Medium post, The Hammer and the Dance, and if people criticize the models, of course, they're just models, right? And you iterate models and machine intelligence can help us improve. So, in this, you know, you talk about BC and post C, how have you seen the data and in machine intelligence informing the models and proving that what we know about this pandemic, I mean, it changed literally daily, what are you seeing? >> Yeah, and I think it started with Wuhan and we saw the best application of AI in trying to trace, literally from Alipay, to WeChat, track down the first folks who were spreading it across China and then eventually the rest of the world. I think contact tracing, for example, has become a really interesting problem. supply chain has been disrupted like never before. We're beginning to see customers trying to reinvent their distribution mechanisms in the second order effects of the COVID, and the the prime center is hospital staffing, how many ventilator, is the first few weeks so that after COVID crisis as it evolved in the US. We are busy predicting working with some of the local healthcare communities to predict how staffing in hospitals will work, how many PPE and ventilators will be needed and so henceforth, but that quickly and when the peak surge will be those with the beginning problems, and many of our customers have begin to do these models and iterate and improve and kind of educate the community to practice social distancing, and that led to a lot of flattening the curve and you're talking flattening the curve, you're really talking about data science and analytics in public speak. 
That led to kind of the next level, now that we have somewhat brought a semblance of order to the reaction to COVID, I think what we are beginning to figure out is, is there going to be a second surge, what elective procedures that were postponed, will be top of the mind for customers, and so this is the kind of things that hospitals are beginning to plan out for the second half of the year, and as businesses try to open up, certain things were highly correlated to surgeon cases, such as cleaning supplies, for example, the obvious one or pantry buying. So retailers are beginning to see what online stores are doing well, e-commerce, online purchases, electronic goods, and so everyone essentially started working from home, and so homes needed to have the same kind of bandwidth that offices and commercial enterprises needed to have, and so a lot of interesting, as one side you saw airlines go away, this side you saw the likes of Zoom and video take off. So you're kind of seeing a real divide in the digital divide and that's happening and AI is here to play a very good role to figure out how to enhance your profitability as you're looking about planning out the next two years. >> Yeah, you know, and obviously, these things they get, they get partisan, it gets political, I mean, our job as an industry is to report, your job is to help people understand, I mean, let the data inform and then let public policy you know, fight it out. So who are some of the people that you're working with that you know, as a result of COVID-19. What's some of the work that H2O has done, I want to better understand what role are you playing? >> So one of the things we're kind of privileged as a company to come into the crisis, with a strong balance and an ability to actually have the right kind of momentum behind the company in terms of great talent, and so we have 10% of the world's top data scientists in the in the form of Kaggle Grand Masters in the company. And so we put most of them to work, and they started collecting data sets, curating data sets and making them more qualitative, picking up public data sources, for example, there's a tremendous amount of job loss out there, figuring out which are the more difficult kind of sectors in the economy and then we started looking at exodus from the cities, we're looking at mobility data that's publicly available, mobility data through the data exchanges, you're able to find which cities which rural areas, did the New Yorkers as they left the city, which places did they go to, and what's to say, Californians when they left Los Angeles, which are the new places they have settled in? These are the places which are now busy places for the same kind of items that you need to sell if you're a retailer, but if you go one step further, we started engaging with FEMA, we start engaging with the universities, like Imperial College London or Berkeley, and started figuring out how best to improve the models and automate them. The SaaS model, the most popular SaaS model, we added that into our Driverless AI product as a recipe and made that accessible to our customers in testing, to customers in healthcare who are trying to predict where the surge is likely to come. But it's mostly about information right? So the AI at the end of it is all about intelligence and being prepared. 
Predictive is all about being prepared and that's kind of what we did with general, lots of blogs, typical blog articles and working with the largest health organizations and starting to kind of inform them on the most stable models. What we found to our not so much surprise, is that the simplest, very interpretable models are actually the most widely usable, because historical data is actually no longer as effective. You need to build a model that you can quickly understand and retry again to the feedback loop of back testing that model against what really happened. >> Yeah, so I want to double down on that. So really, two things I want to understand, if you have visibility on it, sounds like you do. Just in terms of the surge and the comeback, you know, kind of what those models say, based upon, you know, we have some advanced information coming from the global market, for sure, but it seems like every situation is different. What's the data telling you? Just in terms of, okay, we're coming into the spring and the summer months, maybe it'll come down a little bit. Everybody says it... We fully expect it to come back in the fall, go back to college, don't go back to college. What is the data telling you at this point in time with an understanding that, you know, we're still iterating every day? >> Well, I think I mean, we're not epidemiologists, but at the same time, the science of it is a highly local response, very hyper local response to COVID-19 is what we've seen. Santa Clara, which is just a county, I mean, is different from San Francisco, right, sort of. So you beginning to see, like we saw in Brooklyn, it's very different, and Bronx, very different from Manhattan. So you're seeing a very, very local response to this disease, and I'm talking about US. You see the likes of Brazil, which we're worried about, has picked up quite a bit of cases now. I think the silver lining I would say is that China is up and running to a large degree, a large number of our user base there are back active, you can see the traffic patterns there. So two months after their last research cases, the business and economic activity is back and thriving. And so, you can kind of estimate from that, that this can be done where you can actually contain the rise of active cases and it will take masking of the entire community, masking and the healthy dose of increase in testing. One of our offices is in Prague, and Czech Republic has done an incredible job in trying to contain this and they've done essentially, masked everybody and as a result they're back thinking about opening offices, schools later this month. So I think that's a very, very local response, hyper local response, no one country and no one community is symmetrical with other ones and I think we have a unique situation where in United States you have a very, very highly connected world, highly connected economy and I think we have quite a problem on our hands on how to safeguard our economy while also safeguarding life. >> Yeah, so you can't just, you can't just take Norway and apply it or South Korea and apply it, every situation is different. 
And then I want to ask you about, you know, the economy in terms of, you know, how much can AI actually, you know, how can it work in this situation where you have, you know, for example, okay, so the Fed, yes, it started doing asset buys back in 2008 but still, very hard to predict, I mean, at this time of this interview you know, Stock Market up 900 points, very difficult to predict that but some event happens in the morning, somebody, you know, Powell says something positive and it goes crazy but just sort of even modeling out the V recovery, the W recovery, deep recession, the comeback. You have to have enough data, do you not? In order for AI to be reasonably accurate? How does it work? And how does at what pace can you iterate and improve on the models? >> So I think that's exactly where I would say, continuous modeling, instead of continuously learning continuous, that's where the vision of the world is headed towards, where data is coming, you build a model, and then you iterate, try it out and come back. That kind of rapid, continuous learning would probably be needed for all our models as opposed to the typical, I'm pushing a model to production once a year, or once every quarter. I think what we're beginning to see is the kind of where companies are beginning to kind of plan out. A lot of people lost their jobs in the last couple of months, right, sort of. And so up scaling and trying to kind of bring back these jobs back both into kind of, both from the manufacturing side, but also lost a lot of jobs in the transportation and the kind of the airlines slash hotel industries, right, sort of. So it's trying to now bring back the sense of confidence and will take a lot more kind of testing, a lot more masking, a lot more social empathy, I think well, some of the things that we are missing while we are socially distant, we know that we are so connected as a species, we need to kind of start having that empathy for we need to wear a mask, not for ourselves, but for our neighbors and people we may run into. And I think that kind of, the same kind of thinking has to kind of parade, before we can open up the economy in a big way. The data, I mean, we can do a lot of transfer learning, right, sort of there are new methods, like try to model it, similar to the 1918, where we had a second bump, or a lot of little bumps, and that's kind of where your W shaped pieces, but governments are trying very well in seeing stimulus dollars being pumped through banks. So some of the US case we're looking for banks is, which small medium business in especially, in unsecured lending, which business to lend to, (mumbles) there's so many applications that have come to banks across the world, it's not just in the US, and banks are caught up with the problem of which and what's growing the concern for this business to kind of, are they really accurate about the number of employees they are saying they have? Do then the next level problem or on forbearance and mortgage, that side of the things are coming up at some of these banks as well. So they're looking at which, what's one of the problems that one of our customers Wells Fargo, they have a question which branch to open, right, sort of that itself, it needs a different kind of modeling. So everything has become a very highly good segmented models, and so AI is absolutely not just a good to have, it has become a must have for most of our customers in how to go about their business. 
(mumbles) >> I want to talk a little bit about your business, you have been on a mission to democratize AI since the beginning, open source. Explain your business model, how you guys make money and then I want to help people understand basic theoretical comparisons and current affairs. >> Yeah, that's great. I think the last time we spoke, probably about at the Spark Summit. I think Dave and we were talking about Sparkling Water and H2O or open source platforms, which are premium platforms for democratizing machine learning and math at scale, and that's been a tremendous brand for us. Over the last couple of years, we have essentially built a platform called Driverless AI, which is a license software and that automates machine learning models, we took the best practices of all these data scientists, and combined them to essentially build recipes that allow people to build the best forecasting models, best fraud prevention models or the best recommendation engines, and so we started augmenting traditional data scientists with this automatic machine learning called AutoML, that essentially allows them to build models without necessarily having the same level of talent as these Greek Kaggle Grand Masters. And so that has democratized, allowed ordinary companies to start producing models of high caliber and high quality that would otherwise have been the pedigree of Google, Microsoft or Amazon or some of these top tier AI houses like Netflix and others. So what we've done is democratize not just the algorithms at the open source level. Now, we've made it easy for kind of rapid adoption of AI across every branch inside a company, a large organization, also across smaller organizations which don't have the access to the same kind of talent. Now, third level, you know, what we've brought to market, is ability to augment data sets, especially public and private data sets that you can, the alternative data sets that can increase the signal. And that's where we've started working on a new platform called Q, again, more license software, and I mean, to give you an idea there from business models endpoint, now majority of our software sales is coming from closed source software. And sort of so, we've made that transition, we still make our open source widely accessible, we continue to improve it, a large chunk of the teams are improving and participating in building the communities but I think from a business model standpoint as of last year, 51% of our revenues are now coming from closed source software and that change is continuing to grow. >> And this is the point I wanted to get to, so you know, the open source model was you know, Red Hat the one company that, you know, succeeded wildly and it was, put it out there open source, come up with a service, maintain the software, you got to buy the subscription okay, fine. And everybody thought that you know, you were going to do that, they thought that Databricks was going to do and that changed. But I want to take two examples, Hortonworks which kind of took the Red Hat model and Cloudera which does IP. And neither really lived up to the expectation, but now there seems to be sort of a new breed I mentioned, you guys, Databricks, there are others, that seem to be working. 
You with your license software model, Databricks with a managed service and so there's, it's becoming clear that there's got to be some level of IP that can be licensed in order to really thrive in the open source community to be able to fund the committers that you have to put forth to open source. I wonder if you could give me your thoughts on that narrative. >> So on Driverless AI, which is the closest platform I mentioned, we opened up the layers in open source as recipes. So for example, different companies build their zip codes differently, right, the domain specific recipes, we put about 150 of them in open source again, on top of our Driverless AI platform, and the idea there is that, open source is about freedom, right? It is not necessarily about, it's not a philosophy, it's not a business model, it allows freedom for rapid adoption of a platform and complete democratization and commodification of a space. And that allows a small company like ours to compete at the level of an SaaS or a Google or a Microsoft because you have the same level of voice as a very large company and you're focused on using code as a community building exercise as opposed to a business model, right? So that's kind of the heart of open source, is allowing that freedom for our end users and the customers to kind of innovate at the same level of that a Silicon Valley company or one of these large tech giants are building software. So it's really about making, it's a maker culture, as opposed to a consumer culture around software. Now, if you look at how the the Red Hat model, and the others who have tried to replicate that, the difficult part there was, if the product is very good, customers are self sufficient and if it becomes a standard, then customers know how to use it. If the product is crippled or difficult to use, then you put a lot of services and that's where you saw the classic Hadoop companies, get pulled into a lot of services, which is a reasonably difficult business to scale. So I think what we chose was, instead, a great product that builds a fantastic brand, that makes AI, even when other first or second.ai domain, and for us to see thousands of companies which are not AI and AI first, and even more companies adopting AI and talking about AI as a major way that was possible because of open source. If you had chosen close source and many of your peers did, they all vanished. So that's kind of how the open source is really about building the ecosystem and having the patience to build a company that takes 10, 20 years to build. And what we are expecting unfortunately, is a first and fast rise up to become unicorns. In that race, you're essentially sacrifice, building a long ecosystem play, and that's kind of what we chose to do, and that took a little longer. Now, if you think about the, how do you truly monetize open source, it takes a little longer and is much more difficult sales machine to scale, right, sort of. Our open source business actually is reasonably positive EBITDA business because it makes more money than we spend on it. But trying to teach sales teams, how to sell open source, that's a much, that's a rate limiting step. And that's why we chose and also explaining to the investors, how open source is being invested in as you go closer to the IPO markets, that's where we chose, let's go into license software model and scale that as a regular business. 
>> So I've said a few times, it's kind of ironic that this pandemic arrives just as we're entering a new decade. We're exiting the many, many decades of Moore's Law being the source of innovation, and now it's a combination of data, applying machine intelligence, and being able to scale with cloud. My question is, what should we expect out of AI this decade, if those are sort of the three, the cocktail of innovation, if you will? Is it really just about automating businesses, giving them more agility, flexibility, etc.? Or should we expect more from AI this decade? >> Well, if you think about the decade of 2010, 2011, that was defined by software is eating the world, right? And now you can say software is the world, right? Pretty much all companies are digital. And AI is eating software, right? (mumbling) A lot of cloud transitions are happening, and are now happening at a much faster rate, but cloud and AI are kind of leading together; AI is essentially one of the biggest drivers of cloud adoption for many of our customers. So in the enterprise world, you're seeing rebuilding of a lot of fast, data driven applications that use AI. Instead of rule based software, you're beginning to see pattern matching, AI based software, and you're seeing that in spades. And of course, that is just the tip of the iceberg. AI has been with us for 100 years, and it's going to be ahead of us another hundred years, right? It is really, fundamentally, a math movement, and a math movement at the beginning of a century leads to 100 years of phenomenal discovery. So AI is essentially making discoveries faster. AI is producing entertainment, AI is producing music, AI is producing choreography; you're seeing AI in every walk of life. AI summarization of Zoom meetings, right? You're beginning to see a lot of the AI enabled ETF picking of stocks. We reprice 20,000 corporate bonds every 15 seconds using H2O AI. And one of our customers is one of the fastest growing stocks; AI is powering a lot of these insights in a fast changing world which is globally connected. No one of us is able to combine all the multiple dimensions that are changing, and AI has that incredible opportunity to be a partner for every... (mumbling) For a hospital looking at how the second half will look, for physicians looking at what surge to expect, for businesses looking at what market demand will be from the sentiment of their customers. AI is the ultimate Moneyball in business, and I think it's just showing its depth at this point. >> Yeah, I think you're right on. Basically AI is going to convert every piece of software, every application, or those tools aren't going to have much use. Sri, we've got to go, but thanks so much for coming on theCUBE and the great work you guys are doing. Really appreciate your insights. Stay safe, and best of luck to you guys. >> Likewise, thank you so much. >> Welcome, and thank you for watching everybody. This is Dave Vellante for the CXO series on theCUBE. We'll see you next time. All right, we're clear. All right.
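Sri's point that rule based software is giving way to learned, pattern matching software can be made concrete with a toy sketch. Everything below, the data, the threshold, the features, is invented for illustration and has nothing to do with H2O's actual customer systems:

```python
# Old style: a hand-written rule, maintained by a programmer.
def rule_based_flag(amount: float) -> bool:
    return amount > 10_000.0  # fixed threshold, blind to context

# New style: the logic is generated from labeled data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy history: (amount, hour_of_day) pairs with fraud labels.
X = np.array([[12_000, 3], [50, 14], [9_500, 2], [80, 11], [30_000, 4], [200, 16]])
y = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X, y)
print(rule_based_flag(11_000.0), model.predict([[11_000, 3]])[0])
```

The rule never changes until someone edits it; the classifier's decision boundary moves whenever it is refit on fresh history, which is the shift from coded logic to logic generated from data.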
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Wells Fargo | ORGANIZATION | 0.99+ |
Dave | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
2008 | DATE | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
five | QUANTITY | 0.99+ |
San Francisco | LOCATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Brooklyn | LOCATION | 0.99+ |
Prague | LOCATION | 0.99+ |
China | LOCATION | 0.99+ |
Bronx | LOCATION | 0.99+ |
100 years | QUANTITY | 0.99+ |
May 2020 | DATE | 0.99+ |
Manhattan | LOCATION | 0.99+ |
51% | QUANTITY | 0.99+ |
US | LOCATION | 0.99+ |
Brazil | LOCATION | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
United States | LOCATION | 0.99+ |
COVID-19 | OTHER | 0.99+ |
10% | QUANTITY | 0.99+ |
20,000 bonds | QUANTITY | 0.99+ |
Los Angeles | LOCATION | 0.99+ |
last year | DATE | 0.99+ |
H20 | ORGANIZATION | 0.99+ |
Imperial College London | ORGANIZATION | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
one | QUANTITY | 0.99+ |
four | QUANTITY | 0.99+ |
Santa Clara | LOCATION | 0.99+ |
One | QUANTITY | 0.99+ |
hundred years | QUANTITY | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Netflix | ORGANIZATION | 0.99+ |
Sri Satish Ambati | PERSON | 0.99+ |
South Korea | LOCATION | 0.99+ |
three | QUANTITY | 0.99+ |
second half | QUANTITY | 0.99+ |
two things | QUANTITY | 0.99+ |
Red Hat | ORGANIZATION | 0.99+ |
both | QUANTITY | 0.98+ |
second surge | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
H2O | ORGANIZATION | 0.98+ |
third level | QUANTITY | 0.98+ |
once a year | QUANTITY | 0.98+ |
Sparkling Water | ORGANIZATION | 0.98+ |
FEMA | ORGANIZATION | 0.98+ |
pandemic | EVENT | 0.98+ |
Powell | PERSON | 0.97+ |
COVID-19 crisis | EVENT | 0.97+ |
second bump | QUANTITY | 0.97+ |
Czech Republic | LOCATION | 0.96+ |
second order | QUANTITY | 0.96+ |
1918 | DATE | 0.96+ |
Norway | LOCATION | 0.96+ |
Fed | ORGANIZATION | 0.95+ |
first folks | QUANTITY | 0.94+ |
thousands of companies | QUANTITY | 0.94+ |
two examples | QUANTITY | 0.91+ |
10, 20 years | QUANTITY | 0.91+ |
COVID | OTHER | 0.91+ |
CXO | TITLE | 0.91+ |
two months | QUANTITY | 0.91+ |
last couple of months | DATE | 0.9+ |
Moore | PERSON | 0.9+ |
later this month | DATE | 0.9+ |
Alipay | TITLE | 0.89+ |
Sri Ambati | PERSON | 0.88+ |
every 15 seconds | QUANTITY | 0.88+ |
COVID crisis | EVENT | 0.86+ |
Californians | PERSON | 0.85+ |
Driverless | TITLE | 0.84+ |
Nanda Vijaydev, HPE (BlueData) | CUBE Conversation, September 2019
From our studios in the heart of Silicon Valley, Palo Alto, California, this is a Cube conversation. >> Hi, and welcome to the Cube Studios for another Cube conversation, where we go in-depth with thought leaders driving innovation across the tech industry. I'm your host, Peter Burris. AI is at the forefront of every board in every enterprise on a global basis, as well as machine learning, deep learning, and other advanced technologies that are intended to turn data into business action that differentiates the business, leads to more revenue, leads to more profitability. But the challenge is that all of these new use cases are not able to be addressed with the traditional workflows that we've set up to address them. So as a consequence, we're going to need greater operationalization of how we translate business problems into ML and related technology solutions. Big challenge. We've got a great guest today to talk about it: Nanda Vijaydev is a distinguished technologist and lead data scientist at HPE on the BlueData team. Nanda, welcome to the Cube. >> Thank you, happy to be here. >> So Nanda, let's start with this notion of a need for an architected approach to how we think about matching AI and ML technology to operations, so that we get more certain results, better outcomes, more understanding of where we're going, and of how the technology is working within the business. >> Absolutely, yeah. AI, and doing AI in an enterprise, is not new. There have been enterprise-grade tools in the space before, but most of them have a very prescribed way of doing things. Sometimes you use custom SQL to use that particular tool, or the way you present data to that tool requires some level of pre-processing, which makes you copy the data into the tool. So you already have data fidelity maybe at risk, and you have data duplication happening. And then the scale, right? When you talk about doing AI at the scale that is required now, considering data is so big and there is a variety of data sets, it can probably be done, but there is a huge cost associated with that, and you may still not meet the variety of use cases that you want to actually work on. So the problem now is to make sure that you empower your users who are working in the space, and augment them with the right set of technologies and the ability to bring data in a timely manner for them to work on these solutions.
>> So it sounds as though what we're trying to do is simplify the process of taking great ideas and turning them into great outcomes. But you mentioned users, so let me ask you if we have to start here: we've always thought about how this is going to center on data science or the data scientist. As these solutions have started to become more popularized and diffused across the industry, a lot more people are engaging. Are all roles being served as well as they need to be? >> Absolutely, I think that's the biggest challenge, right? In the past, when we talk about very prescribed solutions, end to end was happening within those tools, so the different user personas were probably part of that particular solution. And also, the way these models came into production, which is really making them available for a consumer, was recoding or redeveloping them in technologies that were production friendly, which is: you're rewriting that in SQL, you're recoding that in C. So there is a lot of detail that is lost in translation, and the third big problem was really having visibility, or having a say, from a developer's point of view or a data scientist's point of view, in how these things are performing in production. How do you actually take that feedback back into deciding, is this model still good, or how do you retrain? So when you look at this lifecycle holistically, this is an iterative process. It is no longer a workflow where you hand things off; this is not a waterfall methodology anymore. This is a very, very continuous and iterative process, especially in the new age data science tools that are developing, where you build the model, the developer decides what the runtime is, and the runtimes are capable of serving those models as is. You don't have to recode, you don't have to lose things during translation. So with this, back to your question of how you serve two different roles: now all those personas and all those roles have to be part of the same project, and they have to be part of the same experiment; they're just serving different parts of the lifecycle. And whatever tooling or architecture and technologies you provide have to look at it holistically. There has to be continuous development, there has to be collaboration, there have to be central repositories that actually cater to those needs. >> So the architected approach needs to be able to serve each of the roles, but in a way that is collaborative and is ultimately put in service to the outcome, driving the use of the technology forward. Well, that leads to another question: should this architected approach be tied to one or another set of algorithms, or to one or another set of implementation infrastructure, or does it have to be able to serve a wide array of technology types? >> Yeah, great question, right? This is a living ecosystem. We can no longer build for, you know, you plan something for the next two years or the next three years, but technologies are coming every day. And the reason is because the types of use cases are evolving, and what you need to solve one use case is completely different when you look at two different use cases. So whatever standards you come up with, the consistency has to be across how a user is onboarded into the system; the consistency has to be about data access, about security, about how one provisions these environments. But as far as what tool is used, or how that tool is being applied to a specific problem, there's a lot of variability in there, and your architecture has to make sure that this variability is addressed, and it is growing.
>> So HPE spends a lot of time with customers, and you're learning from your customer successes, turning that into tooling that leads to this type of operationalization. Give us some visibility into some of those successes that really stand out for you, that have been essential to how HPE has participated in this journey to create better tools for better AI and ML. >> Absolutely. Traditionally with BlueData, HPE now, we've been exposed to a lot of big data processing technologies, and in the current landscape the data is different: data is not always at rest, data is not structured. Data is coming in; it could be a stream of data, it could be a picture, and in use cases like we talked about, it could be image recognition or voice recognition, where the type of data is very different, right? So back to how we've learned from our customers: in my role I talk to tens of customers on a daily or weekly basis, and each one of them is at a different level of maturity in their lifecycle, and these are some very established customers. But among the various groups that are adopting these new age technologies, even within an organization, there is a lot of variability. So whatever we offer has to support all of those particular user groups. There are some who are coming from the classic R language background, there are some that are coming from a Python background, some are doing things in Scala, some are doing things in Spark, and there are some commercial tools that they're using, like H2O Driverless AI or Dataiku. So what we have to look at is, in this lifecycle, we have to make sure that all these communities are represented and addressed, and if they build a model in a specific technology, how do we consume that, how do we take it in, and how do we deploy it? From an end-to-end point of view, it doesn't matter where a model gets built; it does matter how end users access it, it does matter how security is applied to it, it does matter how scaling is applied to it. So really, a lot of consistency is required in the operationalization, and also in how you onboard those different tools, how you make sure that consistency, methodology, and standard practices are applied across this entire lifecycle. And also monitoring; that's a huge aspect, right? When you have deployed a model and it's in production, monitoring means two different things to people. One is availability: when you go to a website and click on something, is the website available? Very similarly, when you go to an endpoint, or you're scoring against a model, is that model available? Do you have enough resources, can it scale depending on how many requests come in? That's one aspect of monitoring. And the second aspect is really how the model is performing: what is the accuracy, what is the drift, when is it time to retrain? So you no longer have the luxury of looking at these things in isolation. We want to make sure that all of these things can be addressed, knowing that this iteration sometimes can be a month, sometimes a day, sometimes probably a few hours, and that is why it can no longer be isolated. And even from an infrastructure point of view, some of these workloads may need things like GPUs, and you may need them for a very short amount of time. How do you make sure that you give what is needed for the duration that is required, then take it back and assign it to something else, because these are very valuable resources?
>> So I want to build on, if I may, that notion of onboarding the tools. We're talking about use cases that enterprises are using today to create business value. We're talking about HPE, as an example, delivering tooling that operationalizes how that's done today. But the reality is, we're going to see the state of the art still evolve pretty dramatically over the next few years. How is HPE going about ensuring that your approach, and the approach you're working with your customers on, does not get balkanized, does not get sclerotic; that it's capable of evolving and changing as folks learn new approaches to doing things? >> Absolutely. This has to start with having an open architecture. There have to be standards, without which enterprises can't run, but at the same time those standards shouldn't be so constricting that they don't allow you to expand into newer use cases. So what HPE ML Ops offers is really making sure that you can do what you do today in a best practice manner, or in the most efficient manner, bringing time to value: making sure that there is instant provisioning, or access to data; making sure that you don't duplicate data; compute and storage separation; containerization. These are some of the standard best practice technologies that are out there, and making sure that you adopt those is what sets users up to evolve with the later use cases. You can never have things frozen in time; you just want to make sure that you can evolve, and this is what it sets them up for: you evolve with different use cases and different tools as they come along. >> Nanda, thanks very much, it's been a great conversation. We appreciate you being on the Cube. >> Thank you, Peter. >> My guest has been Nanda Vijaydev, distinguished technologist and lead data scientist at HPE BlueData, and for all of you, thanks for joining us again for another Cube conversation. I'm Peter Burris, see you next time. [Music]
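Nanda's two senses of monitoring, is the endpoint up, and is the model still accurate, can be sketched in a few lines. The snippet below is a generic illustration, not HPE ML Ops code; the health URL is a placeholder, and the drift test is a simple two-sample Kolmogorov-Smirnov check on one feature:

```python
import requests
import numpy as np
from scipy.stats import ks_2samp

# 1) Availability: is the scoring endpoint reachable and answering quickly?
try:
    resp = requests.get("http://scoring-service.local/health", timeout=2)
    endpoint_up = resp.status_code == 200
except requests.RequestException:
    endpoint_up = False

# 2) Model performance: has the input distribution drifted since training?
train_sample = np.random.normal(0.0, 1.0, 5000)  # stand-in for training data
live_sample = np.random.normal(0.4, 1.2, 5000)   # stand-in for recent requests
statistic, p_value = ks_2samp(train_sample, live_sample)
drift_detected = p_value < 0.01  # retrain trigger; the threshold is a judgment call

print(f"endpoint up: {endpoint_up}, drift detected: {drift_detected}")
```

In practice the two checks run on very different cadences: availability every few seconds, drift over rolling windows of scored traffic, which is why the lifecycle has to treat them as distinct signals.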
ENTITIES
Entity | Category | Confidence |
---|---|---|
September 2019 | DATE | 0.99+ |
Nanda Vijaydev | PERSON | 0.99+ |
Scala | TITLE | 0.99+ |
Python | TITLE | 0.99+ |
second aspect | QUANTITY | 0.99+ |
tens of customers | QUANTITY | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
HPE | ORGANIZATION | 0.99+ |
Peter Burris | PERSON | 0.99+ |
Peter | PERSON | 0.98+ |
HP | ORGANIZATION | 0.98+ |
BlueData | ORGANIZATION | 0.97+ |
each | QUANTITY | 0.97+ |
two different use cases | QUANTITY | 0.97+ |
a day | QUANTITY | 0.97+ |
third big problem | QUANTITY | 0.97+ |
a month | QUANTITY | 0.96+ |
two different things | QUANTITY | 0.96+ |
each one | QUANTITY | 0.95+ |
two different roles | QUANTITY | 0.94+ |
one | QUANTITY | 0.94+ |
today | DATE | 0.92+ |
Palo Alto California | LOCATION | 0.92+ |
SPARC | TITLE | 0.91+ |
lot of details | QUANTITY | 0.87+ |
New Age | DATE | 0.83+ |
blue data | ORGANIZATION | 0.83+ |
lot | QUANTITY | 0.81+ |
ananda | PERSON | 0.76+ |
h2o | TITLE | 0.76+ |
lot more | QUANTITY | 0.74+ |
a few hours | QUANTITY | 0.72+ |
few years | DATE | 0.7+ |
next two years | DATE | 0.69+ |
daily | QUANTITY | 0.65+ |
years | QUANTITY | 0.62+ |
weekly | QUANTITY | 0.62+ |
next three | DATE | 0.61+ |
HPE | TITLE | 0.59+ |
time | QUANTITY | 0.55+ |
Division I | QUANTITY | 0.54+ |
Announcement: Sri Ambati, H2O.ai | CUBE Conversation, August 2019
(upbeat music) >> Announcer: From our studios, in the heart of Silicon Valley, Palo Alto, California, this is a Cube conversation. >> Everyone, welcome to this special Cube conversation here in Palo Alto Cube studios. I'm John Furrier, host of the Cube. We have special breaking news here, with Sri Ambati, who is the founder and CEO of H2O.ai, with big funding news. Great to see you, Cube alumni, hot startup, you got some hot funding news, share with us. >> We are very excited to announce our Series D. Goldman Sachs, one of our leading customers, and Ping An from China are leading our round. It's a round of $72 million, bringing our total fundraise to 147. This is an endorsement of their support of our mission to democratize AI, and an endorsement of the amazing teamwork behind the company and its customer centricity. Customers have now come to lead two of our rounds; the last round was Series C, led by Wells Fargo and NVIDIA, and I think it just goes to show how critical a thing we are for their success in AI. >> Well congratulations, I've been watching you guys build this company from scratch, we've had many conversations going back to 2013, '14 on The Cube. You call it-- >> You covered us long before. >> You guys were always on the wave, and you really created a category. This is a new category that Cloud 2.0 is creating, which is a DevOps mindset, an entrepreneurial mindset, creating a category to enable people to have the kind of infrastructure and tooling and software that lets them do AI without doing the heavy lifting. As the quote Amazon always uses for cloud goes, you do all of the undifferentiated heavy lifting that's required to stand up stuff, and then provide tooling for the differentiated heavy lifting to make it easy to use. This has been a key thing. Has that been the-- >> Customers have been core to our company building. H2O is here to build an amazing piece of innovation and technology, and innovation is not new for Silicon Valley, as you know. But I think innovation with a purpose, and with a focus on customer success, is something we represent, and that's been kind of the key north finder for us. In terms of making things simpler: when we started, it was a grassroots movement in open source, and we wanted the mind share of millions of users worldwide, and that mind share got us a lot of feedback. And that feedback is how we then built the second generation of the product lines, which is Driverless AI. We are also announcing our mission to make every company an AI company; this funding will power that transformation of several businesses that can then go on to build the AI superpower. >> And certainly, cloud computing, more compute, more elastic resources, is always a great tailwind. What are you guys going to do with the funding in terms of focus? >> You mentioned cloud, which is a great story. We're obviously going to make things easier for folks who are doing the cloud, but the largest players are there as well: Google, Microsoft, Amazon. They're right there, trying to innovate. AI is at the center of every software moment, because AI is eating software, and software is eating the world. And so all the software players are right there, trying to build a large AI opportunity for the world, and we think in ecosystems, not just empires. So our mission is to uplift the entire AI space to the place where businesses can use it, verticalize it, build new products, globalize.
We are building our sales and marketing efforts now with much bigger, faster systems-- >> So a lot of go to market expansion, more customer focus. More field sales and support kind of thing. >> Building our center for AI research in Prague, within the CND; now we are building them in Chennai and Ottawa, and so globalizing the operation, going to China, going to build focus in Asia as well. >> So a nice step up on funding, at 72 million, you said? >> 72.5 million. >> 72.5 million, that's almost double what you've raised to date, nice kickup. So global expansion, nice philosophy. That's important to you guys, isn't it? >> The world has become a small village. There's no changing that, and data is global. These are wide global trends; it's amazing to see that AI is not just transforming the US, it's also transforming China, it's also transforming India, it's transforming Africa. Pay through mobile is a very common theme worldwide, and data is being collected globally. I think there is no way to unbox it and box it back into a small place, so our vision is very borderless and global, and we want the AI companies of the valley to also compete in a global arena, and I think that's kind of why we think it's important to be-- >> Love competition, that's certainly going to force everyone to be more open. I've got to ask you about the role of the developer. I love the democratization, putting AI in the hands of everybody, it's a great mission. You guys do a lot of AI for Good efforts, so congratulations on that. But how does this change the nature of the developer? Because you're seeing, with cloud and DevOps, developers becoming closer to the front lines, they're becoming kingmakers. They're becoming really, really important. So the role of the developer is important. How do you change that role, if at all? How do you expand it, what happens? >> There are two important transformations happening right now in the tech world. One is the role of the data scientist, and the other is the role of the software engineer. They're coming closer in many ways; actually, in some of the newer places, software engineers are deploying data science models, and data scientists are doing software engineering. Python has been a good new language, among the new languages that are coming up, that helps that happen more closely. Software engineering as we know it, which was looking at data and creating the rules and the logic that runs a program, is now being automated to a degree where that logic is being generated from data using data science. So that's where the brains behind how programs run, and how computers are built, is now AI inside. And so that's where the world is transforming: software engineers now get to do a lot more, with a lot less tinkering on a daily basis over little modules. They can probably build a whole slew of an application; what would take 18 months to build is now compressing into 18 weeks or 18 days. >> Sri, I love how you talk about software engineering and data scientists, very specifically. I was having a debate with my young son around the question: what is computer science? Well, computer science is the study of computers, the science of computers. It used to be, if you were a CS or a comp sci major, which is not cool to say anymore, that when you were a computer science major, you were really a software engineer; that was the discipline. Now computer science as a field has spread so far and so broad: you've got software engineering, you've got data science, you have newer roles emerging.
But that brings up the question I want to put to you, which is the whole idea of: I'm a full stack developer. Well, if what you're saying you're doing is true, you're essentially cutting the stack in half. So it's a half stack developer on one end, and a data scientist that's got the other half. So does the notion of the full stack developer kind of go away, with the idea of horizontally scalable infrastructure and vertically specialized data and AI? Your thoughts, what's your reaction to that? >> I think the most scarce resource in the world is empathy, right? When developers have empathy for their users, they start building design that cares for the users. So the design becomes the limiting factor, where you can't really automate a lot of that design. So the full stack engineer is now going closer to the front, understanding their users, and making applications that are perceptive of how the users are using them, and building that empathy into the product. A lot of the full stack, we used to learn how to build a kernel, deploy it on cloud, scale it on your own servers; all of that is coming together in reasonably easier ways. Cloud is helping there, AI is helping there, data is helping there, and lessons from the data. But I think what has not gone away is imagination, creativity, and how to power that creativity with AI and get it in the hands of someone quickly. Marketing has become easier in the new world. So it's not enough just to make products; you have to make markets for your products, and then deliver and get that success for customers-- >> So what you're saying-- >> The developers become-- >> The consistency of the lower end of the stack, of wiring together the plumbing and the kernel and everything else, is done for you. So you can move up. >> Up the stack. >> So the stack's growing, so it's still kind of full. No one calls themselves a half stack developer. I haven't met anyone say "Yeah, I'm a half stack developer." They're full stack developers, but the roles are changing. >> I think what-- >> There's more to do on the front end of creativity, so the stack's extending. >> Creativity is changing, I think, is the one thing we have learned. We've gone past Moore's Law in the valley, and people are innovating architectures to run AI faster. So AI is beginning to eat hardware; you've seen the transformation in microprocessors as well. I think once AI starts being part of the overall conversation, you'll see a much richer coexistence in how a human programmer and a computer programmer are going to be working closely. But I think this is just the beginning of a real richness: when you talk about rich interactive applications, you're going to talk about rich interactive appliances, where you start seeing intelligence really spread around the form. >> Sri, if we really wanted to have some fun we could just talk about what a 10x engineer is. No, I'm only kidding, we're not going to go there. It's always a good debate on Twitter, what a 10x engineer is. Sri, congratulations on the funding. $72.5 million in financing for global expansion, on the team side as well as in geographies, congratulations. >> Thank you. >> H2O.ai.
I think it's become easier to democratize entrepreneurship now than ever before and part of our mission as a company is to democratize things, democratize AI, democratize H2O like in the AI for Good, democratize water. But also democratize the art of making more entrepreneurs and remove the common ways to fail and that's also a way to create more opportunity more ownership in the world and so-- >> And I think society will benefit from this globally because in the data is truth, in the data is the notion of being transparent, if it's all there and we're going to get to the data faster and that's where AI helps us. >> That's what it is. >> Sri, congratulations, $72 million of funding for H2O. We're here with the founder and CEO Sri Ambati. Great success story here in Silicon Valley and around the world. I'm John Furrier with the Cube, thanks for watching. >> Sri: Thank you. (upbeat music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
NVIDIA | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Prague | LOCATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Chennai | LOCATION | 0.99+ |
Wells Fargo | ORGANIZATION | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
18 months | QUANTITY | 0.99+ |
Asia | LOCATION | 0.99+ |
August 2019 | DATE | 0.99+ |
$72 million | QUANTITY | 0.99+ |
H2O | ORGANIZATION | 0.99+ |
Ottawa | LOCATION | 0.99+ |
Sri Ambati | PERSON | 0.99+ |
18 weeks | QUANTITY | 0.99+ |
18 days | QUANTITY | 0.99+ |
China | LOCATION | 0.99+ |
H2O.ai | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
2013 | DATE | 0.99+ |
147 | QUANTITY | 0.99+ |
$72.5 million | QUANTITY | 0.99+ |
72 million | QUANTITY | 0.99+ |
Python | TITLE | 0.99+ |
millions of users | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
second generation | QUANTITY | 0.99+ |
Sri | PERSON | 0.99+ |
Palo Alto | LOCATION | 0.98+ |
Cloud 2.0 | TITLE | 0.98+ |
Goldman Sachs | ORGANIZATION | 0.98+ |
Africa | LOCATION | 0.97+ |
10x | QUANTITY | 0.97+ |
72.5 million | QUANTITY | 0.96+ |
Cube | ORGANIZATION | 0.94+ |
CND | LOCATION | 0.93+ |
'14 | DATE | 0.93+ |
Palo Alto, California | LOCATION | 0.92+ |
half | QUANTITY | 0.92+ |
US | LOCATION | 0.87+ |
India | LOCATION | 0.86+ |
one end | QUANTITY | 0.86+ |
two of our rounds | QUANTITY | 0.84+ |
two important transformations | QUANTITY | 0.78+ |
Last | QUANTITY | 0.77+ |
double | QUANTITY | 0.7+ |
DevOps | TITLE | 0.69+ |
Ping An | ORGANIZATION | 0.68+ |
Moore | ORGANIZATION | 0.67+ |
H2O.ai | TITLE | 0.61+ |
CEO | PERSON | 0.61+ |
wave | EVENT | 0.6+ |
Series C | EVENT | 0.58+ |
The Cube | TITLE | 0.53+ |
CUBE | EVENT | 0.46+ |
Series | OTHER | 0.27+ |
HPE Data Platform
From our studios in the heart of Silicon Valley, Palo Alto, California, this is a Cube conversation. >> Hi, I'm Peter Burris, analyst at Wikibon. Welcome to another Wikibon theCUBE digital community event, this one sponsored by HPE. Like all of our digital community events, this one will feature about 25 minutes of video, followed by a crowd chat, which will be your opportunity to ask your questions, share your experiences, and push forward the community's thinking on important issues facing business today. So what are we talking about today? Over the course of the last, say, six months or so, we've had a lot of conversations with our customers about the core issues that multi-cloud is going to engender within business. One of them, clearly, is how do we bring greater intelligence to how we move, manage, and administer data within the enterprise. Some of the more interesting conversations we've had turn out to have been with HPE, and that's what we're going to talk about today. We're going to be spending a few minutes with a number of HPE professionals, as well as Wikibon professionals and thought leaders, talking about the challenges that enterprises face as they consider intelligent data platforms. So let's get started. The first conversation is with Sandeep Singh, who is a vice president at HPE. Sandeep, let's have that conversation about the challenges facing business today as it pertains to data. So Sandeep, I started off by making the observation that we've got this mountain of data coming in to a lot of enterprises. At the same time, the notion of how data is going to create new classes of business value seems to be pretty deeply ingrained and acculturated among a lot of decision makers. So they want more value out of their data, but they're increasingly concerned about the volume of data that's going to hit them. In your conversations with customers, how are you hearing them talk about this fundamental challenge? >> That's a great question. Across the board, data is at the heart of applications, pretty much everything that organizations do, and in conversations with customers it really boils down to a couple of areas. One is: how is my data just effortlessly available all the time, and always fast, because fundamentally that's driving the speed of my business? That's incredibly important. And how can my various audiences, including developers, just consume it like the public cloud, in a self-service fashion? And then the second part of that conversation is really about this massive data storm, or mountain of data, that's coming and going to be available: how do I drive a competitive advantage, how do I unlock the hidden insights in that data to uncover new revenue streams, new customer experiences? Those are the areas we hear about, and fundamentally, underlying them, the challenge for customers is: boy, I have a lot of complexity, and how do I ensure that I have the necessary insights into the infrastructure management, so that I, or my IT staff, am not beholden to fighting the IT fires that can cause disruptions and delays to projects? >> So fundamentally we want to be able to take the time and attention that goes into the infrastructure, into the administration of those devices that handle the data, and move that time and attention up into how we deliver the data services, and ideally up into the applications that are going to actually generate a new class of work within a digital business. Have I got that right? >> Absolutely. It's about infrastructure that just runs seamlessly. It's always on, it's always fast. People don't have to worry about: is it going to go down, is my data available, is it going to slow down? People don't want sometimes fast, they want always fast, right? And that's governing the application performance that ultimately I can deliver. And you talked about: well, geez, if the data infrastructure just works seamlessly, then can I eventually get to the applications, and to building the right pipelines, ultimately, for mining that data, driving the AI and the machine learning, the analytics driven insights from there?
>> Great discussion about the importance of data in the enterprise and how it's changing the way we think about business. We're going to come back to Sandeep shortly, but first let's spend some time talking with David Floyer, the Wikibon analyst, about the new mindset that is required to take advantage of some of these technologies and solve some of these problems. Specifically, we need to think increasingly about data services. Let's hear what David has to say. Explain what that new mindset is. >> Yes, I completely agree that a new mindset is required, and it starts with: you want to be able to deal with data wherever it's going to be. We are in a hybrid world, a hybrid cloud world: your own clouds, other public clouds, partner clouds. All of these need to be integrated, and data is at the core of it. So the requirement, then, is, rather than think about each individual piece, to think about services which are going to be applied to that data, and can be applied not only to the data in one place but across all of that data. And there isn't such a thing as just one set of services; there are going to be multiple sets of these services available. But hopefully we will see some degree of convergence, so there'll be the same lexicon and concepts, etcetera; there'll be the same levels of things that are needed within each of these architectures, but there'll be different emphases on different areas. >> So we need to look at the way we administer data as a set of services that create outcomes for the business, as opposed to services that are then translated into individual devices. So let's jump into this notion of what those services look like. It seems as though we can list off a couple of them. >> Sure, yeah. You must have data reduction techniques, so you must have deduplication, compression types of techniques, and you want to apply those across as big an amount of data as you can; the more data you apply those to, the higher the levels of compression and deduplication you can get. So that's clearly one sort of set of services across there. You must back up and restore data, in another place, and be able to restore it quickly and easily. That, again, is a service. >> How quickly, how integrated that recovery is, again, that's going to be a variable, a differentiation in the service. >> Exactly. You're going to need data protection in general, end to end protection of one sort or another. For example, you need end-to-end encryption across there; it's no longer good enough to say this bit's been encrypted and then this bit's unencrypted. It's got to be end to end, from one location to another location, seamlessly provided, that sort of thing. >> Well, let me press on that, because I think it's a really important point. The notion is that the weakest link determines the strength of the chain, right? What you just described says: if you have encryption here and you don't have encryption there, then, because of the nature of digital, when you start bringing that data together, guess what, the weakest link determines the protection of the overall data. >> Absolutely, yes. And then you need services like snapshots, like other services which provide much better usage of that data. One of the great things about flash, and what it has brought about, is that you can take a copy of data in real time, use it for a totally different purpose, and have it changed in a different way. So there are some really significant improvements you can get with services like snapshots. And then you need some other services, which are becoming even more important, in my opinion. The advent of bad actors in the world has really brought about the requirement for things like air gaps: to have your data, with the metadata, all in one place and completely separated from everything else. There are such things as logical air gaps; as long as they're real, in the sense that the two paths can't interfere with each other, those are going to be services which become very, very important. >> That's an example of a general class of security data services that are required. So ultimately, what we're describing is a new mindset that says a storage administrator has to think about the services that the applications and the business require, and then seek out technologies that can provide those services at the right price point, with the right degree of power consumption, space, and environmentals, or with the type of maintenance and services related support that's required, based on the physical location, the degree to which it's under their control, etc. Is that kind of how we're thinking about this? >> I think absolutely. And again, if there are going to be multiples of these around in the marketplace, one size is not going to fit all. If you're wanting super fast response time at an edge, and if you don't get that response in time it's going to be of no use whatsoever, you're going to have a different architecture, a different way of doing it, than if you need to be a hundred percent certain that every bit is captured, in a financial sort of environment, say. But from a service standpoint, you want to be able to look at that specific solution in a common way: common policies, common capabilities. >> Correct.
>> Great observations by David Floyer. It's very clear that for enterprises to get more control over their data, their data assets, and how they create value out of data, they have to take a services mentality. But the challenge we all face is that just taking a services mentality is not going to be enough; we have to think about how we're going to organize those services into a platform that is pertinent and relevant to how business operates in a digital sense. So let's go back to Sandeep Singh and talk to him a little bit about this HPE notion of the intelligent data platform. You've been one of the leaders in the complex systems arena for a long time, and that includes storage. Where are you guys taking some of these technologies? >> Yeah, so our strategy is to deliver an intelligent data platform, and that intelligent data platform begins with workload optimized, composable systems that can span the mission critical workloads, general purpose, secondary, big data and AI workloads. We also deliver cloud data services that enable you to embrace hybrid cloud. All of these systems, all the way out to the cloud data services, are plumbed with data mobility, and so, for example, use cases of even modernizing protection, and going all the way to protecting cost effectively in the public cloud, are enabled. But really, all of these systems are then imbued with a level of intelligence, with a global intelligence engine, that begins with predicting and proactively resolving issues before they occur. But it goes way beyond that, in delivering prescriptive insights that are built on top of global learning across hundreds of thousands of systems, with over a billion data points coming in on a daily basis, to be able to deliver information at the fingertips of even the virtual machine admins, to say: this virtual machine is sapping the performance of this node, and if you were to move it to this other node, the performance, the SLA for the whole virtual machine farm, will be even better. We build on top of that to deliver pre-built automation, so that it's hooked in with a REST API first strategy, so that developers can consume it in a containerized application that's orchestrated with Kubernetes, or leverage it as infrastructure as code, whether it's with Ansible, Puppet, or Chef. We accelerate all of the application workloads and bring app-aware data protection, so it's available for the traditional business applications, whether they're built on SAP or Oracle or SQL, for the virtual machine farms, and for the new stack containerized applications. And then customers can build their AI and big data pipelines on top of the infrastructure with a plethora of tools, whether they're using Kafka, Elastic, MapR, H2O; that complete flexibility exists. And within HPE, we're then able to turn around and deliver all of this with an as-a-service experience, with HPE Greenlake, to customers. >> So how invasive is this going to be to a large shop? >> It is completely seamless, in that sense. With Greenlake we're able to deliver a fully managed service experience, with a cloud-like, pay-as-you-go consumption model, and combining it with HPE Financial Services, we're also able to transform their organization in terms of this journey and make it a fully self-funding journey as well. >> So today the typical shop has got a bunch of administrators administering devices. That's starting to change; they've introduced automation, but automation that typically is associated with those devices. If we think three to five years out, folks are going to be thinking more in terms of data services and how those services get consumed, and that's going to be what the storage part of IT is thinking about. They can almost become data administrators, if I've got that right. >> Yes, intelligence is fundamentally changing everything, not only on the consumer side but on the business side of it. A lot of what we've been talking about is that intelligence is the game changer. We actually see the dawn of the intelligence era, and through this AI driven experience, what it means for customers is, first, it enables a support experience that they just absolutely love. Secondly, it means that the infrastructure is always on, always fast, always optimized, in that sense. And thirdly, in terms of making these data services available, and the data insights that are being unlocked, it's all about how you can enable your innovators, the data scientists and the data analysts, to shrink that time to deriving insights from months literally down to minutes. Today there's this chasm that exists: there's a great concept of how I can leverage AI technology, and between that concept and making it real, there's thinking about where AI can actually fit, and then how do I implement an end-to-end solution and a technology stack, so that I just have a pipeline that's available to me. That chasm is literally a matter of months, and what we're able to deliver, for example with HPE BlueData, is literally a catalog, self-service experience where you can select and seamlessly build a pipeline, literally in a matter of minutes, and it's all completely hosted seamlessly. So it's making AI and machine learning essentially available for the mainstream. >> So the intelligent data platform makes it possible to see these new classes of applications become routine, without forcing the underlying storage administrators themselves to become data scientists. >> Absolutely.
>> All right, the intelligent data platform is a great concept, but it's got to be made real, and it's being made real today by HPE. Calvin Zito is a thought leader at HPE, and he's done a series of chalk talks on improving storage and improving data management. One of the more interesting ones was specifically on the intelligent data platform. Let's watch Calvin Zito's chalk talk. >> Hey guys, it's time for another Around the Storage Block chalk talk. In this chalk talk, we're going to look at the intelligent data platform. Let me set up the discussion. At HPE, we see the dawn of the intelligence era. The flash era brought speed with flash; flash is now table stakes. The cloud era brought new levels of agility, and everyone expects an as-a-service experience going forward. The intelligence era, with an AI driven experience for infrastructure operations and AI enabled unlocking of insights, is poised to catapult businesses forward. So the intelligence era will see the rise of the intelligent enterprise. The enterprise will be always on, always fast, always agile, to respond to different challenges. But most of all, the intelligent enterprise will be built for innovation: innovation that can establish new services, revenue streams, and business models. Every enterprise will need to have an intelligent data strategy, where your data is always on and always fast, automated and on-demand, hybrid by design, and applies global intelligence for visibility and lifecycle management. Our strategy is to deliver an intelligent data platform that turns your data challenges into business opportunities. It begins with workload optimized composable systems for multiple workloads, and we deliver cloud data services for a hybrid cloud environment, so that you can seamlessly move data throughout its lifecycle. I'll have more on this in a moment. The global intelligence engine infuses the entire infrastructure with intelligence. It starts with predicting and proactively resolving issues before they occur. It creates a unique workload fingerprint, and these workload fingerprints, combined with global learning, enable us to drive recommendations to keep your app workloads and supporting infrastructure always optimized and delivering predictable speed. We have a REST API first strategy and offer pre-built automation connectors. We bring app-aware protection for both traditional and modern new stack application workloads, and you can use the intelligent data platform to build and deliver flexible big data and AI pipelines for driving real-time analytics. Let's take a quick look at the portfolio of workload optimized composable systems. These are systems across mission-critical and general-purpose workloads, as well as secondary data, and solutions for the emerging big data and AI applications. Because our portfolio is built for the cloud, we offer comprehensive cloud data services for both production workloads and backup and archive in the cloud. HPE InfoSight provides the global intelligence across the portfolio, and we give you the flexibility of consuming these solutions as a service with HPE Greenlake. I want to close with one more thing. The HPE intelligent data platform has three main attributes. First, it's AI driven: it removes the burden of managing infrastructure so that IT can focus on innovating, not administrating. Second, it's built for cloud, and it enables easy data and workload mobility across hybrid cloud environments. Finally, the intelligent data platform delivers an as-a-service experience, so you can be your own cloud provider. To learn more, go to hp.com intelligent data. I always love to hear from you on Twitter, where you can find me as Calvin Zito, and you can find my blog at hp.com/blog. Until next time, thanks for joining me on this Around the Storage Block chalk talk.
>> I think Calvin makes a compelling case that the opportunity to use these technologies is available today, not something we're just going to wait for in the future. And that's good, because one of the most important things business has to think about is how it is going to utilize some of these new AI and related technologies to alter the way it engages customers, runs the business, and handles operations, and ultimately to improve overall efficiency and effectiveness in the marketplace. It's very clear that this intelligent data platform is required to do many of the advanced AI things that business wants to do, but it also requires AI in the platform itself. So let's go back to Sandeep Singh and talk about how HPE foresees AI being embedded into the intelligent data platform, so it can make possible greater utilization of AI in the rest of the application portfolio. So we've got this significant problem: we now have to figure out how to architect, because we want predictability and certainty and cost clarity as to how we're going to do this. Part of the push here is new use cases for AI, so we're trying to push data up so that we can build those new use cases. But it seems that we also have to take some of those very same technologies and drive them down into the infrastructure, so we get greater intelligence, greater self-metering, and greater self-management and self-administration within the infrastructure itself. Have I got that right? >> Yes, absolutely. What becomes important for customers, when you think about data, and ultimately the storage that underlies the data, is this: you can build and deploy fast and reliable storage, but that's only solving half the problem. Greater than 50% of the issues actually end up arising from the higher layers. For example, you could change the firmware on the host bus adapter inside a server; that can trickle down and cause a data unavailability or performance slowdown issue. You need to be able to predict that all the way up at that higher level, and then prevent it from occurring. Or your virtual machines might be in a state of memory overcommitment at the server level, or CPU overcommitment. How do you discover those issues and prevent them from happening? The other area that's becoming important is this whole notion of cloud and hybrid cloud: that complexity tends to multiply exponentially. When you're building that hybrid cloud infrastructure, a fundamental challenge is: even when I've got a new workload and I want to place it, even on premises, because you've had lots of silos, how do you even figure out where I should place workload A, and how it will react with workloads B and C on a given system? And now you multiply that across hundreds of systems and multiple clouds, and you can see the challenge multiplying exponentially. >> Oh yeah. And I would say that, with where do I put workload A, the right answer today may be here, but the right answer tomorrow may be somewhere else, and you want to make sure that the services required to perform workload A are resident and available, without a lot of administrative work necessary to ensure that there's commonality. That's kind of what we mean by this hybrid multi-cloud world, isn't it? >> Absolutely. And when you start to think about it, you basically end up requiring, fundamentally needing, the data mobility aspect of it, because without the data you can't really move your workloads. And you need consistency of data services, so that if your app is architected for reliability and a set of data services, those just go along with the application. And then, building on top of that, you need portability for the actual application workload, consistently managed with a hybrid management interface. >> So we want an intelligent data platform that's capable of assuring performance, assuring availability, and assuring security, and that goes beyond that to deliver a simplified, automated experience, so that everything is just available through a self-service interface. >> And then it brings along a level of intelligence that's just built into it globally, so that instead of trying to manually predict, and landing in a world of reacting after IT fires have occurred, there is a sea of sensors and the infrastructure is automatically predicting and preventing issues before they ever occur. And then, going beyond that, how can you actually fingerprint the individual application workloads to then deliver prescriptive insights, to keep the infrastructure always optimized in that sense? >> So: discerning the patterns of data utilization so that, number one, the administrative cost of making sure the data is available where it needs to be goes down; number two, assuring that data as an asset is made available to developers as they create new applications, new things that create new work; but also working very closely with the administrators, so that they are not bound to an explosion in the number of tasks they have to perform to keep this all working across the board. >> Yes. >> I want to thank Sandeep Singh and Calvin Zito, both of HPE, as well as Wikibon's David Floyer, for sharing their ideas on this crucially important topic of how we're going to take more of a platform approach to do a better job of managing crucial data assets in today's and tomorrow's digital businesses. I'm Peter Burris, and this has been another Wikibon theCUBE digital community event, sponsored by HPE. Now stay tuned for our crowd chat, which will be your opportunity to ask your questions, share your experiences, and push forward the community's thinking on important issues facing business today. Thank you very much for watching, and now let's crowd chat! [Music]
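The workload placement question Sandeep raises, where does workload A go given what B and C are already doing, can be sketched as a toy scoring loop. This is a generic illustration with invented telemetry numbers, not HPE's global intelligence engine:

```python
# Each system reports the fraction of its capacity already in use;
# the numbers and system names are made up for illustration.
systems = {
    "array-1": {"cpu": 0.40, "iops": 0.55},
    "array-2": {"cpu": 0.75, "iops": 0.30},
    "cloud-a": {"cpu": 0.20, "iops": 0.60},
}
workload_a = {"cpu": 0.25, "iops": 0.20}  # estimated demand of the new workload

def fits(used: dict, demand: dict) -> bool:
    """A placement is feasible if no dimension would exceed capacity."""
    return all(used[k] + demand[k] <= 1.0 for k in demand)

candidates = [name for name, used in systems.items() if fits(used, workload_a)]
# Among feasible systems, prefer the one whose worst-loaded dimension
# stays lowest after placement (i.e., the most headroom left).
best = min(candidates, key=lambda n: max(systems[n][k] + workload_a[k] for k in workload_a))
print(best)
```

A real engine would replace the two hard-coded utilization numbers with learned workload fingerprints and re-evaluate continuously, which is exactly why the right answer today may differ from the right answer tomorrow.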
ENTITIES
Entity | Category | Confidence |
---|---|---|
David | PERSON | 0.99+ |
Sandeep Singh | PERSON | 0.99+ |
David Floyd | PERSON | 0.99+ |
Peter Burris | PERSON | 0.99+ |
David Flor | PERSON | 0.99+ |
three | QUANTITY | 0.99+ |
HPE | ORGANIZATION | 0.99+ |
David floor | PERSON | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
tomorrow | DATE | 0.99+ |
calvin zito | PERSON | 0.99+ |
HP | ORGANIZATION | 0.99+ |
Calvin Zito | PERSON | 0.99+ |
today | DATE | 0.99+ |
greater than 50% | QUANTITY | 0.99+ |
second part | QUANTITY | 0.99+ |
Apple | ORGANIZATION | 0.99+ |
Calvin Zito | PERSON | 0.98+ |
two paths | QUANTITY | 0.98+ |
five years | QUANTITY | 0.98+ |
over a billion data points | QUANTITY | 0.98+ |
Sandeep | PERSON | 0.98+ |
hundreds of thousands of systems | QUANTITY | 0.97+ |
each individual piece | QUANTITY | 0.97+ |
both | QUANTITY | 0.97+ |
first conversation | QUANTITY | 0.97+ |
hundreds of systems | QUANTITY | 0.97+ |
each | QUANTITY | 0.96+ |
one | QUANTITY | 0.96+ |
first | QUANTITY | 0.96+ |
three main attributes | QUANTITY | 0.95+ |
one set | QUANTITY | 0.95+ |
one place | QUANTITY | 0.94+ |
about 25 minutes | QUANTITY | 0.94+ |
Sandeep | ORGANIZATION | 0.94+ |
one size | QUANTITY | 0.94+ |
wiki Bond | ORGANIZATION | 0.93+ |
hundred percent | QUANTITY | 0.92+ |
HPE | TITLE | 0.91+ |
Greenlake | ORGANIZATION | 0.91+ |
second | QUANTITY | 0.91+ |
half the problem | QUANTITY | 0.91+ |
one location | QUANTITY | 0.87+ |
Palo Alto California | LOCATION | 0.86+ |
first strategy | QUANTITY | 0.83+ |
kload | ORGANIZATION | 0.83+ |
a lot of enterprises | QUANTITY | 0.81+ |
hp.com | ORGANIZATION | 0.81+ |
a lot of decision-makers | QUANTITY | 0.81+ |
wiki bond | ORGANIZATION | 0.81+ |
h2o | TITLE | 0.81+ |
Kafka lastic | TITLE | 0.79+ |
ORGANIZATION | 0.79+ | |
of sensors | QUANTITY | 0.71+ |
six months | QUANTITY | 0.69+ |
Oracle | ORGANIZATION | 0.67+ |
Joe DosSantos, Qlik | CUBE Conversation, April 2019
>> From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE! Now here's your host, Stu Miniman! >> I'm Stu Miniman and this is a CUBE Conversation from our Boston area studio. We're going to dig in to discuss the data catalog, and to help me do that, I want to welcome to the program first-time guest Joe DosSantos, who is the global Head of Data Management Strategy at Qlik. Joe, thank you so much for joining us. >> Good to be here, Stu. >> All right, so the data catalog, let's start there. People, in general, know what a catalog is; well, maybe some of the millennials might not know as much as those of us that have been in the industry a little bit longer. So start there and help level-set us. >> So our thinking is that there are lots of data assets around and people can't get at them. And just like you might be able to go to Amazon and shop for something, and you go through a catalog, or you go to the library and you can see what's available, we're trying to approximate that same kind of shopping experience for data. You should be able to see what you have, you should be able to look for things that you need, you should be able to find things you didn't even know were available to you. And then you should be able to put them into your cart in a secure way. >> So Joe, step one is, I've gathered my data lake, or whatever oil or water analogy we want to use for gathering the data, and then we've usually got analytic tools and lots of things there, but this is a piece of that overall puzzle. Do I have that right? >> That's exactly right. So, if you think about what the obstacles to analytics are, there are studies out there that say less than one percent of analytics data is actually being analyzed. We're having trouble with the pipelines to get data into the hands of people who can do something meaningful with it. So what is meaningful? It could be data science, it could be natural language, where maybe, if you have an Alexa at home, you just ask a question and that information is provided right back to you. So somebody wants to do something meaningful with data but they can't get it. Step one is go retrieve it, so our Attunity solution is really about how we start to effectively build pipelines to go retrieve data from the source. The next step, though, is how do I understand that data? Cataloging isn't just about having a whole bunch of boxes on a shelf; it's being able to describe the contents of those shelves, it's being able to know that I need that thing. If you were to go into an Amazon.com experience and you say I'm going on a fishing trip and you're looking for a canoe, it'll offer you a paddle, it'll offer you life jackets. It guides you through that experience. We want data to be the same way: this guided trip through the data that's available to you in that environment. >> Yes, it seems like metadata is something we often talk about, but it seems like even more than that. >> It really is. Metadata is a broad term. If you want to know about your data, you want to know where it came from. I often joke that there are three things you want to know about data: what is it, where did it come from, and who can have access to it under what circumstances. Now, those are really simple concepts, but they're really complex under the covers. What is data? Well, is this private information, is this personally identifiable information, is it a tax ID, is it a credit card? I come from TD Bank, and we were very preoccupied with the idea of someone getting data that they shouldn't.
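Joe's three questions, what is it, where did it come from, and who can access it, start with classification. As a rough illustration of the "what is it" piece, here is a toy column classifier that tags likely credit card data using the standard Luhn checksum. The tag names and the 80% threshold are invented for this sketch; a production catalog would use far richer detection.

```python
import re

def luhn_ok(number: str) -> bool:
    """Luhn checksum used to validate credit card numbers."""
    digits = [int(d) for d in re.sub(r"\D", "", number)][::-1]
    total = sum(digits[0::2]) + sum(sum(divmod(2 * d, 10)) for d in digits[1::2])
    return len(digits) >= 13 and total % 10 == 0

def classify_column(name: str, samples: list[str]) -> str:
    """Tag a column so the catalog knows what it holds before anyone shops for it."""
    if re.search(r"ssn|tax[_ ]?id", name, re.I):
        return "PII:TAX_ID"
    if samples and sum(luhn_ok(s) for s in samples) > 0.8 * len(samples):
        return "PII:CREDIT_CARD"
    return "PUBLIC"

# Standard Luhn-valid test numbers, not real cards.
print(classify_column("payment_acct", ["4111111111111111", "5555555555554444"]))
```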
You don't want everyone running around with credit cards; how do I recognize a credit card, how do I protect a credit card? So the idea of cataloging is not just about what's available, it's security. I'm going to give you an example of what happens when you walk into a pharmacy. If you walk into a pharmacy and you want a pack of gum or shampoo, you walk up to the shelf and you grab it; it's carefully marked in the aisles, it's described, but it's public, it's easy to get, there aren't any restrictions. If you wanted chewing tobacco or cigarettes, you would need to present somebody with an ID, who would need to confirm that you are of age, who would need to validate that you are authorized to buy it. And if you wanted Oxycontin, you'd best have a prescription. Why isn't data like that? Why don't we have rules that stipulate what kind of data belongs in what kind of category, and who can have access to it? We believe that you can, so a lot of the impediments to that are about availability and visibility, but also about security. We believe that once you've provisioned that data to a place, the next step is understanding clearly what it is and who can have access to it, so that you can provision it downstream to all of these different analytic consumers that need it. >> Yeah, data security is absolutely front and center; it's the conversation at board level today. So the catalog: is it a security tool, or does it work with your overall policies and procedures? >> So you need to have a policy. One of the fascinating things that exists in a lot of companies is, you ask people, "Please give me the titles of the columns that constitute personally identifiable information," and you'll get blank stares. So if you don't have a policy, you don't have a construct; you're hopelessly lost. But as soon as you write that down, now you can start building rules around it. You can know who can have access to what under what circumstances. When I was at TD, we took care to try and figure out what the circumstances were that allowed people to do their jobs. If you're in marketing, you need to understand the demographic information; you need to be able to distribute a marketing list that actually has people's names and addresses on it. Do you need their credit card number? Probably not. We started to work through these scenarios of understanding the nature of data on a must-have basis, and then you don't have to ask for approval every single time. If you go to Amazon, you don't ask for approval to buy the canoe; you just know whether it's in stock, if it's available, and if it's in your area. Same thing with data: we want to remove all of the friction associated with that, just because the rules are in place.
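Joe's pharmacy analogy maps naturally onto tiered access policy: open shelf, ID required, prescription required. Here is a minimal sketch of how a catalog might encode and enforce that at "checkout"; the tags, roles, and approvals are hypothetical.

```python
# Each sensitivity tier lists what a requester must hold, pharmacy-style.
POLICY = {
    "PUBLIC":          {"requires": set()},                               # gum and shampoo
    "PII:DEMOGRAPHIC": {"requires": {"role:marketing"}},                  # show your ID
    "PII:CREDIT_CARD": {"requires": {"role:fraud_ops", "approval:dpo"}},  # prescription only
}

def can_access(tag: str, grants: set) -> bool:
    """True when the requester holds everything the tier requires (subset test)."""
    return POLICY[tag]["requires"] <= grants

print(can_access("PII:DEMOGRAPHIC", {"role:marketing"}))   # True
print(can_access("PII:CREDIT_CARD", {"role:marketing"}))   # False: no "prescription"
```

Because the rules live in policy rather than in people's heads, nobody has to ask for approval every single time, which is exactly the friction Joe wants removed.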
>> Okay, so now that I have the data, what do I do with it? >> Well, this is actually a really important part of our Qlik story. Qlik is not trying to lock people into a Qlik visualization scenario. Once you have data, what we're trying to say is that discovery might happen across lots of different platforms. Maybe you're a Tableau user; I don't know why, but there are Tableau users. No, in fact, we did use Tableau at TD. But if you want to provision data and discover things in comparable BI tools, no problem. Maybe you want to move that into a machine-learning type of environment: you have TensorFlow, you have H2O libraries doing predictive modeling, you have R and Python. All of those are things that you might want to do. In fact, these days a lot of times people don't want analytics and visualizations; they want to ask the question. Do you have an Amazon Alexa in your house? >> I have an Alexa and a Google Home. >> That's right, so you don't want a fancy visualization; you want the answer to a question, and a catalog enables that. A catalog helps you figure out where the data is that answers a question. So when you ask Alexa what's the capital of Kansas, it's going through the databases that it has, which are neatly tagged and cataloged and organized, and it comes back with Topeka. >> Yeah. >> I didn't want to stump you there. >> Thank you, Joe. Boy, I think back in the day there were people doing ontological studies as to how to put these things together. As a user, I'm guessing, using a tool like this, I don't need to figure out how to set all this up; there have got to be way better tools and things like that. Just like in the discussion of metadata, most systems today do that for me, or at least a lot of it. But how much do I as a customer customize things, and how much does it do for me? >> So when you and I have a conversation, we share a language, and if I say "where do you live," you know that living implies a house, implies an address, and you've made that connection. And so, effectively, all businesses have their own terminology and ontology for how they speak, and what we do is, if we have that ontology described to us, we will enforce those rules, so we are able to then discover the data that fits that categorization. So we need the business to define that for us, and again, a lot of this is about process and procedure. Anyone who works in technology knows that very few of the technological problems are actually about technology; they're about process and people and psychology. What we're doing is, if someone says I care deeply and passionately about customers, and customers have addresses, and these are the rules around them, we can then apply those rules. Imagine the governance tools are there to make laws; we're like the police, we enforce those laws at time of shopping, in that catalog metaphor.
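"Living implies a house, implies an address" can be sketched as a tiny glossary that expands a business term into the physical columns it implies. The terms and columns here are invented for illustration; as Joe notes, real ontologies are supplied by the business.

```python
# A toy business ontology: terms imply other terms, which map to columns.
ONTOLOGY = {
    "customer":     {"implies": ["address", "demographics"]},
    "address":      {"columns": ["street", "city", "postal_code"]},
    "demographics": {"columns": ["age_band", "household_size"]},
}

def resolve(term: str) -> list:
    """Expand a business term into every physical column it implies."""
    node = ONTOLOGY.get(term, {})
    cols = list(node.get("columns", []))
    for implied in node.get("implies", []):
        cols.extend(resolve(implied))
    return cols

print(resolve("customer"))
# ['street', 'city', 'postal_code', 'age_band', 'household_size']
```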
>> Wow, Joe, my mind is spinning a little bit, because one of the problems you have if you work for a big company is that different parts of the company all want the same answer, but they ask it in very different ways and they don't speak the same language. So does a catalog help with that? >> Well, it does and it doesn't. I think that we are moving to a world in which, for a lot of questions, truth is in the eye of the beholder. So if you think about a business that wants to close the books, you can't have revenue that was maybe three million, maybe four million. But if you want to say, what was the effectiveness of the campaign that we ran last night, was it more effective with women or men, and why? Anytime someone asks a question like "why" or "I wonder if," these are questions that invite investigation and analysis, and we can come to the table with different representations of that data. It's not about truth; it's about how we interpret it. So one of the peculiar and difficult things for people to wrap their arms around is that in the modern data world, with data democratization, two people can go in search of the same question and get wildly different answers. That's not bad, that's life, right? So what's the best movie that's out right now? There's no truth; it's a question of your tastes. And what you need to be able to do, as we move to a democratized world, is know what criteria were used and what data was used. And so we need those things to be cited, but the catalog is effectively the thing that puts you in touch with the data that's available. Think about your college research projects: you wrote a thesis or a paper, you were meant to draw a conclusion, and you had to go to the library and get the books that you needed. And maybe, hopefully, no one had ever combined all of those ideas from those books to create the conclusion that you did. That's what we're trying to do every single day in the businesses of the world in 2019.
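Joe's requirement that answers cite "the criteria that were used and the data that was used" suggests attaching provenance to every published result. A hedged sketch follows; the field names are hypothetical, and a real catalog would record lineage much more richly.

```python
import hashlib, json, datetime

def cite(result: dict, dataset: str, criteria: dict) -> dict:
    """Attach a 'works cited' record so two different answers to the same
    question can each be audited and compared."""
    provenance = {
        "dataset": dataset,
        "criteria": criteria,
        "run_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    provenance["fingerprint"] = hashlib.sha256(
        json.dumps(provenance, sort_keys=True).encode()).hexdigest()[:12]
    return {**result, "provenance": provenance}

answer = cite({"campaign_lift_pct": 12.4},
              dataset="campaign_responses_q2",
              criteria={"segment": "women", "metric": "conversion_rate"})
print(answer["provenance"]["fingerprint"])
```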
>> Yeah, it's a little scary. In the world of science, most things don't come down to a binary answer with the data to prove it, and what we understand today might not hold; if we look at it and add new data, it could change. Bring in some customer examples as to what they're doing and how this impacts them, and, I hope, brings more certainty into our world. >> Absolutely. So I come from TD Bank, where I was the Vice President of Information Management Technology, and we used Data Catalyst to catalog a very large data lake: a Hadoop data lake that was six petabytes, with about 200 different applications in it. And what we were able to do was allow self-service to those data assets in that lake. So imagine you're just looking for data, and instead of having to call somebody or get a pipeline built and spend the next six months getting data, you go to a portal and you grab that data. What we were able to do was make that very simple and reduce that time. We usually think that it takes about 50% of your time in an analysis context to find the data and make the data useful; what if that was all done for you? So we created a shopping experience for that at an enterprise level. What was the goal? Well, at TD we were all about legendary customer experience, so what we found very important were customer interactions and their experiences, their transactions, their web clicks, their behavioral patterns. And if you think about it, what any company is looking to do is to catch a customer in the act of deciding, and what are those critical things that people decide? In a bank, it might be when to buy a house, when you need mortgages and potentially loans and insurance. For a healthcare company, it might be when they change jobs; for a hospital, it might be when the weather changes. And everybody's looking for an advantage, and you can only get that advantage if you're creative about recognizing those moments through analytics and then acting in real time, with streaming, to do something about that moment. >> All right, so Joe, one of the questions I have is, is there an aspect of time when you go into this? Because I understand if I ask questions based on the data that I have available today, but if I'd asked two weeks before, there would be some different data, and if I kept watching it, it would keep changing. I've got certain apps I use, like when's the best time to buy a ticket; how does that play in? >> So there are two different dimensions to this. The first is what we call algorithmic decay. If you're going to try and develop an algorithm, you don't want the data shifting under your feet as you work, because all of a sudden your results will change. And the sad reality is that most humans are not very original, so if I look at your behavior for the past ten years, or the past twenty, it won't necessarily be different from somebody else's. So what we're looking to do is catch mass patterns; that's the power of big data, to look at a lot of patterns and figure out the repeatability in those patterns. At that point you're not really looking for the data to change. Then you go to score it, and this is where the data changes all the time. So think about big data as looking at a billion rows and figuring out what's going on. The next thing would be what's traditionally called fast data, which is now based on an algorithm: this event just happened, what should I do? That data is changing under your feet regularly; you're looking to stream that data, maybe with a change data capture tool like Attunity, and get it into the hands of people and applications to make decisions really quickly. Now, what happens over time is that people's behaviors change (only old people are on Facebook now, right, you know this), so demographics change, and the things that used to be very predictive fail to be. And there has to be a capability in an industry, in an enterprise, to be able to deal with those algorithms as they start to decay and replace them with something fresher.
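Algorithmic decay of the kind Joe describes is usually caught by monitoring drift between what a model saw at training time and what it is scoring now. This is a deliberately bare sketch; production systems use tests such as the population stability index rather than a single mean comparison, and the numbers are made up.

```python
from statistics import mean

def decayed(train_scores, recent_scores, tolerance=0.05):
    """Signal retraining when live behavior shifts away from the training era."""
    return abs(mean(recent_scores) - mean(train_scores)) > tolerance

# Trained when a demographic behaved one way; the population has since moved on.
print(decayed([0.31, 0.29, 0.30, 0.32], [0.44, 0.47, 0.45]))  # True -> refresh the model
```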
>> All right, Joe, how do things like government compliance fit into this? >> So governance is really at the core of the catalog. You really need to understand what the rules are if you want to have an effective catalog. We don't believe that every single person in a data-democratized world should have access to every single data element. So you need to understand what this data is, how you should protect it, and how you should think about the overall protection and use of this data. This is a really important governance principle: figuring out who can have access to these data sets under what circumstances. Again, it has nothing to do with technology, but the catalog should really enforce your policy; a really good catalog should help to enforce the policies that you're coming up with about who should have access to that data under what circumstances. >> Okay, so Joe, this is a pretty powerful tool. How do customers measure that they're getting adoption, that they're getting the results that they were hoping for when they roll this out? >> No one ever woke up one day and said, boy, would it be great if I stockpiled petabytes of data. At the end of the day... >> I know some storage companies that say that. >> They wish the customers would say that, but at the end of the day you have data for analytics value. And so what is analytics value? Maybe it's about a predictive algorithm, maybe it's about a visualization, maybe it's about a KPI for your executive suite. If you don't know, you shouldn't start. What we want to do is think about use cases that make a difference to an enterprise. At TD, that was fundamentally about legendary customer experience, offering the next best action to really delight that customer. At SunLife, it was about making sure that they had an understanding, from a customer-support perspective, of their consumers. At a healthcare company among our customers, it was about faster discovery of drugs. So if you understand what those are, you then work back from the analytical outcome to the data that supports it, and that's how you get started: how can I get the datasets that I'm pretty sure are going to move the needle, and then build from there, so I'm able to answer more and more complex questions. >> Well, great, those are some pretty powerful use cases. I remember back in the early Hadoop days it was like, let's not have the best minds of our time figuring out how you can get better ad clicks, right? >> That's right, it's much easier these days. Effectively, what Hadoop, what big data, really allows you to do is answer questions more comprehensively. There was a time when cost would prevent you from being able to look at ten years' worth of history; those cost impediments are gone. So your analytics can be much better as a result: you're looking at a much broader section of data, and you can do much richer what-if analysis. And I think that really the secret of any good analytics is encouraging the what-if kind of questions. So you want, in a data-democratized world, to be able to encourage people to say, I wonder if this is true, I wonder if this happened, and have the data to support that question. And people talk a lot about failing fast, glibly; what does that mean? Well, I wonder if, right now, women in Montana in summertime buy more sunglasses. Where's the data that can answer that question? I want that data quickly, and I want to be able in five minutes to say, boy, Joe, that was really stupid. I failed, and I failed fast, but it wasn't because I spent the next six weeks looking for the data assets; it's because I had the data, got the analysis really quickly, and then moved on to something else. The people that can churn through those questions fastest will be the ones that win.
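The sunglasses-in-Montana question is exactly the kind of five-minute, fail-fast check a catalog should enable once the data is in hand. Here is a toy version with pandas, using made-up data in place of the curated asset a catalog would serve up.

```python
import pandas as pd

sales = pd.DataFrame({
    "state":  ["MT", "MT", "MT", "NY", "NY"],
    "gender": ["F",  "M",  "F",  "F",  "M"],
    "month":  [7,    7,    1,    7,    7],
    "sunglasses_units": [12, 5, 2, 6, 7],
})

summer_mt = sales[(sales.state == "MT") & (sales.month.between(6, 8))]
print(summer_mt.groupby("gender")["sunglasses_units"].mean())
# If the hypothesis looks wrong, you failed fast; move on to the next question.
```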
>> Very cool. I'm one of those people; I love swimming in data, always seeing what you can learn. For customers that want to get started, what do you recommend? What are the first steps? >> So the first thing is really critical use case identification. Again, no one wants to stockpile data, so we need to think about how the data is going to affect an outcome, and think about that user outcome. Is it someone asking a question of an application in natural language to drive a certain behavior? Is it a real-time decision? What is the thing that you want to get good at? I mentioned that TD wanted to be good at customer experience and offer development. If you think about what Target did, there's a notorious story about them being able to predict pregnancy, because they recognized that there was an important moment, a behavioral change in consumers that would change how they buy overall. What's important to you? What data might be relevant for that? Anchor it there, start small, start to operationalize the pipes that get you the data that you need, and encourage a lot of experimentation with the data assets that you've got. You don't need to create petabytes of data. Create the data sets that matter, and then grow from use case to use case. One of our customers, SunLife, did a wonderful job of articulating seven or eight key use cases that would matter and built their lake accordingly. First it was about customer behavior, then it was employee behavior. If you can start to think about your customers and what they care about: there's a person out there that cares about customer attrition, there's a person out there that cares about employee attrition, there's a person out there that cares about the cost of delivery of goods. Let's figure out what they need and how to use analytics to drive that, and then we can start to get smart about the data assets that can really make that analytics take off. >> All right, well, Joe, really appreciate all the updates on the catalogs there, data at the center of digital transformation for so many customers, and for illuminating some key points there. >> Happy to be here. >> All right, thank you so much for watching theCUBE, I'm Stu Miniman. (upbeat music)
ENTITIES
Entity | Category | Confidence |
---|---|---|
TD Bank | ORGANIZATION | 0.99+ |
Joe DosSantos | PERSON | 0.99+ |
Joe | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
three million | QUANTITY | 0.99+ |
SunLife | ORGANIZATION | 0.99+ |
April 2019 | DATE | 0.99+ |
Montana | LOCATION | 0.99+ |
2019 | DATE | 0.99+ |
four million | QUANTITY | 0.99+ |
Boston | LOCATION | 0.99+ |
seven | QUANTITY | 0.99+ |
five minutes | QUANTITY | 0.99+ |
ten years | QUANTITY | 0.99+ |
two people | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
less than one percent | QUANTITY | 0.99+ |
Kansas | LOCATION | 0.99+ |
TD | ORGANIZATION | 0.99+ |
six petabytes | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
first steps | QUANTITY | 0.99+ |
First | QUANTITY | 0.99+ |
Amazon.com | ORGANIZATION | 0.99+ |
three things | QUANTITY | 0.99+ |
ORGANIZATION | 0.98+ | |
Boston, Massachusetts | LOCATION | 0.98+ |
Tableau | TITLE | 0.98+ |
first-time | QUANTITY | 0.98+ |
about 50% | QUANTITY | 0.97+ |
Target | ORGANIZATION | 0.97+ |
Python | TITLE | 0.97+ |
Alexa | TITLE | 0.97+ |
one | QUANTITY | 0.97+ |
about 200 different applications | QUANTITY | 0.97+ |
Stu | PERSON | 0.97+ |
last night | DATE | 0.96+ |
eight key use cases | QUANTITY | 0.94+ |
TensorFlow | TITLE | 0.9+ |
step one | QUANTITY | 0.89+ |
Information Management Technology | ORGANIZATION | 0.89+ |
one day | QUANTITY | 0.89+ |
R | TITLE | 0.88+ |
SiliconANGLE | ORGANIZATION | 0.87+ |
Oxycontin | COMMERCIAL_ITEM | 0.87+ |
H2O | TITLE | 0.86+ |
Step one | QUANTITY | 0.86+ |
Qlik | PERSON | 0.86+ |
Hadoop | TITLE | 0.85+ |
Qlik | TITLE | 0.84+ |
Qlik | ORGANIZATION | 0.83+ |
single time | QUANTITY | 0.81+ |
billion rows | QUANTITY | 0.81+ |
two weeks before | DATE | 0.81+ |
next six months | DATE | 0.8+ |
Vice President | PERSON | 0.8+ |
two different dimensions | QUANTITY | 0.8+ |
petabytes | QUANTITY | 0.75+ |
COMMERCIAL_ITEM | 0.75+ | |
first thing | QUANTITY | 0.75+ |
single data element | QUANTITY | 0.7+ |
next six weeks | DATE | 0.7+ |
past ten | DATE | 0.66+ |
single person | QUANTITY | 0.65+ |
one of the questions | QUANTITY | 0.64+ |
single day | QUANTITY | 0.64+ |
Sandeep Singh, HPE | CUBEConversation, May 2019
>> From our studios in the heart of Silicon Valley, Palo Alto, California, this is a CUBE Conversation. >> Welcome to theCUBE studios for another CUBE Conversation, where we go in-depth with thought leaders driving business outcomes with technology. I'm your host, Peter Burris. One of the challenges enterprises face as they consider the new classes of applications they are going to use to create new levels of business value is how to best deploy their data in ways that don't add to the overall complexity of how the business operates. And to have that conversation, we're here with Sandeep Singh, who's the VP of storage marketing at HPE. Sandeep, welcome to theCUBE. >> Peter, thank you. I'm very excited. >> So Sandeep, I started off by making the observation that we've got this mountain of data coming into a lot of enterprises. At the same time, the notion of how data is going to create new classes of business value seems to be pretty deeply ingrained and acculturated with a lot of decision-makers. So they want more value out of their data, but they're increasingly concerned about the volume of data that's going to hit them. In your conversations with customers, how are you hearing them talk about this fundamental challenge? >> That's a great question. You know, across the board, data is at the heart of applications, pretty much everything that organizations do. In conversations with customers, it really boils down to a couple of areas. One is: how is my data just effortlessly available all the time, and always fast? Because fundamentally that's driving the speed of my business, and that's incredibly important. And how can my various audiences, including developers, just consume it like the public cloud, in a self-service fashion? And then the second part of that conversation is really about this massive data storm, or mountain of data, that's coming. How do I drive a competitive advantage? How do I unlock the hidden insights in that data to uncover new revenue streams and new customer experiences? Those are the areas we hear about, and fundamentally underlying them, the challenge for customers is: boy, I have a lot of complexity, and how do I ensure that I have the necessary insights into the infrastructure management, so that I am not beholden, and my IT staff isn't beholden, to fighting the IT fires that can cause disruptions and delays to projects? >> So fundamentally, we want to be able to pull time and attention out of the infrastructure and the administration of those devices that handle the data, and move that time and attention up into how we deliver the data services, and ideally up into the applications that are going to actually generate these new classes of work within a digital business. Have I got that right? >> Absolutely. It's about infrastructure that just runs seamlessly: it's always on, it's always fast. People don't have to worry about whether it's going to go down, whether my data is available, or whether it's going to slow down. People don't want "sometimes fast"; they want "always fast," right? And that's governing the application performance that ultimately I can deliver. And, as you talked about: well, geez, if the data infrastructure just works seamlessly, then can I eventually get to the applications and to building the right pipelines for mining that data, and driving the AI and machine learning, analytics-driven insights from it? >> So we've got this significant problem that we now have to figure out how to architect, because we want predictability and certainty and cost clarity in how we're going to do this. Part of the challenge, or part of the push here, is new use cases for AI. So we're trying to push data up so that we can build these new use cases, but it seems that we also have to take some of those very same technologies and drive them down into the infrastructure, so we get greater intelligence, greater self-monitoring, and greater self-management and self-administration within the infrastructure itself. Have I got that right? >> Yes, absolutely. What becomes important for customers, when you think about data and ultimately the storage that underlies the data, is that you can build and deploy fast and reliable storage, but that's only solving half the problem. Greater than 50% of the issues actually end up arising from the higher layers. For example, you could change the firmware on the host bus adapter inside a server; that can trickle down and cause a data-unavailability or performance-slowdown issue. You need to be able to predict that all the way at that higher level, and then prevent it from occurring. Or your virtual machines might be in a state of memory over-commitment at the server level, or CPU over-commitment. How do you discover those issues and prevent them from happening? The other area that's becoming important is this whole notion of cloud and hybrid cloud: that complexity tends to multiply exponentially. So when you're building that hybrid cloud infrastructure, there are fundamental challenges. Even as I've got a new workload and I want to place it, even on-premises, because you've had lots of silos, how do you even figure out where I should place workload A, and how it will react with workloads B and C on a given system? Now you multiply that across hundreds of systems and multiple clouds, and you can see the challenge is multiplying exponentially.
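The placement question, where workload A should go given workloads B and C already running, can be pictured as a scoring problem over candidate systems. This sketch is illustrative only: the capacity model is invented, and a real intelligence engine would weigh far more signals (interference fingerprints, data locality, cost).

```python
def placement_score(candidate: dict, workload: dict) -> float:
    """Penalize over-commitment the new workload would create; prefer the
    system left most balanced after placement."""
    cpu_after = candidate["cpu_used"] + workload["cpu"]
    mem_after = candidate["mem_used"] + workload["mem"]
    if cpu_after > candidate["cpu_cap"] or mem_after > candidate["mem_cap"]:
        return float("-inf")  # would over-commit: never place here
    return -max(cpu_after / candidate["cpu_cap"], mem_after / candidate["mem_cap"])

systems = {
    "onprem-a": {"cpu_used": 60, "cpu_cap": 100, "mem_used": 400, "mem_cap": 512},
    "cloud-b":  {"cpu_used": 10, "cpu_cap": 64,  "mem_used": 50,  "mem_cap": 256},
}
workload_a = {"cpu": 16, "mem": 96}
print(max(systems, key=lambda s: placement_score(systems[s], workload_a)))
# The right answer today; rescore tomorrow and it may be somewhere else.
```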
>> Oh yeah. Well, I would say that, you know, where do I put workload A? The right answer today may be here, but the right answer tomorrow may be somewhere else, and you want to make sure that the services required to perform workload A are resident and available without a lot of administrative work necessary to ensure that there's commonality. That's kind of what we mean by this hybrid multi-cloud world, isn't it? >> Absolutely. And when you start to think about it, you basically end up requiring, and fundamentally needing, the data mobility aspect of it, because without the data you can't really move your workloads. You need consistency of data services, so that if your app is architected for reliability and a set of data services, those just go along with the application. And then you need, building on top of that, portability for your actual application workload, consistently managed with a hybrid management interface. So we want to use an intelligent data platform that's capable of assuring performance, assuring availability, and assuring security, and going beyond that to then deliver a simplified, automated experience, so that everything is just available through a self-service interface. And then it brings along a level of intelligence that's just built into it globally, so that instead of trying to manually predict, and landing in a world of reacting after IT fires have occurred, there is a sea of sensors and the infrastructure is automatically predicting and preventing issues before they ever occur. And then, going beyond that, how can you actually fingerprint the individual application workloads to then deliver prescriptive insights, to keep the infrastructure always optimized? >> So: discerning the patterns of data utilization so that the administrative cost of making sure the data is available where it needs to be goes down, number one. Number two, assuring that data as an asset is made available to developers as they create new applications, new things that create new work, but also working very closely with the administrators so that they are not bound to an explosion in the number of tasks they have to perform to keep this all working, across the board. >> Yes. >> Okay, so we've got a number of different approaches to how this class of solution is going to hit the marketplace. Look, HPE's been around for 70 years, something along those lines, and you've been one of the leaders in the complex systems arena for a long time, and that includes storage. Where are you guys taking some of these technologies? >> Yeah, so our strategy is to deliver an intelligent data platform. That intelligent data platform begins with workload-optimized, composable systems that can span mission-critical workloads, general-purpose, secondary, big data, and AI workloads. We also deliver cloud data services that enable you to embrace hybrid cloud. All of these systems, all the way to cloud data services, are plumbed with data mobility, so, for example, use cases of modernizing protection, going all the way to protecting cost-effectively in the public cloud, are enabled. But really, all of these systems are then imbued with a level of intelligence, with a global intelligence engine, that begins with predicting and proactively resolving issues before they occur. It goes way beyond that, though, in delivering prescriptive insights that are built on top of global learning across hundreds of thousands of systems, with over a billion data points coming in on a daily basis, to be able to deliver the information right at the fingertips, even for the virtual machine admins, to say: this virtual machine is sapping the performance of this node, and if you were to move it to this other node, the performance, or the SLA, for the whole virtual machine farm will be even better. We build on top of that to deliver pre-built automation, so that it's hooked in with a REST API-first strategy, so that developers can consume it in a containerized application that's orchestrated with Kubernetes, or they can leverage it as infrastructure-as-code, whether it's with Ansible, Puppet, or Chef. We accelerate all of the application workloads and bring data protection, so it's available for the traditional business applications, whether they're built on SAP or Oracle or SQL, or the virtual machine farms, or the new-stack containerized applications. And then customers can build their AI and big data pipelines on top of the infrastructure with a plethora of tools, whether they're using Kafka, Elastic, MapR, H2O: that complete flexibility exists. And within HPE, we're then able to turn around and deliver all of this to customers with an as-a-service experience with HPE GreenLake. >> So that's where I want to take you next. How invasive is this going to be for a large shop? >> Well, it is completely seamless. With GreenLake, we're able to deliver a fully managed service experience with a cloud-like, pay-as-you-go consumption model, and, combining it with HPE Financial Services, we're also able to transform their organization in terms of this journey and make it a fully self-funding journey as well. >> So today, the typical shop has got a bunch of administrators that are administrating devices. That's starting to change: they've introduced automation that typically is associated with those devices. But we think three to five years out, folks are going to be thinking more in terms of data services and how those services get consumed, and that's going to be what the storage part of IT is thinking about; they can almost become data administrators. Have I got that right? >> Yes. Intelligence is fundamentally changing everything, not only on the consumer side but on the business side. A lot of what we've been talking about is that intelligence is the game-changer; we actually see the dawn of the intelligence era. And through this AI-driven experience, what it means for customers is, first, it enables a support experience that they just absolutely love. Secondly, it means that the infrastructure is always on, always fast, always optimized. And thirdly, in terms of making these data services available and the data insights that are being unlocked, it's all about how you can enable your innovators, the data scientists and the data analysts, to shrink the time to deriving insights from months literally down to minutes. Today there's a chasm that exists between a great concept, how can I leverage AI technology, and making it real: thinking about where it can actually fit, and then how do I implement an end-to-end solution and a technology stack so that I just have a pipeline available to me. That chasm is literally a matter of months. What we're able to deliver, for example with HPE BlueData, is literally a catalog, self-service experience where you can select and seamlessly build a pipeline, literally in a matter of minutes, and it's all completely hosted seamlessly, making AI and machine learning essentially available for the mainstream. >> So the Intelligent Data Platform makes it possible to see these new classes of applications become routine, without forcing the underlying storage administrators themselves to become data scientists. >> Absolutely. >> All right. Well, thank you for joining us for another CUBE Conversation, Sandeep Singh. Really appreciate your time in theCUBE. >> Thank you, Peter. Fundamentally, what we're helping customers do is really to unlock the potential of their data to transform their businesses, and we look forward to continuing that conversation. >> Excellent. I'm Peter Burris. See you next time. [Music]
**Summary and Sentiment Analysis are not shown because of an improper transcript**
ENTITIES
Entity | Category | Confidence |
---|---|---|
Sandeep Singh | PERSON | 0.99+ |
May 2019 | DATE | 0.99+ |
Peter Burris | PERSON | 0.99+ |
three | QUANTITY | 0.99+ |
tomorrow | DATE | 0.99+ |
Peter | PERSON | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
today | DATE | 0.99+ |
second part | QUANTITY | 0.99+ |
HPE | ORGANIZATION | 0.98+ |
Kafka | TITLE | 0.98+ |
70 years | QUANTITY | 0.98+ |
over a billion data points | QUANTITY | 0.98+ |
hundreds of systems | QUANTITY | 0.97+ |
greater than 50% | QUANTITY | 0.97+ |
Green Lake | ORGANIZATION | 0.97+ |
five years | QUANTITY | 0.96+ |
hundreds of thousands of systems | QUANTITY | 0.96+ |
Sandeep | PERSON | 0.92+ |
Palo Alto California | LOCATION | 0.92+ |
HP | ORGANIZATION | 0.9+ |
HPE Green Lake | ORGANIZATION | 0.85+ |
one | QUANTITY | 0.81+ |
SA P | TITLE | 0.8+ |
sea of sensors | QUANTITY | 0.8+ |
half the problem | QUANTITY | 0.77+ |
secondly | QUANTITY | 0.74+ |
Oracle | ORGANIZATION | 0.72+ |
two | QUANTITY | 0.7+ |
lots of silos | QUANTITY | 0.68+ |
one of the leaders | QUANTITY | 0.62+ |
intelligence | ORGANIZATION | 0.61+ |
thirdly | QUANTITY | 0.55+ |
sequel | TITLE | 0.54+ |
HPE | TITLE | 0.54+ |
lot | QUANTITY | 0.52+ |
issues | QUANTITY | 0.51+ |
REST | TITLE | 0.49+ |
h2o | TITLE | 0.46+ |
Deploying AI in the Enterprise
(orchestral music) >> Hi, I'm Peter Burris and welcome to another digital community event. As we do with all digital community events, we're gonna start off by having a series of conversations with real thought leaders about a topic that's pressing to today's enterprises as they try to achieve new classes of business outcomes with technology. At the end of that series of conversations, we're gonna go into a crowd chat and give you an opportunity to voice your opinions and ask your questions. So stay with us throughout. So, what are we going to be talking about today? We're going to be talking about the challenge that businesses face as they try to apply AI, ML, and new classes of analytics to their very challenging, very difficult, but nonetheless very value-producing outcomes associated with data. The challenge that all these businesses have is that often, you spend too much time in the infrastructure and not enough time solving the problem. And so what's required is new classes of technology and new classes of partnerships and business arrangements that allow us to mask the underlying infrastructure complexity from data science practitioners, so that they can focus more time and attention on building out the outcomes that the business wants, and a sustained business capability so that we can continue to do so. Once again, at the end of this series of conversations, stay with us, so that we can have that crowd chat and you can, again, ask your questions, provide your insights, and participate with the community to help all of us move faster in this crucial direction for better AI, better ML and better analytics. So, the first conversation we're going to have is with Anant Chintamaneni. Anant's the Vice President of Products at BlueData. Anant, welcome to theCUBE. >> Hi Peter, it's great to be here. I think the topic that you just outlined is a very fascinating and interesting one. Over the last 10 years, data and analytics have been used to create transformative experiences and drive a lot of business growth. You look at companies like Uber, AirBnB, and, you know, Spotify: practically every industry is being disrupted. And the reason why they're able to do this is because data is in their DNA; it's their key asset, and they've leveraged it in every aspect of their product development to deliver amazing experiences and drive business growth. And the reason why they're able to do this is they've been able to leverage open-source technologies, data science techniques, and big data, fast data, all types of data to extract that business value and inject analytics into every part of their business process. Enterprises of all sizes want to take advantage of the same assets that the new digital companies are leveraging, and drive digital transformation and innovation in their organizations. But there are a number of challenges. First and foremost, if you look at the enterprises where data was not necessarily in their DNA, injecting that into their DNA is a big challenge. The executives, the executive branch, definitely want to understand where they want to apply AI and how to identify which use cases to go after. There is some recognition coming in. They want faster time-to-value and they're willing to invest in that. >> And they want to focus more on the actual outcomes they seek as opposed to the technology selection that's required to achieve those outcomes. >> Absolutely.
I think it's, you know, a boardroom mandate for them to drive new business outcomes, new business models, but I think there is still some level of misalignment between the executive branch and the data worker community, which they're trying to upgrade with the new-age data scientists, the AI developer; and then you have IT in the middle, who has to basically bridge the gap and enable the digital transformation journey, provide the infrastructure, provide the capabilities. >> So we've got a situation where people readily acknowledge the potential of some of these new AI, ML, big data related technologies, but we've got a mismatch between the executives that are trying to do evidence-based management and drive new models, the IT organization who's struggling to deal with data-first technologies, and data scientists who are few and far between, and who leave quickly if they don't get the tooling that they need. So, what's the way forward? That's the problem. How do we move forward? >> Yeah, so I think, you know, I think we have to double-click into some of the problems. So the data scientists: they want to build a tool chain that leverages the best-in-class open source technologies to solve the problem at hand, they want to be able to compose these tool chains, they want to be able to apply and create new algorithms and operationalize them, and do it in a very iterative cycle. It's a continuous development, continuous improvement process, which is at odds with what IT can deliver, which is that they have to deliver data that is dispersed all over the place to these data scientists. They need to be able to provide infrastructure, which today they're not; there's an impedance mismatch. It takes them months, if not years, to be able to make those available, to make that infrastructure available. And last but not least, security and control. It's just fundamentally not the way they've worked, where they can make data and new tool chains available very quickly to the data scientists. And for the executives, it's all about faster time-to-value, so there's a little bit of an expectation mismatch there as well, and those are some of the fundamental problems. There's also reproducibility: once you've created an analytics model, being able to reproduce that at scale, to then be able to govern it and make sure that it's producing the right results, is fundamentally a challenge. >> Auditability of that process. >> Absolutely, auditability. And, in general, being able to apply this sort of model to many different business problems so you can drive outcomes in different parts of your business. So there's a huge number of problems here. And what I believe, and what we've seen with some of these larger companies, the new digital companies that are driving business value, is they have invested in a unified platform where they've made the infrastructure invisible by leveraging cloud technologies or containers, and essentially made it such that the data scientists don't have to worry about the infrastructure: they can be a lot more agile, they can quickly create the tool chains that work for the specific business problem at hand, scale up and down as needed, and be able to access data where it lies, whether it's on-prem, whether it's in the cloud, or whether it's a hybrid model.
And so that's something that's required from a unified platform where you can do your rapid prototyping, you can do your development, and ultimately the business outcome and the value come when you operationalize it and inject it into your business processes. So I think, fundamentally, this kind of a unified platform is critical, which a lot of the new-age companies have, but it is missing in a lot of the enterprises. >> So, a big challenge for the enterprise over the next few years is to bring these three groups together, the business, the data science world, and the infrastructure world, or others, to help with those problems and apply them successfully to some of the new business challenges that we have. >> Yeah, and I would add one last point: we are on a continuous journey. As I mentioned, this is a world of open source technologies that are coming out from a lot of the large organizations out there, whether it's your Googles or your Facebooks. And so there is an evolution in these technologies, much like we've evolved from big data and data management to capture the data; the next phase is around data exploitation, with artificial intelligence and machine learning type techniques. And so it's extremely important that this platform enables these organizations to future-proof themselves, so as new technologies come in, they can leverage them >> Great point. >> for delivering exponential business value. >> Deliver value now, but show a path to delivering value in the future as all of these technologies and practices evolve. >> Absolutely. >> Excellent. All right, Anant Chintamaneni, thanks very much for giving us some insight into the nature of the problems that enterprises face and some of the way forward. We're gonna be right back, and we're gonna talk about how to actually do this in a second. (light techno music) >> Introducing BlueData EPIC, the leading container-based software platform for distributed AI, machine learning, deep learning, and analytics environments, whether on-prem, in the cloud, or in a hybrid model. Data scientists need to build models utilizing various stacks of AI, ML, and DL applications and libraries. However, installing and validating these environments is time-consuming and prone to errors. BlueData provides the ability to spin up these environments on demand. The BlueData EPIC app store includes best-of-breed, ready-to-run, Docker-based application images, like TensorFlow and H2O Driverless AI. Teams can also add their own images, to provide the latest tools that data scientists prefer and to ensure compliance with enterprise standards. They can use the quick-launch button, which provides pre-configured templates with the appropriate application image and resources. For example, they can instantly launch a new sandbox environment using the template for TensorFlow with a Jupyter Notebook. Within just a few minutes, it'll be automatically configured with GPUs and easy access to their data. Users can launch experiments and make GPUs automatically available for analysis. In this case, the H2O environment was set up with one GPU. With the container-based BlueData platform, you can deploy fully configured distributed environments within a matter of minutes, whether on-prem, in the public cloud, or in a hybrid architecture. BlueData was recently acquired by Hewlett Packard Enterprise. And now, HPE and BlueData are joining forces to help you on your AI journey. (light techno music) To learn more, visit www.BlueData.com
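A self-service launch like the one in the demo typically reduces, from the user's point of view, to a single authenticated API call. The endpoint, header, fields, and template name below are hypothetical stand-ins for illustration; they are not BlueData EPIC's actual API, so consult the product documentation for the real interface.

```python
import requests

API = "https://epic.example.internal/api"  # hypothetical controller endpoint

def launch_from_template(session_token: str, template: str, gpus: int = 1) -> str:
    """One self-service call replaces days of hand-building a DL environment."""
    resp = requests.post(
        f"{API}/clusters",
        headers={"X-Auth-Token": session_token},
        json={"template": template, "gpus": gpus},  # e.g. "tensorflow-jupyter"
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["cluster_id"]

# Example (assuming a valid token):
# cluster_id = launch_from_template(token, "tensorflow-jupyter", gpus=1)
```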
>> And we're back. I'm Peter Burris, and we're continuing to have this conversation about how businesses are turning experience with the problems of advanced analytics, and the solutions that they seek, into actual systems that deliver continuous, ongoing value and achieve the business capabilities required to make possible these advanced outcomes associated with analytics, AI, and ML. And to do that, we've got two great guests with us. We've got Kumar Sreekanti, who is the co-founder and CEO of BlueData. Kumar, welcome back to theCUBE. >> Thank you, it is nice to be here, back again. >> And Kumar, you're being joined by a customer. Ramesh Thyagarajan is the executive director of the Advisory Board Company, which is part of Optum now. Ramesh, welcome to theCUBE. >> Great to be here. >> All right, so Kumar, let's start with you. I mentioned up front this notion of turning technology and understanding into actual business capabilities to deliver outcomes. What has been BlueData's journey to make that happen? >> Yeah, it all started six years ago, Peter. It was a bold vision and a big idea, and no pun intended on big data, which was an emerging market then. As everybody knows, the data was enormous and there was a lot of innovation around the periphery, but nobody was paying attention to how to make big data consumable in the enterprise. I saw an enormous opportunity to make this data more consumable in the enterprise and to give a cloud-like experience, with agility and elasticity. So our vision was to build a software infrastructure platform like VMware, specially focused on data-intensive distributed applications, and this platform would allow enterprises to build cloud-like experiences both on enterprise infrastructure as well as on hybrid clouds, so that it paves the journey to their cloud experience. I was very fortunate to put together a team, and I found good partners like Intel. So that actually is the genesis of BlueData. If you look back over the last six years, big data itself has gone through a lot of evolution, and the marketplace and the enterprises have gone from offline analytics to AI- and ML-based workloads that are actually giving them predictive and descriptive analytics. What BlueData has done is make the infrastructure invisible, making the tool set completely available even as the tool set itself is evolving, and in the process we created many game-changing software technologies. For example, we are the first end-to-end containerized enterprise solution that gives you distributed applications. We built a technology called DataTap that provides compute/data separation, so that you don't have to actually copy the data, which is a boon for enterprises. We also built multitenancy, so enterprises can run multiple workloads on the same data; and Ramesh will tell you in a second here, in a healthcare enterprise, multitenancy is a very important element. And finally, we also contributed to many open source technologies, including a project we created called KubeDirector, which is our own contribution for how to run stateful workloads on Kubernetes, and we are very happy to see customers like Ramesh using it with BlueData. >> Sounds like quite a journey, and obviously you've intercepted companies like the Advisory Board Company.
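Kumar's compute/data separation point is easiest to see from the application side: the job reads data where it lies instead of copying it into the compute cluster first. Here is a PySpark sketch; the dtap:// scheme reflects how an HDFS-compatible access layer such as DataTap is commonly addressed, but the path and dataset are invented for this example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("in-place-analytics").getOrCreate()

# Read in place through the access layer; no bulk copy into the cluster.
df = spark.read.parquet("dtap://corp-datalake/claims/2019/")
df.groupBy("member_id").count().show(10)
```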
>> So Ramesh, a lot of enterprises have mastered or, you know, understood how to create data lakes with Hadoop, but then found that they still weren't able to connect to some of the outcomes that they sought. Is that the experience that you had? >> Right. To be precise, that is one of the kinds of problems we have. It's not just the data lake where we need to be able to do the workflows and other things; being a traditional company that has been in the business for a long time, we also have a lot of data assets that are not part of this data lake. We were finding it hard: how do we get at the data? Getting it all and putting it in a data lake is a duplication of work. We were looking for some kind of solution that would help us gather the benefits of leaving the data alone but still being able to get at it. >> This is where (mumbles). >> This is where we were looking for things, and then I was lucky and fortunate to run into Kumar and his crew at one of the Hadoop conferences, and they demonstrated the way it can be done. It immediately hit home, it was a big hit with us, and then we went back, did a POC, and very quickly adopted the technology. One of the benefits of adopting this technology is the level of containerization they are doing; it is helping me to address many needs, my data analysts, the data engineers, and the data scientists, so I'm able to serve all of them, which otherwise wouldn't be possible for me with just a plain vanilla (mumbles). >> So it sounds as though the partnership with BlueData has allowed you to focus on activities and problems and challenges above the technology, so that you can actually start bringing data science, business objectives, and infrastructure people together. Have I got that right? >> Absolutely. BlueData is helping me to tie them all together and provide access and value to my business. We being in healthcare, the important thing is that we need to be able to look at large data sets over a period of time in order to figure out how a patient's health journey is happening. That is very important, so that we can figure out the ways and means by which we can lower the cost of healthcare and also provide insights to the physicians who can help get people to better health.
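Ramesh's patient health journey use case boils down to longitudinal aggregation over long windows of events. Here is a minimal, synthetic sketch with pandas; the schema is made up, and real clinical data would be de-identified and governed as discussed above.

```python
import pandas as pd

# Synthetic stand-in for a curated clinical events asset.
events = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "event_date": pd.to_datetime(["2017-01-05", "2018-03-11", "2019-02-20",
                                  "2018-07-01", "2019-06-15"]),
    "cost": [1200.0, 300.0, 450.0, 8000.0, 150.0],
})

# Reconstruct each patient's journey: yearly utilization over a multi-year window.
journey = (events.assign(year=events.event_date.dt.year)
                 .groupby(["patient_id", "year"])["cost"].sum())
print(journey)
```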
Now, in order to get to the next level of predictive and prescriptive analytics, it is not just about having the data; you need to have your curated data assets and processes on top of a platform that helps your data scientists deliver. One of the biggest challenges is that data scientists are not able to get their hands on data. The BlueData platform gives me the ability to do that and to ensure we meet all the security requirements and the various regulatory compliance obligations we have. >> Kumar, congratulations. >> Thank you. >> Sounds like you have a happy customer. >> Thank you. >> One of the challenges that every entrepreneur faces is how to scale the business. So talk to us about where you are and the decisions that you made recently to achieve that. >> As an entrepreneur, when you start a company, the odds are against you, right? You're always worried about it. You make so many sacrifices, yourself and your team and all that, but the customer is king. The most important thing for us is to find satisfied customers like Ramesh, and BlueData was very successful in finding that customer because, as you pointed out, as Ramesh pointed out, we provide that clean solution for the customer. But as you go through this journey as a co-founder and CEO, you always worry about how you scale to the next level. We had partnerships with many companies including HPE, and when this opportunity came in front of me, myself and my board saw this opportunity of combining the forces of BlueData's satisfied customers, innovative technology and team with HPE's brand name, their world-class service, their investment in R&D, and their very large list of enterprise customers. We think putting these two things together provides the next stage in the journey for BlueData's innovation and BlueData's customers. >> Excellent. So once again, Kumar Sreekanti, co-founder and CEO of BlueData, and Ramesh Thyagarajan, who is the executive director of the Advisory Board Company and part of Optum, I want to thank both of you for being on theCUBE. >> Thank you. >> Thank you, great to be here. >> Now let's hear a little bit more about how this notion of bringing BlueData and HPE together is generating new classes of value that are making things happen today but are also gonna make things happen for customers in the future, and to do that we've got Dave Vellante, who's with SiliconANGLE Wikibon, joined by Patrick Osborne, who's with HPE, in our Marlborough studio. So Dave, over to you. >> Thanks Peter. We're here with Patrick Osborne, the vice president and general manager of big data and analytics at Hewlett Packard Enterprise. Patrick, thanks for coming on. >> Thanks for having us. >> So we heard from Kumar, let's hear from you. Why did HPE acquire BlueData? >> Think about it from three angles: platform, people and customers. Great platform, built for scale, addressing a number of these new workloads in big data analytics and certainly AI. The people that they have are amazing: a great engineering team, an awesome customer success team, a team of data scientists, all folks with really great knowledge in this space, so they're gonna be a great addition to HPE. And on the customer side, great logos, major Fortune 500 customers in the financial services vertical, healthcare, pharma, manufacturing, so a huge opportunity for us to scale that within the HPE context.
>> Okay, so talk about how it fits into your strategy. Specifically, what are you gonna do with it? What are the priorities, can you share some roadmap? >> Yeah, so take a look at HPE's strategy. We talk about hybrid cloud, specifically edge to core to cloud, and the common theme that runs through that is data, data-driven enterprises. So for us, we see the BlueData EPIC platform as a way to help our customers quickly deploy these new mode-2 applications that are fueling their digital transformation. We have some great plans. We're gonna certainly invest in all the functions, a force multiplier not only on product engineering and product delivery but also go-to-market and customer success. We're gonna come out of the gate on day one with some really good reference architectures with some of our partners like Cloudera and H2O. We've got some very scalable building-block architectures to marry up the BlueData platform with our Apollo systems, for those of you who have seen those in the market, and we've got our Elastic Platform for Analytics for customers who run these workloads; now you'd be able to virtualize those in containers. And we're gonna be building out a big services practice in this area. A lot of customers often tell us, we don't have the people to do this. So we're gonna bring those people to you as HPE through Pointnext: advisory services, implementation, ongoing help with customers. So it's going to be a really fantastic start. >> Apollo, as you mentioned Apollo. I think of Apollo sometimes as HPC, high performance computing, and we've had a lot of discussion about how that's sort of seeping into the mainstream. Is that what you're seeing? >> Yeah, absolutely. We know that a lot of our customers have traditional workloads, and they're on the path to almost completely virtualizing those, but a lot of the innovation going on right now is in this mode-2 world. Your big data and analytics pipeline is getting longer, you're introducing new experiences on top of your product, and that's fueling essentially commercial HPC, and now that folks are using techniques like AI modeling and inference to make those services more scalable and more automated, we're starting to bring in more of these platforms, these scalable architectures like Apollo. >> So it sounds like your roadmap has a lot of integration plans across the HPE portfolio. We certainly saw that with Nimble, but BlueData was working with a lot of different companies. It's software; is the plan to remain open, or is this an HPE thing? >> Yeah, we absolutely want to be open. We know that we have lots of customers that choose. HPE is all about hybrid cloud, and that has a couple of different implications. We talk about your choice of on-prem versus off-prem, and BlueData has a great capability to run some of these workloads; it essentially allows you to do separation of compute and storage in the world of AI and analytics, and we can run it off-prem as well, in the public cloud. But then we also have choice for customers in any customer's private cloud. If that means they want to run on other infrastructure besides HPE, we're gonna support that; we have existing customers that do that.
We're also gonna provide infrastructure that marries the software and the hardware together with frameworks like InfoSight, which we feel will be a much better experience for customers, but we'll absolutely be open and absolutely offer choice. >> All right, what about the business impact? To take the customer perspective, what can they expect? >> So I think from a customer perspective, we're really just looking to accelerate deployment of AI in the enterprise, and that has a lot of implications for us. We're gonna have very scalable infrastructure for them; we're gonna be really focused on this very dynamic AI and ML application ecosystem through partnerships and support within the BlueData platform. We want to provide a SaaS experience. So whether that's GPUs or accelerators as a service, analytics as a service, we really want to fuel innovation as a service. We want to empower those data scientists; they're really hard to find and really hard to retain within your organization, so we want to unlock all that capability, and really we want to focus on innovation for the customers. >> Yeah, and they spend a lot of time wrangling data, so you're really going to simplify that with the cloud (mumbles). Patrick, thank you, I appreciate it. >> Thank you very much. >> Alright Peter, back to you in Palo Alto. >> And welcome back, I'm Peter Burris, and we've been talking a lot in the industry about how new tooling and new processes can achieve new classes of analytics, AI and ML outcomes within a business, but if you don't get the people side of that right, you're not going to achieve the full range of benefits that you might get out of your investments. Now, to talk a little bit about how important the data science practitioner is in this equation, we've got two great guests with us. Nanda Vijaydev is the chief data scientist of BlueData. Welcome to theCUBE. >> Thank you Peter, happy to be here. >> Ingrid Burton is the CMO and business leader at H2O.AI. Ingrid, welcome to theCUBE. >> Thank you so much for having us. >> So Nanda Vijaydev, let's start with you. Again, having a nice platform is very, very important, but how does that turn into making the data science practitioner's life easier so they can deliver more business value? >> Yeah, thank you, it's a great question. I think at the end of the day, for a data scientist, what's most important is: did you understand the question that somebody asked you, and what is expected of you when you deliver something? Then you go about finding what you need: data, systems, and working with people, the experts in the process, to make sure that the hypothesis you're testing is structured in a nice way where it is testable and modular, and you have a way to go back and show your results and keep doing this in an iterative manner. That's the biggest thing, because the satisfaction for a data scientist is when you actually take this, make use of it, and put it in production. To make this whole thing easier, we definitely need some way of bringing it all together. That's really where, especially compared to traditional data science where everything was monolithic, it was one system, there was a very set way of doing things, but now it is not so. With the growing types of data and the growing types of computational algorithms that are available, there's a lot of opportunity, and at the same time there is a lot of uncertainty.
So it's really about putting that structure in place and making sure you get the best of everything and still deliver the results; that is the focus that all data scientists strive for. >> And especially, the data scientist wants to operate in the world of uncertainty related to the business question, reducing that uncertainty, and not deal with the underlying uncertainty associated with the infrastructure. >> Absolutely, absolutely. As a data scientist, a lot of time used to be spent in the past on where the data is. Then the question was, tell us what data you want and we'll give it to you, but because the data always came in a nice structured, row-column format, it had already lost a lot of the context of what we had to look for. So it's really not about going back to systems that are pre-built or pre-processed; it's getting access to that real, raw data. It's getting access to the information as it came, so you can actually make the best judgment of how to go forward with it. >> So you describe a world where business, technology and data science practitioners are working together, but let's face it, there's an enormous amount of change in the industry and, quite frankly, a deficit of expertise, and I think that requires new types of partnerships, new types of collaboration, a real (mumbles) approach. Ingrid, I want to talk about what H2O.AI is doing as a partner of BlueData and HPE to ensure that you're complementing these skills in service to the customer's objectives. >> Absolutely, thank you for that. As Nanda described, data scientists want to get to answers, and what we do at H2O.AI is provide the algorithms and the platforms for data scientists to be successful. When they want to solve a problem, they need to work with their business leaders, they need to work with IT, and they actually don't want to do all the heavy lifting; they want to solve that problem. So what we do is automatic machine learning platforms; we do that by optimizing algorithms and doing a lot of the heavy lifting that novice data scientists need, and we help expert data scientists as well. I talk about it as algorithms to answers, actually solving business problems with predictions, and that's what machine learning is really all about. What we're seeing in the industry right now, and BlueData is a great example of this, is taking some of the hard stuff away from the data scientist and making them successful. Working with BlueData and HPE, together we really solve the problems that businesses are looking to solve. It's really transformative. We've all been through the digital transformation journey; we are now in what I would term an AI transformation of sorts, and businesses are going to the next step. They have their data, the infrastructure is seamlessly working together, the clusters and containerization, and that's very important. Now what we're trying to do is get to the answers, and using automatic machine learning platforms is probably the best way forward. >> It's still hard stuff, but we're trying to free data science practitioners from the hard stuff that doesn't directly deliver value. >> It doesn't deliver anything for them, right. They shouldn't have to worry about the infrastructure; they should worry about getting the answers to the business problems they've been asked to solve.
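To make the phrase "automatic machine learning" concrete, here is a minimal sketch using H2O's open source AutoML API from Python. The dataset file name and column names are illustrative placeholders, not details taken from this conversation.

```python
# Minimal H2O AutoML sketch; the file and column names are hypothetical.
import h2o
from h2o.automl import H2OAutoML

h2o.init()  # starts or attaches to a local H2O cluster

# Load a tabular dataset and split it for training and validation.
frame = h2o.import_file("claims_history.csv")
train, test = frame.split_frame(ratios=[0.8], seed=42)

# AutoML tries many algorithms and hyperparameter settings within the
# given budget, which is the "heavy lifting" Ingrid describes.
aml = H2OAutoML(max_models=10, max_runtime_secs=600, seed=42)
aml.train(y="churned", training_frame=train)

print(aml.leaderboard.head())           # ranked candidate models
predictions = aml.leader.predict(test)  # best model's predictions
```

The data scientist states the target and the budget; model selection and tuning happen inside the platform.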
>> So let's talk a little bit about some of the new business problems that are going to be solvable through these kinds of partnerships between BlueData and H2O.AI. Start us off, Nanda: what gets you excited when we think about the new types of business problems that customers are gonna be able to solve? >> Yeah, I think it is really that the question that comes to you is not filtered through someone else's lens. Someone is trying to solve an optimization problem, someone is trying to do new product discovery, and all of this is based on a combination of being both data-driven and evidence-based. For us as data scientists, what excites me is that I now have the flexibility to choose best-of-breed technologies. I should not be restricted to what is given to me by an IT organization. But at the same time, in an organization, for things to work, there has to be some level of control. So it is really about having these types of environments, or platforms, where there is a team that can work on the control aspect, but as a data scientist I don't have to worry about it; I have my flexibility of tools of choice. At the same time, when you talk about data, security is a big deal in companies, and a lot of times data scientists don't get access to data because of the layers and layers of security they have to go through. So the exciting opportunity for me is when someone else takes care of that problem: just tell me where the approved source of data is that I can go to, don't filter the data for me, don't pre-structure the data for me, just tell me it's an approved source. That gives me more flexibility to go and take that information and build. Having those controls taken care of well before I get into the picture as a data scientist makes it extremely easy for us to focus, to her point, on the problem, to focus on accessing best-of-breed technology, and to give back and have that interaction with the business users on an ongoing basis. >> So the focus is on speed to value, so that you're not messing around with a bunch of underlying infrastructure; governance remains in place so that you know the appropriate limits of using the data; and security is embedded within that entire model without taking fidelity out of the quality of the data. >> Absolutely. >> Would you agree with those? >> I totally agree with all the points that she brought up, and we have joint customers in the market today solving very complex problems. We have joint customers in financial services. We have customers in healthcare that are really trying to solve today's business problems, and these are everything from: how do I extend new credit to somebody? How do I know what product to offer them next? What customer recommendations can I make next? Why did that customer churn? How do I reach new people? How do I do drug discovery? How do I give a patient a better prescription? How do I pinpoint disease where I couldn't have seen it before? Now we have all that data available and it's very rich, and data is a team sport.
It takes data scientists, it takes business leaders, and it takes IT to make it all work together, and together the two companies are really working to solve problems that our customers are facing, working with our customers because they have the intellectual knowledge of what their problems are. We are providing the tools to help them solve those problems. >> A fantastic conversation about what is necessary to ensure that the data science practitioner remains at the center and is the ultimate test of whether or not these systems and capabilities are working for the business. Nanda Vijaydev, chief data scientist of BlueData, and Ingrid Burton, CMO and business leader at H2O.AI, thank you very much for being on theCUBE. >> Thank you. >> Thank you so much. >> So let's now spend some time talking about how, ultimately, all of this comes together and what you're going to do as you participate in the crowd chat. To do that, let me throw it back to Dave Vellante in our Marlborough studios. >> We're back with Patrick Osborne. Alright Patrick, let's wrap up here and summarize. We heard how you're gonna help data science teams, right. >> Yup, speed, agility, time to value. >> Alright, and I know a bunch of folks at BlueData; the engineering team is very, very strong, so you picked up a good asset there. >> Yeah, amazing technology. The founders have a long lineage of software development and adoption in the market, so we're gonna invest in them and let them loose. >> And then we heard the sort of better-together story from you; you've got a roadmap, you're making some investments here, as I heard. >> Yeah, we're really focused on hybrid cloud and we want to have all of these as-a-service experiences, whether it's through GreenLake or providing innovation: AI, GPUs as a service, something that we're gonna be continuing to provide our customers as we move along. >> Okay, and then we heard the data science angle, and the data science community and the partner angle, that's exciting. >> Yeah, I think it's two approaches as well. We have data scientists, so we're gonna bring that capability to bear, whether it's through the product experience or through a professional services organization. And then, number two, this is a very dynamic ecosystem from an application standpoint. There are commercial applications, there's certainly open source, and we're gonna bring a fully vetted, full-stack experience to our customers that they can feel confident in. It's a very dynamic space. >> Excellent, well thank you very much. >> Thank you. Alright, now it's your turn. Go into the crowd chat and start talking. Ask questions, we're gonna have polls, we've got experts in there, so let's crowd chat.
Kumar Sreekanti, BlueData | CUBEConversation, May 2018
(upbeat trumpet music) >> From our studios in the heart of Silicon Valley, Palo Alto, California, this is a CUBE Conversation. >> Welcome, everybody, I'm Dave Vellante and we're here in our Palo Alto studios and we're going to talk about big data. For the last ten years, we've seen organizations come to the realization that data can be used to drive competitive advantage, and so they dramatically lowered the cost of collecting data. We certainly saw this with Hadoop, but you know what, data is plentiful, insights aren't. Infrastructure around big data is very challenging. I'm here with Kumar Sreekanti, co-founder and CEO of BlueData, and a long time friend of mine. Kumar, it's great to see you again. Thanks so much for coming to theCUBE. >> Thank you, Dave, thank you. Good to see you as well. >> We've had a number of conversations over the years, the Hadoop days, on theCUBE, you and I go way back, but as I said up front, big data sounded so alluring, but it's very, very complex to get started, and we're going to get into that. I want to talk about BlueData. You recently sold the company to HPE, congratulations. >> Thank you, thank you. >> It's fantastic. Go back, why did you start BlueData? >> Before I started BlueData, I was at VMware, and I had a great opportunity to be in the driving seat, working with many talented individuals as well as with many customers and CIOs. I saw that while VMware had solved the problem of single instances of virtual machines and transformed the data center, the new wave of distributed systems, the first example of which was Hadoop, was quite rigid. They were running on bare metal and they were not flexible. Customers were having a lot of issues, the ones that you just talked about. There's a new stack coming up every day. They're running on bare metal. I can't run production and DevOps on the same systems. Whereas the cloud was making progress, so we felt that there was an opportunity to build a VMware-like platform that focuses on big data applications. This was back in 2013. That was the early genesis. We saw that data is here, and data is the new oil as many people have said, and organizations have to figure out a way to harness the power of it, and they need an invisible infrastructure. They need very innovative platforms. >> You know, it's funny. We see data as even more valuable than oil, because oil you can use only once. (Kumar laughs) You can use data many, many times. >> That's a very good one. >> Companies are beginning to realize that, so talk about the journey of big data. You're a product guy. You've built a lot of products, highly technical. You know a lot of people in the valley. You've built great teams. What was the journey like with BlueData? >> You know, a lot of people would like it to be a straight line from the start to that point. (Dave laughs) It is not; it's fascinating, and at the same time a stressful, up-and-down journey, but very fulfilling. A, this is probably one of the best products that I've built in my career. B, it actually solves a real problem for customers, and in the process you find a lot of satisfaction in not only building a great product but actually building value for the customers. The journey has been very good. We were blessed with extremely good advisors from the very beginning. We were really fortunate to have good investors, and, as you said, with my knowledge and familiarity with the valley, I was able to build a good team. Overall, an extremely good journey.
The exit puts a bow on top, as you pointed out, but it's been a good journey. There are a lot of nuances I learned in the process that I'm happy to share as we go through. >> Let's double-click on the problem. We talked a little bit about it. You referenced it. Every day there's a new open source project coming out. There's Sqoop and Hive and a new open source database coming out. Practitioners are challenged. They don't have the skillsets. The Ubers and the Facebooks, they could probably figure it out and have the engineers to do it, but the average enterprise may not. Clearly complexity is the problem, but double-click on that and talk a little bit about, from your perspective, what that challenge is. >> That's a very good point. When we started the company, we noticed exactly that. There are companies that have the muscle to hire a set of engineers and solve the problem vertically, specific to their application or their use case, but the average Fortune 500 company does not have that kind of engineering manpower. I also call this day-two operations. When you go back to VMware or Windows, as soon as you buy the piece of software, the next day it's operational and you know how to use it, but with these new stacks, by the time the stack is installed, you already have a newer version. It's actually solutions-led, meaning you want to have a solution understanding, but you want to make the infrastructure invisible. Meaning: I want to create a cluster, or I want to funnel the data; I don't want to think about those things. I just want to worry directly about what my solution is, and I want BlueData to worry about creating the cluster and automating it. It's automation, automation, automation, orchestration, orchestration, orchestration. >> Okay, so that's the general way in which you solve this problem. Automate, you've got to take the humans out of the equation. Talk specifically about the BlueData architecture. What's the secret sauce behind it? >> We were very fortunate to see containers as the new lightweight virtual machines. We have taken the approach that certain applications, particularly stateful ones, need different handling than cloud-native, non-stateful applications, so, and in fact our architecture predates Kubernetes, we built a bottoms-up, pure white-paper architecture that is geared towards big data and AI/ML applications. Now, actually, even HPC is starting to move in that direction. >> Well, talk a little bit about that in terms of the evolution of the types of workloads that we've seen. You know, it all started out with Hadoop as batch, and then very quickly that changed. Talk about that spectrum. >> When we started, the highest ask from the customers was Hadoop and batch processing, but everybody knew that was just the beginning, and with the new streaming technologies it moved to near-realtime analytics, and now to AI/ML applications like H2O and Caffe, and now I'm seeing customers asking, I would like to have a single platform that actually runs all these applications for me. The way we built it, going back to your previous question, the goal of the architecture is for you to be able to create these clusters and not worry about copying the data; a single copy of the data. We built a technology called DataTap, which we've talked about in the past, and that allows you to have a single copy of the data with multiple applications able to access it.
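As an illustration of what DataTap access looks like from an application, here is a minimal PySpark sketch. The dtap:// URI scheme and the TenantStorage mount name follow BlueData's documented conventions, but the specific path and column are hypothetical.

```python
# Hypothetical sketch: reading remote data in place through DataTap.
# Assumes a BlueData/EPIC cluster with the DataTap connector on the
# Spark classpath; the path below is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("datatap-example").getOrCreate()

# No copy into local HDFS: the connector resolves dtap:// URIs against
# the storage the administrator has attached to this tenant.
df = spark.read.parquet("dtap://TenantStorage/warehouse/events/")
df.groupBy("event_type").count().show()
```

The application code stays ordinary Spark; only the URI scheme changes, which is how multiple workloads can share one copy of the data.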
>> Now, HPC, you mentioned HPC. It used to be, maybe still is, this sort of crazy crowd. (laughter) You know, they do things differently, and it's all bandwidth, bandwidth, bandwidth and very high-end performance. How do you see that fitting in? Do you see that going mainstream? >> I'm glad you pointed that out, because I'm not saying everything is moving over, but I am starting to see it; in fact, I was in a conversation this morning with an HPC team and an HPC customer. They are seeing the value of the scale of distributed systems. HPC tends to be scale-up with single, high-bandwidth systems. They are seeing the value of asking, how can I actually bring these two pieces together? I would say it's in its infancy. Look how long Hadoop took, 10 years, so it's probably going to take a longer time, but I can see enterprises thinking of a single unified platform, probably driven by Kubernetes, with these applications instantiated, orchestrated, and automated on that platform. >> Now, how about the cloud? Where does that fit? We often say in theCUBE that it's not Moore's Law anymore. The innovation cocktail is data, all this data that we've collected, applying machine intelligence, and then scaling with the cloud. Obviously cloud is hugely important. It gobbled up the whole Hadoop business, but where do you see it fitting? >> Cloud is the big elephant in the room; we all have to acknowledge it. I think it provides significant advantages. I always used to say this, and I may have said this in my previous CUBE interviews: cloud is all about the innovation. The reason cloud got so much traction is that if you compare the amount of innovation to on-prem, they were at least five years ahead. Even against the BlueData technology that we brought to bear, EMR on Amazon was ahead, but it was only available on Amazon. It's what we call an opinionated stack. That means you are forced to use what they give you, as opposed to, I want to bring my own piece of software. We see cloud as well as on-prem as pretty much homogeneous. In fact, BlueData software runs both on-prem and on the cloud, in a hybrid fashion. It's the same software, and you can bring your stack on top of BlueData. >> Okay, so hybrid was the next piece of it. >> What we see, at least from my exposure, is that cloud is very useful for certain applications. Especially, what I'm seeing is, if you are collecting large amounts of data in the cloud, I would rather run batch processing there to curate the data, and bring the very important portion of the data back on-prem to run some realtime work. That's just one example. I see a balance between the two. I also see a lot of organizations still collecting terabits of data on-prem, and they're not going to take terabits of data overnight to the cloud. We are seeing all the customers asking, we would like to see a hybrid solution. >> The reason I like the acquisition by HPE is because not only is it a company started by a friend, someone that I respect who knows how to build solid technology that can last, but it's software. HPE, as a company, in my view needs more software content. (Kumar laughs) Software's eating the world, as Marc Andreessen says. It would be great to see that software live as an independent entity. I'm sure decisions are still being made, but how do you see that playing out? What are the initial discussions like? What can you share with us? >> That's very well put.
Currently, the goal from my boss and the teams there is that we want to keep the BlueData software independent. It runs on all x86 hardware platforms, and we want the roadmap to be driven by customer needs on the software side, like running more HPC applications. Our roadmap will be driven by customer needs and the changes in the stack on top, not necessarily by the hardware. >> Well, that fits with HPE's culture of always trying to give optionality, and we've had this conversation many, many times with senior-level people like Antonio. It's very important that there's no lock-in, an open mindset, and certainly HPE lives up to that. Thanks so much for coming-- >> You're welcome. Back onto theCUBE. >> I appreciate you having me here as well. >> Your career has been amazing, and we go back a long time. Wow. From hardware, software, all these-- >> Great technologies. (laughter) >> Yeah, solving hard problems, and we look forward to tracking your career going forward. >> Thank you, thank you. Thanks so much. >> And thank you for watching, everybody. This is Dave Vellante from our Palo Alto studios. We'll see ya next time. (upbeat trumpet music)
Patrick Osborne, HPE | CUBEConversation, November 2018
>> From the SiliconANGLE Media Office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Dave Vellante. >> Hi everybody, welcome to this preview of HPE's Discover Madrid storage news. We're gonna unpack that. My name is Dave Vellante, and Hewlett Packard Enterprise has a six-month cadence of shows. They have one in the June timeframe in Las Vegas, and then one in Europe. This year, again, it's in Madrid, and you always see them announce products and innovations coinciding with those big user shows. With me here is Patrick Osborne, who's the Vice President and General Manager of Big Data and Secondary Storage at HPE. Patrick, great to see you again. >> Great to be here, love theCUBE, thanks for having us. >> Oh, you're very welcome. So let's unpack some of these announcements. You guys, as I said, you're on this six-month cadence. You've got sort of three big themes that you're vectoring into; maybe you could start there. >> Yeah, so within HPE Storage and Big Data, our point of view is around intelligent storage and intelligent data management, and underneath that we've vectored in on the three pillars that you talked about. AI-driven: essentially bringing the intelligence, self-managing and self-healing, to all of our storage platforms and big-data platforms. Built for the cloud: we've got a lot of use cases and user stories, and you've seen from an HPE perspective that hybrid cloud is a big investment we're making, in addition to the edge. And the last is delivering all of our capabilities, from a product perspective, solutions and services, as a service. GreenLake is something that we started a few years ago, and being able to provide that type of elastic purchasing experience for our customers is gonna weave itself into further products and solutions that we announce. >> So I like your strategy around AI. AI of course gets a lot of buzz these days. You guys are taking a practical approach. The Nimble acquisition gave you some capabilities there in predictive maintenance. You've pushed it into your automation capabilities. So let's talk about the hard news, specifically around InfoSight. >> Yeah, so InfoSight is an incredible platform, and what you see is that we've been giving customers richer experiences on top of InfoSight that go further up into the stack. We're providing recommendation engines, and we've got this whole concept of cross-stack analytics that goes from your app and your virtualization layer through the physical infrastructure. We've had a number of pieces of that that we're announcing to give very rich, AI-driven guidance to customers to fix specific problems. We're also extending it to more platforms. We just announced last week the ability to run InfoSight on our server platforms. So we're starting off on a journey of providing what we're doing at the storage and networking layer and weaving in our server platforms, so essentially platforms like ProLiant, Synergy, Apollo, all of our value compute platforms. We're doing some really cool stuff, not only providing the experience on new platforms, but richer experiences, certainly around performance bottlenecks on 3PAR, so we're getting deeper AI-driven recommendation engines, as well as what we call an AI-driven resource planner for Nimble. So if you take a look at it from a top-down view, this isn't AI marketing.
We're actually applying these techniques and machine learning within our install base, in our fleet, which is growing larger as we extend support to more of our platforms, and that actually makes people's lives easier from a storage administration perspective. >> And that was a big part of the acquisition, that IP, that machine intelligence IP. Obviously you had to evaluate that and the complexity of bringing it across the portfolio. You know, we live in this API-driven world; Nimble was a very modern platform, so that facilitated the injection of that intelligence across the portfolio, and that's what we're seeing now, isn't it? >> Yeah, absolutely. You go from essentially tooling up these platforms for very rich telemetry, to delivering a differentiated support experience that takes a lot of the manual interactions and interventions from a human perspective out of it, and now, with these three announcements that we've made, we're moving into things that do predictive analytics, recommendations and automation at the end of the day. So we're really trying to make people's lives easier from an admin perspective and give them time back to work on higher-value activities. >> Well, let's talk about cloud. HPE doesn't have a public cloud like an Amazon or an Azure, you partner with those guys, but you have Cloud Volumes, which is cloud-like; it's actually cloud from a business model perspective. Explain what Cloud Volumes is, and what's the news here? >> Yeah, so we've got a great service, it's called HPE Cloud Volumes, and you'll see us extending more user stories and experiences for hybrid cloud throughout the year. We have CloudBank, which focuses on secondary storage; Cloud Volumes is for primary storage users, so it is a public-cloud-adjacent storage as a service, and it allows you to go into the portal with your credentials, enter your credit card number, and essentially get storage as a service as an adjacent or replacement data service for, for example, EBS from Amazon. You're able to stand up storage as a service within a co-location facility that we manage, and it's completely delivered as a service. In the Americas, you can essentially apply compute instances from the public cloud to that storage; it's in a co-location facility that's very close, from a latency standpoint, to the public cloud. Now we're gonna be extending that service into Europe, so the UK and Ireland for EMEA users, and we can now also support persistent storage workloads for Docker and Kubernetes. This is a big win for a lot of customers that wanna do continuous integration and continuous development and use those containerized frameworks: you can essentially integrate your on-prem storage with your off-prem and then pull in the compute from the cloud.
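To make the Kubernetes persistent storage scenario concrete, here is a minimal sketch that claims a volume through the standard Kubernetes Python client. The hpe-cloud-volumes storage class name is a hypothetical placeholder, not a name from the announcement; substitute whatever class your volume plugin actually registers.

```python
# Hypothetical sketch: claiming persistent storage for a containerized
# workload. The storage class name below is an assumption; use the
# class your Cloud Volumes / CSI driver registers in your cluster.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="analytics-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="hpe-cloud-volumes",  # assumed provisioner name
        resources=client.V1ResourceRequirements(
            requests={"storage": "100Gi"}
        ),
    ),
)

core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```

Pods then reference the claim by name, so the same CI/CD manifests work whether the volume is provisioned on-prem or from the adjacent cloud service.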
>> And the European piece, I presume a lot of that is, well of course, GDPR, the fines went into effect in May of 2018. There's a lot of discussion about okay, data can't leave a particular locality, it's especially onerous in Europe, but probably other places as well. So there's a, there's a data locality governance compliance angle here too, is there not? >> Yeah, absolutely, and for us if you take a specific industry like healthcare, you know, for example, so you have to have pretty clear line of sight for your data provenance so it allows us to provide the service in these locations for a healthcare customer, or a healthcare ISV, you know, SAS provider to be able to essentially point to where that data is, you know, and so for us it's gonna be an entrance into that vertical for hybrid Cloud use cases. >> Alright so, so again, we've got the AI-driven piece, the Cloud piece, I see as a service, which is the third piece, I see Cloud as one, and as a service is one-A, it's almost like a feature of Cloud. So let's unpack that a little bit. What are you announcing in as a service and what's your position there? >> Yeah, so our vision is to be able to provide, and as a service experience, for almost everything we have that we provide our customers. Whether it's an individual product, whether it's a solution, or actually like a segment, right? So in the space that I work in, in Big Data and secondary service, secondary storage, backup is a service, for example, right, it's something that customers want, right? They don't want to be able to manage that on their own by piece parts, architect the whole thing, so what we're able to do is provide your primary storage, your secondary storage, your backup ISV, so in this case we're gonna be providing backup as a service through GreenLake with Vim. And then we even can bring in your Cloud capacity, so for example, Azure Blob Storage which will be your tertiary storage, you know, from an archive perspective. So for us it really allows us to provide customers an experience that, you know, is more of an, it's an experienced, Cloud is a destination, we're providing a multi-Cloud, a Hybrid-Cloud experience not only from a technology perspective, but also from a purchasing flex up, flex down, flex out experience and we're gonna keep on doing that over and over for the next, you know, foreseeable future. >> So you've been doing GreenLake for awhile here-- >> Yeah, absolutely. >> So how's that going and what's new here? >> Yeah, so that's been going great. We have well over, I think at this point, 500 petabytes on our management under GreenLake and so the service is, it's interesting when you think about it, when we were designing this we thought, just like the public Cloud, the compute as a service would take off, but from our perspective I think one of the biggest pain points for customers is managing data, you know, storage and Big Data, so storage as a service has grown very rapidly. So these services are very popular and we'll keep on iterating on them to create maximum velocity. 
One of the other things that's interesting about some of these accounting rules that have taken place is that customers cede to us the ability to do the architecture, so we're essentially creating no snowflakes for our customers, and they get better outcomes from a business perspective. We help them with the architecture, we help them with planning the architecture of the actual equipment, and then they get a very defined business outcome and SLA that they pay for as a service. So it's a win-win across the board; it's really good. >> Okay, so no snowflakes as in, not everything's custom-- >> Absolutely. >> And so that lowers not only your cost, it lowers the customer's cost. So let's take an example like that, let's take backup as a service, which is part of GreenLake. How does that work if I wanna engage with you on backup as a service? >> Yeah, so we have a team of folks in Pointnext that can engage very far up on the front end, so a customer says, hey, listen, I know that I need to do a major re-architecture of my secondary storage, HPE, can you help me out? We provide advisory services, and we have well-known architectures that fit a set of well-known mission-critical, business-critical applications at a typical customer site, so we can drive that all the way from the inception of the project to implementation. We can take a more customized view, or a road-mapped approach, for customers who want to bite off a little bit at a time and use things like Flex Capacity, and then weave in a full GreenLake implementation, so it's very flexible in terms of the way we can implement it. We can go soup to nuts, or we can get down to very small, granular pieces of infrastructure. >> Just sticking on data protection for a second, I saw a stat the other day, a fairly popular, often quoted stat, from Gartner I think, that 50% of customers are gonna change their backup platform by like 2023 or something. I think that's a legitimate stat, and when you talk to customers about why, well, things are changing: the cloud, multicloud, things like GDPR, ransomware, digital transformation. I wanna get more out of my data than just insurance; my backup is more than just insurance, I wanna do analytics. So there are all these other evolving things. I presume your backup as a service is evolving with that? >> Yeah, we're definitely seeing that the secondary storage market is very dynamic; the expectations from customers are changing, and changing very rapidly. So not only are we providing things like GreenLake and backup as a service, we're also seeking new partners in this space. One of the big announcements we'll make at Discover is a pretty big amplification of our partnership and OEM relationship with Cohesity. A lot of customers are looking for a secondary platform from a consolidation standpoint, being able to run a number of very different, disparate workloads from a secondary storage perspective and make them work. It's a great scale-out platform, and it's gonna run on a number of our HPE platforms, so we're gonna be able to provide customers that whole solution from HPE, partnering with Cohesity. In general, this secondary storage market's hot, and we're making some bets in our ecosystem right now. >> You also have Big Data in your title, so you're responsible for that portfolio.
I know Apollo in the HPC world has had a foothold there. There are a lot of synergies between high-performance computing and Big Data-- >> Absolutely. >> What's going on in the Big Data world? >> Yeah, so Big Data is one of our fastest growing segments within HPE, I'd say Big Data and analytics and some of the things that are going on with AI and commercial high-performance applications. We have a new platform that we're announcing, the Gen10 version of the Apollo 4200. It's definitely the workhorse of our Apollo server line for applications like Cloudera, Hortonworks, and MapR; we see Apache Spark, Kafka, a number of these, as well as some of the newer workloads around HPC, so TensorFlow, Caffe, H2O. That platform gives us a really good compute, memory, and storage mix from a footprint perspective, and it certainly scales into rack-level infrastructure. That part of the business is growing very quickly for us. A lot of customers are using these Big Data analytics techniques to transform their business, and as we go along and help them, it's been a really cool ride to see all this implemented at customer sites. >> You know, with all this talk about Big Data and analytics, and cloud, and AI, the infrastructure kinda gets lost, but the plumbing still matters, right, and so underneath this, we saw the flash trend, and that really had a major impact on the storage business specifically, but on the overall marketplace generally. It'd be hard to support a lot of these emerging workloads without flash, and that stack continues to evolve, the pyramid if you will. You've got flash memory now replacing much of the spinning disk space, you've got DRAM, which obviously is the most expensive and highest performance, and there seems to be this layer emerging in the middle, this storage-class memory layer. What are you guys doing there? Is there anything new there? >> Yeah, so we've got a couple of things cooking in that space. In general, when you talk about the infrastructure, it is important, and we're trying to help customers not only by providing really good products and scalable infrastructure, things like Apollo and our systems, Nimble and 3PAR; we're also trying to provide the experience around that too. Combining things like InfoSight, InfoSight on storage, InfoSight on servers, and Apollo for Big Data workloads is something that we're gonna be delivering in the future. The platforms really matter. So we're gonna be introducing NVMe and storage-class memory into what we feel is the industry-leading portfolio for flash storage. Between Nimble and 3PAR, those platforms are NVMe ready, and we'll be making some product announcements on the availability of that type of medium. If you think about using it in a platform like 3PAR, which is industry leading from a performance perspective, it allows you to get sub-200-microsecond performance for very mission-critical, latency-intolerant applications, and it's a great architecture. It scales in parallel, active-active-active, so you can get quite a bit of performance from a very large 3PAR system, and we're gonna be introducing NVMe into that equation as part of this announcement.
So, we see this as critical. For years in the storage business, you'd talk about how storage is growing, storage is growing, storage is growing, and we'd show the charts going up and to the right, but it was always, yeah, and somehow you gotta store it, you gotta manage it, you might have to move it, it's a real pain. The whole equation is changing now because of things like flash, GPUs, storage-class memory, NVMe, and of course all this ML and deep learning tech, and now you're seeing things that you're able to do with the data that you've never been able to do before-- >> Absolutely. >> And emerging use cases, so it's not just lots of data, it's completely new use cases, and it's driving new demands for infrastructure, isn't it? >> Absolutely. I mean, there were some macroeconomic tailwinds this year, but HPE had a phenomenal year, and we're looking at a pretty good outlook into next year as well. From our perspective, the customer requirement for latency improvements, bandwidth improvements, and total addressable capacity improvements never stops. It's always going on, and the data pipeline is getting longer. The amount of services and experiences that you're tying on to existing applications keeps on augmenting. So for us there are always new capabilities, always new ways that we can improve our products. For things like InfoSight and a lot of the predictive analytics, we're using those techniques ourselves to improve our customers' experience with our products. So it's a very virtuous cycle in the industry right now. >> Well, Patrick, thanks for coming in to theCUBE and unpacking these announcements at Discover Madrid. You're doing a great job executing on the storage plan. Every time I see you there are new announcements, new innovations; you guys are hittin' all your marks, so congratulations on that. >> HPE, intelligent storage, intelligent data management; if you guys have data needs, you know where to come. >> Alright, thanks again, Patrick. >> Great, thank you so much. >> Talk to you soon. Alright, thanks for watching everybody. This is Dave Vellante from theCUBE. We'll see ya next time. (upbeat music)
Ken King & Sumit Gupta, IBM | IBM Think 2018
>> Narrator: Live from Las Vegas, it's theCUBE, covering IBM Think 2018, brought to you by IBM. >> We're back at IBM Think 2018. You're watching theCUBE, the leader in live tech coverage. My name is Dave Vellante and I'm here with my co-host, Peter Burris. Ken King is here; he's the general manager of OpenPOWER from IBM, and Sumit Gupta, PhD, who is the VP, HPC, AI, ML for IBM Cognitive. Gentlemen, welcome to theCUBE. >> Sumit: Thank you. >> Thank you for having us. >> So, really, guys, a pleasure. We had dinner last night, talked about Picciano, who runs the OpenPOWER business. Appreciate you guys comin' on, but I got to ask you, Sumit, I'll start with you. OpenPOWER, Cognitive Systems, a lot of people say, "Well, that's just the power system. This is the old AIX business, it's just renaming it. It's a branding thing." What do you say? >> I think we had a fundamental strategy shift where we realized that AI was going to be the dominant workload moving into the future, and the systems that have been designed today or in the past are not the right systems for the AI future. So, we also believe that it's not just about silicon and even a single server. It's about the software, it's about thinking at the rack level and the data center level. So, fundamentally, Cognitive Systems is about co-designing hardware and software with an open ecosystem of partners who are innovating to maximize the data and AI support at a rack level. >> Somebody was talkin' to Steve Mills, probably about 10 years ago, and he said, "Listen, if you're going to compete with Intel, you can copy them; that's not what we're going to do." You know, he didn't like the SPARC strategy. "We have a better strategy," is what he said, and the strategy was, we're going to open it up, we're going to try to get 10% of the market. You know, we'll see if we can get there. But, Ken, I wonder if you could sort of talk about, just from a high level, the strategy and maybe go into the segments. >> Yeah, absolutely, so, yeah, you're absolutely right on the strategy. You know, we have completely opened up the architecture. Our focus on growth is around having an ecosystem and an open architecture so everybody can innovate on top of it effectively and everybody in the ecosystem can profit from it and gain good margins. So, that's the strategy, that's how we designed the OpenPOWER ecosystem, but, you know, our segments, our core segments: AIX in Unix is still a core, very big core segment of ours. Unix itself is flat to declining, but AIX is continuing to take share in that segment through all the new innovations we're delivering. The other segments are all growth segments, high growth segments, whether it's SAP HANA, our cognitive infrastructure and modern data platform, or even what we're doing in the HyperScale data centers. Those are all significant growth opportunities for us, and those are all Linux based, and, so, that is really where a lot of the OpenPOWER initiatives are driving growth for us and leveraging the fact that, through that ecosystem, we're getting a lot of incremental innovation that's occurring and it's delivering competitive differentiation for our platform. I say for our platform, but that doesn't mean just for IBM, but for all the ecosystem partners as well, and a lot of that was on display on Monday when we had our OpenPOWER Summit. >> So, talk more about the OpenPOWER Summit: what was that all about, who was there? Give us some stats on OpenPOWER and the ecosystem. >> Yeah, absolutely.
So, it was a good day, we're up to well over 300 members. We have over 50 different systems that are coming out in the market from IBM or our partners. Over 20 different manufacturers out there actually developing OpenPOWER systems. A lot of announcements, or a lot of statements, were made at the summit that we thought were extremely valuable. First of all, we got the number one server vendor in Europe, Atos, designing and developing P9, the number one in Japan, Hitachi, the number one in China, Inspur. We got top ODMs like Super Micro, Wistron, and others that are also developing their POWER9 systems. We have a lot of different component providers on the new PCIe gen four, on the open cabinet capabilities, a lot of announcements made by a number of component partners and accelerator partners at the summit as well. The other thing I'm excited about is we have over 70 ISVs now on the platform, and a number of statements were made and announcements on Monday from people like MapD, Anaconda, H2O, Kinetica and others who are leveraging those innovations brought on the platform, like NVLink and the coherency between GPU and CPU, to do accelerated analytics and accelerated GPU database kind of capabilities. But the thing that had me the most excited on Monday were the end users. I've always said, and the analysts always ask me the question of, when are you going to start penetrating the market? When are you going to show that you've got a lot of end users deploying this? And there were a lot of statements by a lot of big players on Monday. Google was on stage and publicly said the IO was amazing, the memory bandwidth is amazing. We are deploying Zaius, which is the POWER9 server, in our data centers and we're ready for scale, and it's now "Google strong," which is basically saying that this thing is hardened and ready for production. But we also (laughs) had a number of other significant ones: Tencent talkin' about deploying OpenPOWER, 30% better efficiency, 30% less server resources required; the cloud arm of Alibaba talkin' about how they're putting it on their X-Dragon, they have it in a pilot program, they're asking everybody to use it now so they can figure out how they go into production. PayPal made statements about how they're using machine learning and deep learning to do fraud detection, and we even had Limelight, who is not as big a name, but >> CDN, yeah. >> They're a CDN tool provider to people like Netflix and others. They were talkin' about the great capability with the IO and the ability to reduce the buffering and improve the streaming for all these CDN providers out there. So, we were really excited about all those end users and all the things they're saying. That demonstrates the power of this ecosystem.
>> Sure, so, I think we all recognize that the performance advantages and even power advantages that we were getting from Dennard scaling, also known as Moore's Law, are over, right. So, people talk about the end of Moore's Law, and that's really the end of gaining processor performance with Dennard scaling and Moore's Law. What we believe is that to continue to meet the performance needs of all of these new AI and data workloads, you need accelerators, and not just compute accelerators, you actually need accelerated networking. You need accelerated storage, you need high-density memory sitting very close to the compute power, and, if you really think about it, what's happened is, again, system view, right, we're not silicon view, we're looking at the system. The minute you start looking at the silicon, you realize you want to get the data to where the compute is, or the compute to where the data is. So, it all becomes about creating bigger pipelines, fatter pipelines, to move data around to get to the right compute piece. For example, we put much more emphasis on a much faster memory system to make sure we are getting data from the system memory to the CPU. >> Coherently. >> Coherently, that's the main memory. We put interfaces on POWER9 including NVLink, OpenCAPI, and PCIe gen four, and that enabled us to get that data either from the network to the system memory, or out back to the network, or to storage, or to accelerators like GPUs. We built and embedded these high-speed interconnects into POWER9, into the processor. Nvidia put NVLink into their GPU, and we've been working with partners like Xilinx and Mellanox on getting OpenCAPI onto their components. >> And we're seeing up to 10x for both memory bandwidth and IO over x86, which is significant. You should talk about how we're seeing up to 4x improvement in training of ML/DL algorithms over x86, which is dramatic in how quickly you can get from data to insight, right? You could take training and turn it from weeks to days, or days to hours, or even hours to minutes, and that makes a huge difference in what you can do in any industry as far as getting insight out of your data, which is the competitive differentiator in today's environment. >> Let's talk about this notion of architecture, or systems especially. The basic platform for how we've been building systems has been relatively consistent for a long time. The basic approach to how we think about building systems has been relatively consistent. You start with the database manager, you run it on an Intel processor, you build your application, you scale it up based on SMP needs. There's been some variations; we're going into clustering, because we do some other things, but you guys are talking about something fundamentally different. And flash memory, the ability to do flash storage, which dramatically changes the relationship between the processor and the data, means that we're not going to see all of the organization of the workloads around the server to see how much we can do in it. It's really going to be much more of a balanced approach. How is Power going to provide that more balanced systems approach as we distribute data, as we distribute processing, as we create a cloud experience that isn't in one place, but is in more places? >> Well, this ties exactly to the point I made around it's not just accelerated compute, which we've all talked about a lot over the years, it's also about accelerated storage, accelerated networking, and accelerated memories, right.
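The 10x memory bandwidth and IO numbers cited here are ultimately about one thing: how fast you can feed an accelerator. A quick, generic way to see why the host-to-GPU link matters is to time the copy itself. This PyTorch sketch is my illustration, not IBM code, and it assumes a CUDA-capable GPU is available:

```python
# Illustrative only: time host-to-GPU transfer vs. on-GPU compute.
# Assumes a CUDA-capable GPU and PyTorch installed.
import time
import torch

x = torch.randn(4096, 4096)          # ~64 MB tensor in host memory

t0 = time.perf_counter()
x_gpu = x.to("cuda")                 # crosses the CPU-GPU interconnect
torch.cuda.synchronize()
t1 = time.perf_counter()

y = x_gpu @ x_gpu                    # matrix multiply on the GPU
torch.cuda.synchronize()
t2 = time.perf_counter()

print(f"transfer: {(t1 - t0)*1e3:.1f} ms, compute: {(t2 - t1)*1e3:.1f} ms")
# On PCIe-attached GPUs the transfer can rival the compute time,
# which is exactly the choke point faster links like NVLink attack.
```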
This is really the point: if you don't have a fast pipeline into the processor from all of this wonderful storage and flash technology, there's going to be a choke point in the network, or there'll be a choke point once the data gets to the server, and you're choked then. So, a lot of our focus has been, first of all, partnering with a company like Mellanox, which builds extremely high bandwidth, high-speed >> And EOF. >> Right, right, and I'm using one as an example, right. >> Sure. >> I'm using one as an example, and that's where the large partnerships come in; we have like 300 partnerships, as Ken talked about, in the OpenPOWER Foundation. Those partnerships exist because we brought together all of these technology providers. We believe that no one company can own the agenda of technology. No one company can invest enough to continue to give us the performance we need to meet the needs of the AI workloads, and that's why we want to partner with all these technology vendors who've all invested billions of dollars to provide the best systems and software for AI and data. >> But fundamentally, >> It's the whole construct of data centric systems, right? >> Right. >> I mean, sometimes you got to process the data in the network, right? Sometimes you got to process the data in the storage. It's not just at the CPU; the GPU's a huge place for processing that data. >> Sure. >> How you do all of that coherently, and how things work together in a system environment, is crucial, versus a vertically integrated capability where the CPU provider continues to put more and more into the processor and disenfranchises the rest of the ecosystem. >> Well, that was the counter building strategies that we want to talk about. You have Intel, who wants to put as much on the die as possible. It's worked quite well for Intel over the years. You had to take a different strategy. If you tried to take Intel on with that strategy, you would have failed. So, talk about the different philosophies, but really I'm interested in what it means for things like alternative processing and your relationship in your ecosystem. >> This is not about company strategies, right. I mean, Intel is a semiconductor company and they think like a semiconductor company. We're a systems and software company, we think like that, but this is not about company strategy. This is about what the market needs, what client workloads need, and if you start there, you start with a data centric strategy. You start with data centric systems. You think about moving data around and making sure there is heterogeneous compute, there is accelerated compute, you have very fast networks. So, we're currently building the US's fastest supercomputers; the project name is Coral, and there are two supercomputers, one at Oak Ridge National Labs and one at Lawrence Livermore. These are the ultimate HPC and AI machines, right. Compute is a very important part of them, but networking and storage is just as important. The file system is just as important. The cluster management software is just as important, right, because if you are serving data scientists and a biologist, they don't want to deal with, "How many servers do I need to launch this job on? How do I manage the jobs, how do I manage the server?" You want them to just scale, right. So, we do a lot of work on our scalability. We do a lot of work in using Apache Spark to enable cluster virtualization and user virtualization.
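That last point, that a data scientist or biologist should never have to think about servers, is exactly the abstraction Spark provides: the user writes against a DataFrame and the cluster manager decides where the work runs. A generic PySpark sketch, purely illustrative (this is not the Coral software stack, and the input path and column names are made up):

```python
# Illustrative PySpark job: the user never names a server.
# The cluster manager (YARN, Kubernetes, etc.) places the work.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("genomics-summary").getOrCreate()

# Hypothetical input path and schema, for illustration only.
df = spark.read.parquet("/data/experiments/results.parquet")

summary = (
    df.groupBy("sample_id")
      .agg(F.avg("expression").alias("mean_expression"),
           F.count("*").alias("n_reads"))
)

summary.show(10)   # Spark distributes the scan and aggregation across the cluster
spark.stop()
```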
>> Well, if we think about it, I don't like the term data gravity, it's wrong from a lot of different perspectives, but if we think about it, you guys are trying to build systems in a world that's centered on data, as opposed to a world that's centered on the server. >> That's exactly right. >> That's right. >> You got that, right? >> That's exactly right. >> Yeah, absolutely. >> Alright, you guys got to go, we got to wrap, but I just want to close with, I mean, we always say infrastructure matters. You got Z growing, you got Power growing, you got storage growing, it's given a good tailwind to IBM, so, guys, great work. Congratulations, got a lot more to do, I know, but thanks for comin' on theCUBE, appreciate it. >> It's going to be a fun year. >> Thank you very much. >> Thank you. >> Appreciate you having us. >> Alright, keep it right there, everybody. We'll be back with our next guest. You're watching theCUBE live from IBM Think 2018. We'll be right back. (techno beat)
Joel Horwitz, IBM | IBM Think 2018
>> Narrator: Live from Las Vegas, it's theCUBE! Covering IBM Think 2018. Brought to you by IBM. >> Hello everyone and welcome back to theCUBE's exclusive three days of coverage here at IBM Think 2018. I'm John Furrier, co-host with Dave Vellante, hosting three days, and next is Joel Horwitz, Vice President of Strategic Partnerships and Offerings for The Digital Business Group. >> Thanks. >> Welcome back to theCUBE. Good to see you. >> Good to see you guys. Thanks for having me here. >> Thanks for coming on. >> You've been on theCUBE probably so many times, talking big data, talking analytics, now in your new role in The Digital Group, the digital transformation. I really want to just ask you right off the bat about your new role, and how it relates to the changing ecosystem. >> Joel: Yeah. >> All of these markets are changing big time, the role of the ecosystem, the leverage that they have with technology and the value propositions, whether it's decentralized applications in Blockchain to storage and infrastructure, and big data. What is your role? Take a minute to explain what you're doing, because you have a unique position, because of this demand for partnerships, this demand for collaboration at many levels. What's the latest? >> So I would describe my role as being a champion of our partners, for sure. I take a very outside-in perspective on IBM. Joining just over three years ago now, I came in really through analytics, as you know, focused on machine learning, data science, and the growth of A.I. at that time. Last year I was part of the corporate development team over there, so looking really at a lot of the industry trends and what's going on, as well, in analytics, data, and A.I. This year, you know, we recognize that we're only going to do so many strategic partnerships a year, right; there's probably a handful that we're going to work with. For example, last year we did a great partnership with Lightbend to bring their reactive platform to IBM, and we launched the iPhone X with Verizon on Lightbend's platform. But, these days, my team can't be everywhere, obviously, and part of the value of digital, and that route to market, is really the idea that partners should be able to self service. So, you know, my job this year is frankly to put myself out of a job, right. Meaning, if I can get, you know, 70% of the work my team does, right, contracting, legal, setting up, provisioning, all of that on our cloud, and partners can just do that themselves, then we'll capture a much larger swath of the emerging A.I., data, and cloud market. >> I want to talk about the killer app creating value and then the role the marketplace is playing. You mentioned self service. I want to kind of go down that. Before we get there, I want to get your thoughts on this because I noticed your role is cutting across a lot of different things, and you know we've been talking about cloud as a horizontally disrupting technology, >> Joel: Yeah. >> Certainly in the data space you saw that. And stacks will be horizontally scalable with the cloud. >> Yeah. >> But you could be vertical specialization in the applications. So I noticed you're covering analytics, Watson, Cloud, hybrid cloud, emerging technologies. >> Yeah. >> Blockchain, and many others. >> Yeah. >> So talk about, it's obvious you guys are now cutting across, horizontally, across the different IBM divisions. Is that by design? >> Yeah.
>> What's the impact of the ecosystem and partners for that horizontal cutover? >> Yeah, I know, I mean, it's a great question, I think. Look, there are some specific design patterns that we see across every technology, across every, you know, business at IBM. One design pattern is pretty obvious; you saw it with the launch of IBM Cloud Private for Data, following up on last year's IBM Cloud Private. And that design pattern is really about people containerizing applications. And so, at the end of the week, we have the business partner, or PartnerWorld, Leadership conference, excuse me, where a number of our partners really are looking at how do I bring that workload to the cloud. And it's not so much that the cloud is the end point. That's really the starting-off point to A; get much wider distribution, and B; be able to take advantage of a lot of these emerging technologies, like Blockchain, like A.I., like IOT, and numerous others, Quantum, et cetera, they'll just keep coming. So really cloud to me is just a way for us to open the door to a lot of the technology that's flooding the market. >> Dave: Joel, can you talk about partnership? You mentioned before that you guys are kind of selective, John calls them Barney deals, ya know. I love you, you love me. You guys sound like you don't look for those; not volume, it's quality. >> Yeah. >> What are the criteria that you're looking for? How do you get value out of those? How do you measure that value out of the partnerships? If someone is a prospective partner out there, how should I be interacting with you? >> Yeah, I think there's probably two steps. I think one is really recognizing, in my own personal view, that we really want to partner with folks who embrace open standards. Now I'm not going to go as far as to say open source, 'cause I think there is a lot that goes into that. But I will say open standards, meaning, not these large monolithic applications, but can you actually integrate with us in some meaningful way? And to do that, that's why we actually started on this new platform that we are launching today, called IBM Partner Self-Service, is the ability to first integrate with IBM. So, if you can demonstrate that you can build with IBM first, whether that's a startup, an ISV, a business partner, like that's criteria number one. Criteria number two is, are you a trusted partner? So, do you actually have the same level of competency that we would expect from, frankly, our own sellers and our own people? And so, to do that, we've also launched new competency paths for business partners and partners as well. So, those are the two major criteria. And then the third one, which I think is kind of the holy grail, is selling with IBM. So we also launched a sell-with path today where you can actually list in our marketplace, and then we will actually help you reach new markets. And then demonstrate there's clients, there's a client need that really wants our joint solution, right? And so, to me, those are the three things, to re-state: building with us, having a level of competency with us, and then demonstrating client success with us. >> Okay, so, integrate, you really don't need you guys to do that. I can just dive in and do that. Bake it out a little bit, and then approach you. What kind of help do you give? Do you have programs once you get by those gates? >> So, you know, I would categorize it into two groups. I think we have a ton of online support. So, you know, we even embrace Slack at IBM.
If you're not aware of that, we have Slack everywhere. And, so, for self service, I want to say, look, what does zero touch mean, right, in this day and age, for a partner? And so, they can go to our site today and, you know, sign up for Slack, and talk directly to our technical specialists as well as to our developer advocates. And so, on the enablement and integration side, my colleague Angel Diaz and team have done a great job of launching hundreds of IBM code patterns. So that you can just pick these artifacts up, these assets up, and leverage them to integrate all sorts of capabilities into your product. >> You know, Dave, I want to get your thoughts on this, because you and I have been talking about the API integration, and I want to get back to Joel's point in a second because I think this is critical for startups and ecosystem partners. APIs are the (speaking quickly) for developers right now, so if I don't want to take a big chance on being all in on IBM, say I want to kick the tires, APIs are critical. So the question is, are you seeing that traction on your side of the house? In terms of the level of API integration, is that the touchpoint? Is it like the beginning phases? And what level of commitment are you seeing with people? >> Well, John, to me it comes down to innovation, and it's interesting because Joel came out of the data world. To me, the innovation in the next 10 years starts with data. The second component of that innovation, I think, over the next decade or so, is going to be, really, A.I., whether you call it cognitive or machine intelligence or artificial intelligence. And then third, I think, is cloud economics, and that's really where the API economy fits in. You got to have APIs to integrate, as Joel was saying. You've got to have scale, marginal costs go to zero eventually. You've got to have network effects, and you've got to be able to attract startups, which is another question I have. >> Now Joel, back to you on the integration, whether it's a startup or a big company. It used to be, in the old days, you got to go all in. You've got to get the developer kit, >> Joel: Yeah. >> Download it, align it to a swim lane, get deeper, prove your value. >> Yeah. >> Find the value faster; what's the first hurdle if someone wants, hey, I want to give IBM a shot here? Love the sell-with holy grail option; is it APIs, can people integrate on their own? Talk about that specific first step, because some people might open up the door and go whoa! There's more here than I thought. Or, wow, there's some real tech. Or, I don't want to use IBM tech, I want to use some of mine. There's that first indifference point. >> Yeah, I think there are areas where we've seen dramatic customer experience improvements. So to give one example: we partnered with Ubisoft's Red Storm last year around a new title they released called Star Trek: Bridge Crew, and so, you know, to me, we won on our own merit, and I think that publisher chose IBM because Watson Conversation is absolutely the best on the market. And so what that did is it enabled game players, their end customer, their end user, to speak into a VR headset and just give commands, as you would naturally. And so, I think, as you think about IBM, it's, yeah, we've made it completely easy to access our APIs.
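For a sense of what "speak a command, get an intent back" looked like on the developer side, here is a rough sketch against the Watson Conversation service of that era. It is my illustration, not Ubisoft's code: the credentials, workspace ID, and utterance are placeholders, and exact class and parameter names shifted across SDK versions (the service was later renamed Watson Assistant).

```python
# Hedged sketch of a 2018-era Watson Conversation call.
# Credentials and workspace_id are placeholders; SDK details varied by version.
from watson_developer_cloud import ConversationV1

conversation = ConversationV1(
    username="YOUR_USERNAME",        # placeholder
    password="YOUR_PASSWORD",        # placeholder
    version="2018-02-16",
)

# In the game, this text would come from speech-to-text on the player's voice.
response = conversation.message(
    workspace_id="YOUR_WORKSPACE_ID",  # placeholder
    input={"text": "Set a course and engage engines"},
)

# The service returns recognized intents, which the game maps to actions.
for intent in response["intents"]:
    print(intent["intent"], intent["confidence"])
```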
I think there's a great quote from the founder of Flickr that I read years ago, I'll go dig it up for you guys later, but it was along the lines of: business development today means exposing your APIs, like, that's it! And, on the other side of it, we give a lot away in terms of cloud credits, right, and so, today, if you go and sign up on our self service platform, we'll give you $10,000 a month in free cloud credits to build, and build quickly. Because, at the end of the day, if it's not self service, if it requires more heavy lifting, then, frankly, we're not doing our jobs. And so that's my commitment, is to make sure that is available, is accessible, and there's experts there that can help you on your journey. >> So that attracts startups, obviously, 10K a month is a honey pot for those guys. What about existing IBM clients that want to get to the cloud? Migrate to the cloud. How do you help those guys? >> Yeah, so, on the migration front, we have a great team in place with IBM Services, who basically have set up a migration factory, if you will, and there are numerous ways to chart your course to the cloud, whether it's, you know, full cloud or hybrid cloud, or offloading some aspects to the cloud. There's a lot of different paths you can take, and so to do that, we're offering $50,000 in migration credits for the first couple months. We're also offering 35% off for professional services. So, we have a great offer going on over the next few months to help people make that first step. >> Incentives are key. >> And, look, we're here with you, so it's not like, here, throwing it over the fence, and good luck! You know, tweet at me, instant message me, I'm around. And I will be absolutely committed to partner success. >> Yeah, you know, incentives are critical; that's going to get the market going. But, at the end of the day, it's the type of value, and I want to get your thoughts on something that's come up that I've heard people talk about in the hallways and other conferences. They kind of chirp about, "Hey, you know, I'd like to get this from suppliers. I want to see more tools, more programs to help me get more customers, to get more value. I'm building apps, but I've also got a business to run." What are some of the conversations you've had over the past year with customers and partners? Stack rank the top three or four things that they talk about, either their pain points or things that are on their mind, that's worth noting. >> I mean, I would say first and foremost, me, myself, being in a startup at H2O three, four years ago, we used to walk in there and sell into the data scientists, right. So if you don't know H2O, they're a great company, a machine learning company, but we would get the data scientists really excited about working with our product, and then lo and behold, we'd get to the CIO office saying, "Hey, what is this stuff? Get it out of here." You know, Hadoop was the same way, by the way; in 2010, working at AVG, like, we'd bring in Hadoop. Like, what is this data lake thing? There's no governance, it's a mess. Where they could really, you know, work with IBM, where they see value from IBM, is when we go into the CIO office together and say, look, we've demonstrated that there's value here. We've demonstrated that there's actual customer need. We can create a lot of help in terms of getting the rest of the organization bought in, put in the right governance around it. Because, look, I mean, GDPR is real, it's a big deal. Like, data privacy is huge.
So, you know, Rob Thomas likes to say, "You can't have good A.I. without I.A." I think that's a great one, I.A. meaning information architecture. So, I agree, and I think that's what the number one benefit is. Really get in there, move quickly, demonstrate value, and then when you're ready to make that next step of how you roll that out to the rest of the enterprise, that's when IBM becomes a huge help. >> You know, you mentioned GDPR. Regulatory issues are now becoming criteria for a lot of application developers that are small, that may not have the resources to handle things like the right to get your name out of a database, and other regulations, certainly. Decentralized applications with Blockchain, another regulatory challenge-- >> Yep. >> Opportunity as well. Are you guys having those kinds of conversations, like putting specific things in place beyond GDPR, and if so, what regulatory and legal things do you see out there that could be blockers for customers, that you guys hope to go after? >> I mean, I don't think there's a one-word answer here. I do think that you take it on a case by case basis. I think you're seeing different countries adopt GDPR differently, Germany, obviously, being a very strict kind of country in doing that. So, you know, IBM Services, as well as our analytics team, are really focused on that. I think, like I said, what you saw with ICP for Data coming out this week, I think that's a really important way to look at it. My own personal view, I think, for sure there's a lot of compliance. They have to look at and understand the workflows of how people are using that data, and application architecture is big. And those are all the considerations, I think, that you are going to see as people move. I read a statistic from IDC that CSPs and MSPs are growing, like it's 40% growth, and 50% of all developers are now embedding A.I. So, this market is growing, and growing fast. But, you're right. If folks out there aren't really taking GDPR seriously, you can get yourself into some hot water. >> Well, we've observed that scale matters, certainly, whether it's a partner or cloud; that gets, that helps people. >> Yeah. >> Joel, well, thanks for coming onto theCUBE, we really appreciate it. >> Yeah, my pleasure. >> Before we end, I want to get your thoughts, just share with the folks that are watching: what kind of deals do you want to do? What's on your radar? What's the priorities for you, from a strategic business development standpoint, to develop across that horizontally scalable IBM division space, as well as technology space? >> You know, it's not what deals I want to do, it's really what deals our partners want to do. >> Come on, you're in charge, come on. >> It's really what deals our partners want to do, ya know. I mean, look, I get excited about transforming industries, I really do, so I look at, not what's the transactional partnership, like go, we'll do something, and there's some revenue, or something. I look at how do we transform an industry? >> Let me rephrase the question. What's on the priority list for you guys, from a transformational area, that's important for your partners? >> Yeah, I would say for sure, obviously, A.I. is huge. Obviously data is huge, obviously cloud is huge. But, looking really specific, I think you just add tech after each industry. So adtech, fintech, healthtech obviously, game tech, and, I think, probably the last one, to me personally, is the most exciting.
We signed an amazing deal with Unity at the end of last year, the start of this year. In fact, GDC, the Game Developers Conference, is going on as we speak in San Francisco. So half my team right now is over there, demonstrating Watson with VR and AR, and it's not just for games, right. It's like with BMW and VW doing some cool stuff there as well. So, I'm really excited about the AR, VR industry growing, especially with our partner Unity. >> There's a new creative out there-- >> Can I jump in before you exit? I want to ask you a follow up on that, because if transformation is sort of the target for your partnerships, healthcare is an area that should be transformed. But, needs to be transformed, but it's hard to transform healthcare. >> Joel: It is, yeah. >> Do you feel like you could start moving the needle from a partnership perspective? Or is that going to take some more time? >> You know, I think there's a lot of great work being done there. I do believe... Look, in general, I think we can move a lot faster with partners. In fact, I like to call it the Nordstrom model. Right? Like IBM in the past has been Barneys of New York, forever, right? From a branding perspective and from how we partner with folks. Like, I think we need to move more to a Nordstrom: yeah, we'll sell our own offerings off the rack, but then we need to help partners come in and create the right styles for the right need and the right industry. >> Yeah, and then there's a Nordstrom Rack you're going to need to put that on. (laughing) Older technology goes to the Nordstrom Rack. Joel Horwitz, thanks for coming on. Vice President of Strategic Partnerships and Offerings, here on theCUBE. I'm John Furrier with Dave Vellante, with three days of IBM Think live streaming; all of the videos will be up on thecube.net, it's live now, and Youtube.com/siliconangle for all the on-demand videos when the show's over. We'll be right back with more after this short break. (light techno music)
Data Science for All: It's a Whole New Game
>> There's a movement that's sweeping across businesses everywhere here in this country and around the world. And it's all about data. Today businesses are being inundated with data, to the tune of over two and a half million gigabytes that'll be generated in the next 60 seconds alone. What do you do with all that data? To extract insights you typically turn to a data scientist. But not necessarily anymore. At least not exclusively. Today the ability to extract value from data is becoming a shared mission. A team effort that spans the organization, extending far more widely than ever before. Today, data science is being democratized. >> Data Science for All: It's a Whole New Game. >> Welcome everyone, I'm Katie Linendoll. I'm a technology expert and writer, and I love reporting on all things tech. My fascination with tech started very young. I began coding when I was 12, received my networking certs by 18, and got a degree in IT and new media from Rochester Institute of Technology. So as you can tell, technology has always been a sure passion of mine. Having grown up in the digital age, I love having a career that keeps me at the forefront of science and technology innovations. I spend equal time in the field being hands-on as I do on my laptop conducting in-depth research. Whether I'm diving underwater with NASA astronauts, witnessing the new ways in which mobile technology can help rebuild the Philippines' economy in the wake of super typhoons, or sharing a first look at the newest iPhones on The Today Show yesterday, I'm always on the hunt for the latest and greatest tech stories. And that's what brought me here. I'll be your host for the next hour as we explore the new phenomenon that is taking businesses around the world by storm, as data science continues to become democratized and extends beyond the domain of the data scientist, and why there's also a mandate for all of us to become data literate now that data science for all drives our AI culture. We're going to take to the streets and go behind the scenes as we uncover the factors that are fueling this phenomenon and giving rise to a movement that is reshaping how businesses leverage data, and putting organizations on the road to AI. So coming up, I'll be doing interviews with data scientists. We'll see real world demos and take a look at how IBM is changing the game with an open data science platform. We'll also be joined by legendary statistician Nate Silver, founder and editor-in-chief of FiveThirtyEight, who will shed light on how a data-driven mindset is changing everything from business to our culture. We also have a few people who are joining us in our studio, so thank you guys for joining us. Come on, I can do better than that, right? Live studio audience, the fun stuff. And for all of you during the program, I want to remind you to join the conversation on social media using the hashtag DSforAll, it's data science for all. Share your thoughts on what data science and AI mean to you and your business. And, let's dive into a whole new game of data science. Now I'd like to welcome my co-host, General Manager of IBM Analytics, Rob Thomas. >> Hello, Katie. >> Come on guys. >> Yeah, seriously. >> No one's allowed to be quiet during this show, okay? >> Right. >> Or, I'll start calling people out. So Rob, thank you so much. I think you know this conversation; we're calling it a data explosion happening right now. And it's nothing new. And when you and I chatted about it, you've been talking about this for years.
You have to ask, is this old news at this point? >> Yeah, I mean, well, first of all, the data explosion is not coming, it's here. And everybody's in the middle of it right now. What is different is the economics have changed, and the scale and complexity of the data that organizations are having to deal with has changed. And to this day, 80% of the data in the world still sits behind corporate firewalls. So, that's becoming a problem. It's becoming unmanageable. IT struggles to manage it. The business can't get everything they need. Consumers can't consume it when they want. So we have a challenge here. >> It's challenging in the world of unmanageable, crazy complexity. If I'm sitting here as an IT manager of my business, I'm probably thinking to myself, this is incredibly frustrating. How in the world am I going to get control of all this data? And it's probably not just me thinking it. Many individuals here as well. >> Yeah, indeed. Everybody's thinking about how am I going to put data to work in my organization in a way I haven't done before. Look, you've got to have the right expertise, the right tools. The other thing that's happening in the market right now is clients are dealing with multi-cloud environments. So data behind the firewall in private cloud, multiple public clouds. And they have to find a way: how am I going to pull meaning out of this data? And that brings us to data science and AI. That's how you get there. >> I understand the data science part, but I think we're all starting to hear more about AI. And it's incredible that this buzzword is happening. How do businesses adapt to this AI growth and boom and trend that's happening in this world right now? >> Well, let me define it this way. Data science is a discipline, and machine learning is one technique. And then AI puts machine learning into practice and applies it to the business. So this is really about getting your business where it needs to go. And to get to an AI future, you have to lay a data foundation today. I love the phrase, "there's no AI without IA." That means you're not going to get to AI unless you have the right information architecture to start with. >> Can you elaborate though, in terms of how businesses can really adopt AI and get started? >> Look, I think there's four things you have to do if you're serious about AI. One is you need a strategy for data acquisition. Two is you need a modern data architecture. Three is you need pervasive automation. And four is you've got to expand job roles in the organization. >> Data acquisition. The first pillar you just discussed. Can we start there and explain why it's so critical in this process? >> Yeah, so let's think about how data acquisition has evolved through the years. 15 years ago, data acquisition was about how do I get data in and out of my ERP system? And that was pretty much solved. Then the mobile revolution happens, and suddenly you've got structured and non-structured data, more than you've ever dealt with. And now you get to where we are today. You're talking terabytes, petabytes of data. >> [Katie] Yottabytes, I heard that word the other day. >> I heard that too. >> Didn't even know what it meant. >> You know how many zeros that is? >> I thought we were in Star Wars. >> Yeah, I think it's a lot of zeroes. >> Yodabytes, it's new. >> So, it's becoming more and more complex in terms of how you acquire data. So that's the new data landscape that every client is dealing with.
And if you don't have a strategy for how you acquire that and manage it, you're not going to get to that AI future. >> So a natural segue: if you are one of these businesses, how do you build for the data landscape? >> Yeah, so the question I always hear from customers is, we need to evolve our data architecture to be ready for AI. And the way I think about that is it's really about moving from static data repositories to more of a fluid data layer. >> And we continue with the architecture. New data architecture is an interesting buzzword to hear. But it's also one of the four pillars. So if you could dive in there. >> Yeah, I mean, it's a new twist on what I would call some core data science concepts. For example, you have to leverage tools with a modern, centralized data warehouse. But your data warehouse can't be stagnant to just what's right there. So you need a way to federate data across different environments. You need to be able to bring your analytics to the data because it's most efficient that way. And ultimately, it's about building an optimized data platform that is designed for data science and AI, which means it has to be a lot more flexible than what clients have had in the past. >> All right. So we've laid out what you need for driving automation. But where does the machine learning kick in? >> Machine learning is what gives you the ability to automate tasks. And the way I think about machine learning, it's about predicting and automating. And this will really change the roles of data professionals and IT professionals. For example, a data scientist cannot possibly know every algorithm or every model that they could use. So we can automate the process of algorithm selection. Another example is things like automated data matching, or metadata creation. Some of these things may not be exciting, but they're hugely practical. And so when you think about the real use cases that are driving return on investment today, it's things like that. It's automating the mundane tasks. >> Let's go ahead and come back to something that you mentioned earlier, because it's fascinating to be talking about this AI journey, but also significant is the new job roles. And what are those other participants in the analytics pipeline? >> Yeah, I think we're just at the start of this idea of new job roles. We have data scientists. We have data engineers. Now you see machine learning engineers. Application developers. What's really happening is that data scientists are no longer allowed to work in their own silo. And so the new job roles are about how does everybody have data first in their mind? And then they're using tools to automate data science, to automate building machine learning into applications. So roles are going to change dramatically in organizations. >> I think that's confusing though, because we have several organizations asking, is that a highly specialized role, just for data science? Or is it applicable to everybody across the board? >> Yeah, and that's the big question, right? 'Cause everybody's thinking, how will this apply? Do I want this to be just a small set of people in the organization that will do this? But our view is data science has to be for everybody. It's about bringing data science to everybody as a shared mission across the organization. Everybody in the company has to be data literate and participate in this journey. >> So overall, it's a group effort, it has to be a common goal, and we all need to be data literate across the board. >> Absolutely. >> Done deal.
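Rob's point about automating algorithm selection is easy to make concrete. A minimal sketch of the idea (mine, not IBM's tooling; the dataset is synthetic) scores a few candidate models and keeps the winner:

```python
# Illustrative sketch of automated algorithm selection with scikit-learn.
# The data here is synthetic; in practice this runs over your real training set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Score every candidate and keep the best: the selection step a data
# scientist would otherwise do by hand.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Commercial AutoML systems search far larger candidate spaces, but the loop is the same: fit, score, compare, select.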
But at the end of the day, it's kind of not an easy task. >> It's not. It's not easy, but it's maybe not as big of a shift as you would think. Because you have to put data in the hands of people that can do something with it. So, it's very basic. Give access to data. Data's often locked up in a lot of organizations today. Give people the right tools. Embrace the idea of choice, or diversity, in terms of those tools. That gets you started on this path. >> It's interesting to hear you say essentially you need to train everyone, though, across the board when it comes to data literacy. And I think people that are coming into the workforce don't necessarily have a background or a degree in data science. So how do you manage? >> Yeah, so in many cases that's true. I will tell you some universities are doing amazing work here. One example, University of California Berkeley. They offer a course for all majors. So no matter what you're majoring in, you have a course on foundations of data science. How do you bring data science to every role? So it's starting to happen. We at IBM provide data science courses through CognitiveClass.ai. It's for everybody. It's free. And look, if you want to get your hands on code and just dive right in, you go to datascience.ibm.com. The key point is this though: it's more about attitude than it is aptitude. I think anybody can figure this out. But it's about the attitude to say we're putting data first and we're going to figure out how to make this real in our organization. >> I also have to give a shout out to my alma mater, because I have heard that there is an offering in an MS in data analytics. And they are always on the forefront of new technologies and new majors and on trend. And I've heard that the job placement for people graduating with the MS is high. >> I'm sure it's very high. >> So go Tigers. All right, tangential. Let me get back to something else you touched on earlier, because you mentioned that a number of customers ask you, how in the world do I get started with AI? It's an overwhelming question. Where do you even begin? What do you tell them? >> Yeah, well, things are moving really fast. But the good thing is most organizations I see, they're already on the path, even if they don't know it. They might have a BI practice in place. They've got data warehouses. They've got data lakes. Let me give you an example. AMC Networks. They produce a lot of the shows that I'm sure you watch, Katie. >> [Katie] Yes, Breaking Bad, Walking Dead, any fans? >> [Rob] Yeah, we've got a few. >> [Katie] Well you taught me something I didn't even know. Because it's amazing how we have all these different industries, but yet media in itself is impacted too. And this is a good example. >> Absolutely. So, AMC Networks, think about it. They've got ads to place. They want to track viewer behavior. What do people like? What do they dislike? So they have to optimize every aspect of their business, from marketing campaigns to promotions to scheduling to ads. And their goal was to transform data into business insights and really take the burden off of their IT team, which was heavily burdened by obviously a huge increase in data. So their VP of BI took the approach of using machine learning to process large volumes of data. They used a platform that was designed for AI and data processing. It's the IBM Integrated Analytics System: a data warehouse with data science tools built in and in-memory data processing. And just like that, they were ready for AI.
And they're already seeing that impact in their business. >> Do you think a movement of that nature kind of presses other media conglomerates and organizations to say we need to be doing this too? >> I think it's inevitable for everybody: you're either going to be leading, or you'll be playing catch up. And so, as we talk to clients we think about how do you start down this path now, even if you have to iterate over time? Because otherwise you're going to wake up and you're going to be behind. >> One thing worth noting is we've talked about analytics to the data. It's analytics first to the data, not the other way around. >> Right. So, look. We as a practice, we say you want to bring analytics to where the data sits. Because it's a lot more efficient that way. It gets you better outcomes in terms of how you train models, and it's more efficient. And we think that leads to better outcomes. Other organizations will say, "Hey move the data around." And everything becomes a big data movement exercise. But once an organization has started down this path, they're starting to get predictions, they want to do it where it's really easy. And that means analytics applied right where the data sits. >> And worth talking about the role of the data scientist in all of this. It's been called the hot job of the decade. And Harvard Business Review even dubbed it the sexiest job of the 21st century. >> Yes. >> I want to see this on the cover of Vogue. Like I want to see the first data scientist. Female preferred, on the cover of Vogue. That would be amazing. >> Perhaps you can. >> People agree. So what changes for them? Is this challenging in terms of, we talk data science for all. Where do all the data scientists fit? Is it data science for everyone? And how does it change everything? >> Well, I think of it this way. AI gives software superpowers. It really does. It changes the nature of software. And at the center of that is data scientists. So, a data scientist has a set of powers that they've never had before in any organization. And that's why it's a hot profession. Now, on one hand, this has been around for a while. We've had actuaries. We've had statisticians that have really transformed industries. But there are a few things that are new now. We have new tools. New languages. Broader recognition of this need. And while it's important to recognize this critical skill set, you can't just limit it to a few people. This is about scaling it across the organization. And truly making it accessible to all. >> So then do we need more data scientists? Or is this something you train like you said, across the board? >> Well, I think you want to do a little bit of both. We want more. But, we can also train more and make the ones we have more productive. The way I think about it is there's kind of two markets here. And we call it clickers and coders. >> [Katie] I like that. That's good. >> So, let's talk about what that means. So clickers are basically somebody that wants to use tools. Create models visually. It's drag and drop. Something that's very intuitive. Those are the clickers. Nothing wrong with that. It's been valuable for years. There's a new crop of data scientists. They want to code. They want to build with the latest open source tools. They want to write in Python or R. These are the coders. And both approaches are viable. Both approaches are critical. Organizations have to have a way to meet the needs of both of those types.
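[Editor's note: a toy sketch of "analytics applied right where the data sits," using SQLite as a stand-in for whatever warehouse actually holds the data. Pushing the aggregation into the engine as SQL avoids hauling every row into the application first; the table and values here are invented for illustration.]

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (customer_id INT, sector TEXT, amount REAL)")
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?)",
                     [(1, "auto", 500.0), (1, "tech", 1200.0), (2, "auto", 300.0)])

    # Anti-pattern: SELECT * and aggregate client-side (moves all the data).
    # Preferred: let the engine aggregate where the data sits.
    for sector, total in conn.execute(
            "SELECT sector, SUM(amount) FROM trades GROUP BY sector"):
        print(sector, total)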
And there's not a lot of things available today that do that. >> Well let's keep going on that. Because I hear you talking about the data scientist's role and how it's critical to success, but with the new tools, data science and analytics skills can extend beyond the domain of just the data scientist. >> That's right. So look, we're unifying coders and clickers into a single platform, which we call IBM Data Science Experience. And as the demand for data science expertise grows, so does the need for these kinds of tools. To bring them into the same environment. And my view is if you have the right platform, it enables the organization to collaborate. And suddenly you've changed the nature of data science from an individual sport to a team sport. >> So as somebody whose background is in IT, the question is really is this an additional piece of what IT needs to do in 2017 and beyond? Or is it just another line item in the budget? >> So I'm afraid that some people might view it that way. As just another line item. But, I would challenge that and say data science is going to reinvent IT. It's going to change the nature of IT. And every organization needs to think about what are the skills that are critical? How do we engage a broader team to do this? Because once they get there, this is the chance to reinvent how they're performing IT. >> [Katie] Challenging or not? >> Look it's all a big challenge. Think about everything IT organizations have been through. Some of them were late to things like mobile, but then they caught up. Some were late to cloud, but then they caught up. I would just urge people, don't be late to data science. Use this as your chance to reinvent IT. Start with this notion of clickers and coders. This is a seminal moment. Much like mobile and cloud were. So don't be late. >> And I think it's critical because it could be so costly to wait. And Rob and I were even chatting earlier how data analytics is just moving into all different kinds of industries. And I can tell you even personally being affected by how important the analysis is in working in pediatric cancer for the last seven years. I've personally brought virtual reality headsets to pediatric cancer hospitals across the country. And it's great. And it's working phenomenally. And the kids are amazed. And the staff is amazed. But phase two of this project is putting little sensors in the hardware that gather the breathing and the heart rate to show that we have data. Proof that we can hand over to the hospitals to continue making this program a success. So just in-- >> That's a great example. >> An interesting example. >> Saving lives? >> Yes. >> That's also applying a lot of what we talked about. >> Exciting stuff in the world of data science. >> Yes. Look, I'd just add this is an existential moment for every organization. Because what you do in this area is probably going to define how competitive you are going forward. And think about if you don't do something. What if one of your competitors goes and creates an application that's more engaging with clients? So my recommendation is start small. Experiment. Learn. Iterate on projects. Define the business outcomes. Then scale up. It's very doable. But you've got to take the first step. >> First step always critical. And now we're going to get to the fun, hands-on part of our story. Because in just a moment we're going to take a closer look at what data science can deliver. And where organizations are trying to get to. All right.
Thank you Rob, and now we've been joined by Siva Anne who is going to help us navigate this demo. First, welcome Siva. Give him a big round of applause. Yeah. All right, Rob break down what we're going to be looking at. You take over this demo. >> All right. So this is going to be pretty interesting. So Siva is going to take us through. So he's going to play the role of a financial adviser. Who wants to help better serve clients through recommendations. And I'm going to really illustrate three things. One is how do you federate data from multiple data sources? Inside the firewall, outside the firewall. How do you apply machine learning to predict and to automate? And then how do you move analytics closer to your data? So, what you're seeing here is a custom application for an investment firm. So, Siva, our financial adviser, welcome. So you can see at the top, we've got market data. We pulled that from an external source. And then we've got Siva's calendar in the middle. He's got clients on the right side. So page down, what else do you see down there Siva? >> [Siva] I can see the recent market news. And in here I can see that JP Morgan is calling for a US dollar rebound in the second half of the year. And I have an upcoming meeting with Leo Rakes. I can get-- >> [Rob] So let's go in there. Why don't you click on Leo Rakes. So, you're sitting at your desk, you're deciding how you're going to spend the day. You know you have a meeting with Leo. So you click on it. You immediately see, all right, so what do we know about him? We've got data governance implemented. So we know his age, we know his degree. We can see he's not that aggressive of a trader. Only six trades in the last few years. But then where it gets interesting is you go to the bottom. You start to see predicted industry affinity. Where did that come from? How do we have that? >> [Siva] So these green lines and red arrows here indicate the trending affinity of Leo Rakes for particular industry stocks. What we've done here is we've built machine learning models using the customer's demographic data, his stock portfolios, and browsing behavior to build a model which can predict his affinity for a particular industry. >> [Rob] Interesting. So, I like to think of this, we call it celebrity experiences. So how do you treat every customer like they're a celebrity? So to some extent, we're reading his mind. Because without asking him, we know that he's going to have an affinity for auto stocks. So we go down. Now we look at his portfolio. You can see okay, he's got some different holdings. He's got Amazon, Google, Apple, and then he's got RACE, which is the ticker for Ferrari. You can see that's done incredibly well. And so, as a financial adviser, you look at this and you say, all right, we know he loves auto stocks. Ferrari's done very well. Let's create a hedge. Like what kind of security would interest him as a hedge against his position for Ferrari? Could we go figure that out? >> [Siva] Yes. Given I know that he's got an affinity for auto stocks, and I also see that Ferrari has got some tremendous gains, I want to lock in these gains by hedging. And I want to do that by picking an auto stock which has got negative correlation with Ferrari. >> [Rob] So this is where we get to the idea of in-database analytics. Cause you start clicking that and immediately we're getting instant answers of what's happening. So what did we find here? We're going to compare Ferrari and Honda. >> [Siva] I'm going to compare Ferrari with Honda.
And what I see here instantly is that Honda has got a negative correlation with Ferrari, which makes it a perfect mix for his stock portfolio. Given he has an affinity for auto stocks and it correlates negatively with Ferrari. >> [Rob] These are very powerful tools in the hands of a financial adviser. You think about it. As a financial adviser, you wouldn't think about federating data, machine learning, pretty powerful. >> [Siva] Yes. So what we have seen here is that using the common SQL engine, we've been able to federate queries across multiple data sources. Db2 Warehouse on Cloud, IBM's Integrated Analytic System, and a Hortonworks-powered Hadoop platform for the news feeds. We've been able to use machine learning to derive innovative insights about his stock affinities. And drive the machine learning into the appliance. Closer to where the data resides to deliver high-performance analytics. >> [Rob] At scale? >> [Siva] We're able to run millions of these correlations across stocks, currency, other factors. And even score hundreds of customers for their affinities on a daily basis. >> That's great. Siva, thank you for playing the role of financial adviser. So I just want to recap briefly. Cause this is really powerful technology that's really simple to use. So we federated, we aggregated multiple data sources from all over the web and internal systems. And public cloud systems. Machine learning models were built that predicted Leo's affinity for a certain industry. In this case, automotive. And then you see when you deploy analytics next to your data, even a financial adviser, just with the click of a button is getting instant answers so they can go be more productive in their next meeting. This whole idea of celebrity experiences for your customer, that's available for everybody, if you take advantage of these types of capabilities. Katie, I'll hand it back to you. >> Good stuff. Thank you Rob. Thank you Siva. Powerful demonstration on what we've been talking about all afternoon. And thank you again to Siva for helping us navigate. Should we give him one more round of applause? We're going to be back in just a moment to look at how we operationalize all of this data. But first, here's a message from me. If you're a part of a line of business, your main fear is disruption. You know data is the new gold that can create huge amounts of value. So does your competition. And they may be beating you to it. You're convinced there are new business models and revenue sources hidden in all the data. You just need to figure out how to leverage it. But with the scarcity of data scientists, you really can't rely solely on them. You may need more people throughout the organization that have the ability to extract value from data. And as a data science leader or data scientist, you have a lot of the same concerns. You spend way too much time looking for, prepping, and interpreting data and waiting for models to train. You know you need to operationalize the work you do to provide business value faster. What you want is an easier way to do data prep. And rapidly build models that can be easily deployed, monitored and automatically updated. So whether you're a data scientist, data science leader, or in a line of business, what's the solution? What'll it take to transform the way you work? That's what we're going to explore next. All right, now it's time to delve deeper into the nuts and bolts. The nitty gritty of operationalizing data science and creating a data driven culture. How do you actually do that?
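[Editor's note: a hedged sketch of the two analytical steps in the demo, an affinity model plus a correlation-based hedge screen, in Python with scikit-learn and pandas. The features, tickers, and data are synthetic stand-ins; the actual demo runs these steps in-database on the platforms Siva names.]

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(0)

    # Step 1: predict industry affinity from demographic and behavioral
    # features. Columns: age, trades per year, auto pages browsed (synthetic).
    X = rng.random((500, 3)) * [60, 50, 100] + [20, 0, 0]
    y = (X[:, 2] > 50).astype(int)   # toy label: heavy auto browsing -> affinity
    model = GradientBoostingClassifier().fit(X, y)
    leo = [[35, 6, 72]]              # a client like Leo: age 35, six trades
    print("P(auto affinity):", model.predict_proba(leo)[0, 1].round(2))

    # Step 2: screen hedge candidates by return correlation with the holding.
    base = rng.normal(0, 0.02, 250)  # synthetic daily returns for the position
    returns = pd.DataFrame({
        "RACE": base,
        "HMC": -0.6 * base + rng.normal(0, 0.01, 250),  # tends to move opposite
        "TM": 0.4 * base + rng.normal(0, 0.01, 250),
    })
    corr = returns.corr()["RACE"].drop("RACE")
    print("Best hedge candidate:", corr.idxmin())       # most negative correlation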
Well that's what these experts are here to share with us. I'm joined by Nir Kaldero, who's head of data science at Galvanize, which is an education and training organization. Tricia Wang, who is co-founder of Sudden Compass, a consultancy that helps companies understand people with data. And last, but certainly not least, Michael Li, founder and CEO of Data Incubator, which is a data science training company. All right guys. Shall we get right to it? >> All right. >> So the data explosion is happening right now. And we are seeing it across the board. I just shared an example of how it's impacting my philanthropic work in pediatric cancer. But you guys each have so many unique roles in your business life. How are you seeing it just blow up in your fields? Nir, your thoughts? >> Yeah, for example like in Galvanize we train many Fortune 500 companies. And just looking at the demand from companies that want us to help them go through this digital transformation is mind-blowing. That's a data point by itself. >> Okay. >> Well, what we're seeing is that data science, as a theme, is actually for everyone now. What's happening is that it's reaching non technical people. But what we're seeing is that when non technical people are implementing these tools or coming at these tools without a baseline of data literacy, they're often times using it in ways that distance themselves from the customer. Because they're implementing data science tools without a clear purpose, without a clear problem. And so what we do at Sudden Compass is that we work with companies to help them embrace and understand the complexity of their customers. Because often times they are misusing data science to try and flatten their understanding of the customer. As if you can just do more traditional marketing. Where you're putting people into boxes. And I think the whole ROI of data is that you can now understand people's relationships at a much more complex level and at a greater scale than before. But we have to do this with basic data literacy. And this has to involve technical and non technical people. >> Well you can have all the data in the world, and I think it speaks to, if you're not doing the proper movement with it, forget it. It means nothing at the same time. >> No absolutely. I mean, I think that when you look at the huge explosion in data, that comes with it a huge explosion in data experts. Right, we call them data scientists, data analysts. And sometimes they're people who are very, very talented, like the people here. But sometimes you have people who are maybe re-branding themselves, right? Trying to move up their title one notch to try to attract that higher salary. And I think that that's one of the things that customers are coming to us for, right? They're saying, hey look, there are a lot of people that call themselves data scientists, but we can't really distinguish. So, we have sort of run a fellowship where you help companies hire from a really talented group of folks, who are also truly data scientists and who know all those kind of really important data science tools. And we also help companies internally. Fortune 500 companies who are looking to grow that data science practice that they have. And we help clients like McKinsey, BCG, Bain, train up their customers, also their clients, also their workers to be more data talented. And to build up those data science capabilities. >> And Nir, this is something you work with a lot. A lot of Fortune 500 companies.
And when we were speaking earlier, you were saying many of these companies can be in a panic. >> Yeah. >> Explain that. >> Yeah, so you know, not all Fortune 500 companies are fully data driven. And we know that the winners in this fourth industrial revolution, which I like to call the machine intelligence revolution, will be companies who navigate and transform their organization to unlock the power of data science and machine learning. And the companies that are not like that, or don't utilize data science and predictive power well, will pretty much get shredded. So they are in a panic. >> Tricia, companies have to deal with data behind the firewall and in the new multi-cloud world. How do organizations start to become data driven right to the core? >> I think the most urgent question to become data driven that companies should be asking is how do I bring the complex reality that our customers are experiencing on the ground into a corporate office? Into the data models. So that question is critical because that's how you actually prevent any big data disasters. And that's how you leverage big data. Because when your data models are really far from your human models, that's when you're going to do things that are really far off, and it's not going to feel right. That's when Tesco had their terrible big data disaster that they're still recovering from. And so that's why I think it's really important to understand that when you implement big data, you have to further embrace thick data. The qualitative, the emotional stuff, that is difficult to quantify. But then comes the difficult art and science that I think is the next level of data science. Which is getting non technical and technical people together to ask how do we find those unknown nuggets of insights that are difficult to quantify? Then, how do we do the next step of figuring out how do you mathematically scale those insights into a data model? So that actually is reflective of human understanding? And then we can start making decisions at scale. But you have to have that first. >> That's absolutely right. And I think that when we think about what it means to be a data scientist, right? I always think about it in these sort of three pillars. You have the math side. You have to have that kind of stats, hardcore machine learning background. You have the programming side. You don't work with small amounts of data. You work with large amounts of data. You've got to be able to type the code to make those computers run. But then the last part is that human element. You have to understand the domain expertise. You have to understand what it is that I'm actually analyzing. What's the business proposition? And how are the clients, how are the users actually interacting with the system? That human element that you were talking about. And I think having somebody who understands all of those and not just in isolation, but is able to marry that understanding across those different topics, that's what makes a data scientist. >> But I find that we don't have people with those skill sets. And right now the way I see teams being set up inside companies is that they're creating these isolated data unicorns. These data scientists that have graduated from your programs, which are great. But, they don't involve the people who are the domain experts. They don't involve the designers, the consumer insight people, the salespeople. The people who spend time with the customers day in and day out. Somehow they're left out of the room.
They're consulted, but they're not a stakeholder. >> Can I actually >> Yeah, yeah please. >> Can I actually give a quick example? So for example, we at Galvanize train the executives and the managers. And then the technical people, the data scientists and the analysts. But in order to actually see all of the ROI behind the data, you also have to have a creative, fluid conversation between non technical and technical people. And this is a major trend now. And there's a major gap. And we need to increase awareness and kind of like create a new, kind of like environment where technical people also talk seamlessly with non technical ones. >> [Tricia] We call-- >> That's one of the things that we see a lot. Is one of the trends in-- >> A major trend. >> data science training is it's not just for the data science technical experts. It's not just for one type of person. So a lot of the training we do is sort of data engineers. People who are more on the software engineering side learning more about the stats and math. And then people who are sort of traditionally on the stat side learning more about the engineering. And then managers and people who are data analysts learning about both. >> Michael, I think you said something that was of interest too because I think we can look at IBM Watson as an example. And working in healthcare. The human component. Because often times we talk about machine learning and AI, and data and you get worried that you still need that human component. Especially in the world of healthcare. And I think that's a very strong point when it comes to the data analysis side. Is there any particular example you can speak to of that? >> So I think that there was this really excellent paper a while ago talking about all the neural net stuff trained on textual data. So looking at sort of different corpuses. And they found that these models were highly, highly sexist. They would read these corpuses and it's not because neural nets themselves are sexist. It's because they're reading the things that we write. And it turns out that we write kind of sexist things. And they would sort of find all these patterns in there that were sort of latent, that had a lot of sort of things that maybe we would cringe at if we sort of saw. And I think that's one of the really important aspects of the human element, right? It's being able to come in and sort of say like, okay, I know what the biases of the system are, I know what the biases of the tools are. I need to figure out how to use that to make the tools, make the world a better place. And like another area where this comes up all the time is lending, right? So the federal government has said, and we have a lot of clients in the financial services space, so they're constantly under these kinds of rules that they can't make discriminatory lending practices based on a whole set of protected categories. Race, sex, gender, things like that. But, it's very easy when you train a model on credit scores to pick that up. And then to have a model that's inadvertently sexist or racist. And that's where you need the human element to come back in and say okay, look, the classic example would be zip code, you're using zip code as a variable. But when you look at it, zip code is actually highly correlated with race. And you can't do that.
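[Editor's note: a minimal sketch of the proxy-variable check Michael describes, assuming pandas and entirely synthetic data: before letting a lending model use zip code, measure how strongly zip code alone separates a protected group.]

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(7)
    n = 2000
    zip_code = rng.integers(0, 50, n)      # 50 hypothetical zip codes
    group_rate = rng.random(50)            # synthetic clustering of a protected group by zip
    protected = rng.random(n) < group_rate[zip_code]

    df = pd.DataFrame({"zip_code": zip_code, "protected": protected})
    rates = df.groupby("zip_code")["protected"].mean()
    # A wide spread means zip code is acting as a proxy for the protected
    # attribute, and a human needs to step in before the model "follows the math".
    print("Group rate by zip ranges from", rates.min().round(2),
          "to", rates.max().round(2))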
So you may, by sort of following the math and being a little naive about the problem, inadvertently introduce something really horrible into a model and that's where you need a human element to sort of step in and say, okay hold on. Slow things down. This isn't the right way to go. >> And the people who have -- >> I feel like, I can feel her ready to respond. >> Yes, I'm ready. >> She's like let me have at it. >> And here it is. The people who are really great at providing that human intelligence are social scientists. We are trained to look for bias and to understand bias in data. Whether it's quantitative or qualitative. And I really think that we're going to have less of these kind of problems if we had more integrated teams. If it was a mandate from leadership to say no data science team should be without a social scientist, ethnographer, or qualitative researcher of some kind, to be able to help see these biases. >> The talent piece is actually the most crucial-- >> Yeah. >> one here. If you look at how to enable machine intelligence in an organization, there are three pillars that I have in my head, which are the culture, the talent, and the technology infrastructure. And I believe and I saw in working very closely with the Fortune 100 and 200 companies that the talent piece is actually the most important, the most crucial, and the hardest to get. >> [Tricia] I totally agree. >> It's absolutely true. Yeah, no I mean I think that's sort of like how we came up with our business model. Companies were basically saying hey, I can't hire data scientists. And so we have a fellowship where we get 2,000 applicants each quarter. We take the top 2% and then we sort of train them up. And we work with hiring companies who then want to hire from that population. And so we're sort of helping them solve that problem. And the other half of it is really around training. Cause with a lot of industries, especially if you're sort of in a more regulated industry, there's a lot of nuances to what you're doing. And the fastest way to develop that data science or AI talent may not necessarily be to hire folks who are coming out of a PhD program. It may be to take folks internally who have a lot of that domain knowledge that you have and get them trained up on those data science techniques. So we've had large insurance companies come to us and say hey look, we hire three or four folks from you a quarter. That doesn't move the needle for us. What we really need is to take the thousand actuaries and statisticians that we have and get all of them trained up to become a data scientist and become data literate in this new open source world. >> [Katie] Go ahead. >> All right, ladies first. >> Go ahead. >> Are you sure? >> No please, you first. >> Go ahead. >> Go ahead Nir. >> So this is actually a trend that we have been seeing in the past year or so that companies kind of like start to look at how to upskill and look for talent within the organization. So they can actually move them to become more literate and navigate 'em from analyst to data scientist. And from data scientist to machine learning engineer. So this is actually a trend that is happening already for a year or so. >> Yeah, but I also find that after they've gone through that training in getting people skilled up in data science, the next problem that I get is executives coming to say we've invested in all of this. We're still not moving the needle. We've already invested in the right tools. We've gotten the right skills.
We have enough scale of people who have these skills. Why are we not moving the needle? And what I explain to them is look, you're still making decisions in the same way. And you're still not involving enough of the non technical people. Especially from marketing, where the CMOs are now much more responsible for driving growth in their companies. But often times it's so hard to change the old way of marketing, which is still very segmentation-driven. You know, demographic-variable based, and we're trying to move people to say no, you have to understand the complexity of customers and not put them in boxes. >> And I think underlying a lot of this discussion is this question of culture, right? >> Yes. >> Absolutely. >> How do you build a data driven culture? And I think that that culture question, one of the ways that comes up quite often, especially in large Fortune 500 enterprises, is that they're not very comfortable with, for example, open source architecture. Open source tools. And there is some sort of residual bias that that's somehow dangerous. They see it as a security vulnerability. And I think that that's part of the cultural challenge that they often have in terms of how do I build a more data driven organization? Well a lot of the talent really wants to use these kinds of tools. And I mean, just to give you an example, we are partnering with one of the major cloud providers to sort of help make open source tools more user friendly on their platform. So trying to help them attract the best technologists to use their platform because they want and they understand the value of having that kind of open source technology work seamlessly on their platforms. So I think that just sort of goes to show you how important open source is in this movement. And how much large companies and Fortune 500 companies and a lot of the ones we work with have to embrace that. >> Yeah, and I'm seeing it in our work. Even when we're working with Fortune 500 companies, is that they've already gone through the first phase of data science work, which I explain was all about the tools and getting the right tools and architecture in place. And then companies started moving into getting the right skill set in place. Getting the right talent. And what you're talking about with culture is really where I think we're talking about the third phase of data science, which is looking at communication of these technical frameworks so that we can get non technical people really comfortable in the same room with data scientists. That is going to be the phase, that's really where I see the pain point. And that's why at Sudden Compass, we're really dedicated to working with each other to figure out how do we solve this problem now? >> And I think that communication between the technical stakeholders and management and leadership. That's a very critical piece of this. You can't have a successful data science organization without that. >> Absolutely. >> And I think that actually some of the most popular trainings we've had recently are from managers and executives who are looking to say, how do I become more data savvy? How do I figure out what is this data science thing and how do I communicate with my data scientists? >> You guys made this way too easy. I was just going to get some popcorn and watch it play out. >> Nir, last 30 seconds. I want to leave you with an opportunity to add anything you want to this conversation.
>> I think one thing to conclude with is to say that for companies that are not data driven, it's about time to hit refresh and figure out how to transition the organization to become data driven. To become agile and nimble so they can actually seize the opportunities from this important industrial revolution. Otherwise, unfortunately they will have a hard time surviving. >> [Katie] All agreed? >> [Tricia] Absolutely, you're right. >> Michael, Trish, Nir, thank you so much. Fascinating discussion. And thank you guys again for joining us. We will be right back with another great demo. Right after this. >> Thank you Katie. >> Once again, thank you for an excellent discussion. Weren't they great guys? And thank you for everyone who's tuning in on the live webcast. As you can hear, we have an amazing studio audience here. And we're going to keep things moving. I'm now joined by Daniel Hernandez and Siva Anne. And we're going to turn our attention to how you can deliver on what they're talking about using the Data Science Experience to do data science faster. >> Thank you Katie. Siva and I are going to spend the next 10 minutes showing you how you can deliver on what they were saying using the IBM Data Science Experience to do data science faster. We'll demonstrate through new features we introduced this week how teams can work together more effectively across the entire analytics life cycle. How you can take advantage of any and all data no matter where it is and what it is. How you could use your favorite tools from open source. And finally how you could build models anywhere and deploy them close to where your data is. Remember the financial adviser app Rob showed you? To build an app like that, we needed a team of data scientists, developers, data engineers, and IT staff to collaborate. We do this in the Data Science Experience through a concept we call projects. When I create a new project, I can now use the new Github integration feature. We're doing for data science what we've been doing for developers for years. Distributed teams can work together on analytics projects. And take advantage of Github's version management and change management features. This is a huge deal. Let's explore the project we created for the financial adviser app. As you can see, our data engineer Joane, our developer Rob, and others are collaborating on this project. Joane got things started by bringing together the trusted data sources we need to build the app. Taking a closer look at the data, we see that our customer and profile data is stored on our recently announced IBM Integrated Analytics System, which runs safely behind our firewall. We also needed macro economic data, which she was able to find from the Federal Reserve. And she stored it in our Db2 Warehouse on Cloud. And finally, she selected stock news data from NASDAQ.com and landed that in a Hadoop cluster, which happens to be powered by Hortonworks. We added a new feature to the Data Science Experience so that when it's installed with Hortonworks, it automatically leverages the native security and governance controls within the cluster so your data is always secure and safe. Now we want to show you the news data we stored in the Hortonworks cluster. This is the main administrative console. It's powered by an open source project called Ambari. And here's the news data. It's in parquet files stored in HDFS, which happens to be a distributed file system.
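[Editor's note: for readers who want to picture that last step, a minimal sketch of querying parquet files in HDFS through Spark. The path and column names are hypothetical, not the demo's actual cluster layout.]

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("news-explore").getOrCreate()

    # Read the news data directly from HDFS; parquet carries its own schema.
    news = spark.read.parquet("hdfs:///data/nasdaq_news")   # hypothetical path
    news.printSchema()
    news.filter(news.ticker == "RACE").select("headline", "published").show(5)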
To get the data from NASDAQ into our cluster, we used IBM's BigIntegrate and BigQuality to create automatic data pipelines that acquire, cleanse, and ingest that news data. Once the data's available, we use IBM's Big SQL to query that data using SQL statements that are much like the ones we would use for any relational data, including the data that we have in the Integrated Analytics System and Db2 Warehouse on Cloud. This and the federation capabilities that Big SQL offers dramatically simplifies data acquisition. Now we want to show you how we support a brand new tool that we're excited about. Since we launched last summer, the Data Science Experience has supported Jupyter and R for data analysis and visualization. In this week's update, we deeply integrated another great open source project called Apache Zeppelin. It's known for having great visualization support, advanced collaboration features, and is growing in popularity amongst the data science community. This is an example of Apache Zeppelin and the notebook we created through it to explore some of our data. Notice how wonderful and easy the data visualizations are. Now we want to walk you through the Jupyter notebook we created to explore our customer preference for stocks. We use notebooks to understand and explore data. To identify the features that have some predictive power. We're trying to assess what ultimately is driving customer stock preference. Here we did the analysis to identify the attributes of customers that are likely to purchase auto stocks. We used this understanding to build our machine learning model. For building machine learning models, we've always had tools integrated into the Data Science Experience. But sometimes you need to use tools you already invested in. Like our very own SPSS as well as SAS. Through the new import feature, you can easily import those models created with those tools. This helps you avoid vendor lock-in, and simplifies the development, training, deployment, and management of all your models. To build the models we used in the app, we could have coded, but we prefer a visual experience. We used our customer profile data in the Integrated Analytic System. We used the Auto Data Preparation to cleanse our data, chose the binary classification algorithms, and let the Data Science Experience evaluate between logistic regression and gradient boosted tree. It's doing the heavy work for us. As you can see here, the Data Science Experience generated performance metrics that show us that the gradient boosted tree is the best performing algorithm for the data we gave it. Once we save this model, it's automatically deployed and available for developers to use. Any application developer can take this endpoint and consume it like they would any other API inside of the apps they built. We've made training and creating machine learning models super simple. But what about the operations? A lot of companies are struggling to ensure their model performance remains high over time. In our financial adviser app, we know that customer data changes constantly, so we need to always monitor model performance and ensure that our models are retrained as necessary. This is a dashboard that shows the performance of our models and lets our teams monitor and retrain those models so that they're always performing to our standards. So far we've been showing you the Data Science Experience available behind the firewall that we're using to build and train models.
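[Editor's note: a hedged sketch of what "consume it like any other API" can look like from an application developer's side. The URL, token, and payload shape are invented, since the real endpoint details depend on the deployment.]

    import requests

    SCORING_URL = "https://dsx.example.com/v3/models/stock_affinity/score"  # hypothetical
    payload = {
        "fields": ["age", "trades_per_year", "auto_pages_browsed"],
        "values": [[35, 6, 72]],
    }

    # POST one row of features and read back the model's prediction.
    resp = requests.post(SCORING_URL, json=payload,
                         headers={"Authorization": "Bearer <token>"}, timeout=10)
    resp.raise_for_status()
    print(resp.json())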
Through a new publish feature, you can build models and deploy them anywhere. In another environment, private, public, or anywhere else with just a few clicks. So here we're publishing our model to the Watson Machine Learning service. It happens to be in the IBM cloud. And also deeply integrated with our Data Science Experience. After publishing and switching to the Watson Machine Learning service, you can see that our stock affinity model that we just published is there and ready for use. So this is incredibly important. I just want to say it again. The Data Science Experience allows you to train models behind your own firewall, take advantage of your proprietary and sensitive data, and then deploy those models wherever you want with ease. So to summarize what we just showed you. First, IBM's Data Science Experience supports all teams. You saw how our data engineer populated our project with trusted data sets. Our data scientists developed, trained, and tested a machine learning model. Our developers used APIs to integrate machine learning into their apps. And how IT can use our Integrated Model Management dashboard to monitor and manage model performance. Second, we support all data. On premises, in the cloud, structured, unstructured, inside of your firewall, and outside of it. We help you bring analytics and governance to where your data is. Third, we support all tools. The data science tools that you depend on are readily available and deeply integrated. This includes capabilities from great partners like Hortonworks. And powerful tools like our very own IBM SPSS. And fourth, and finally, we support all deployments. You can build your models anywhere, and deploy them right next to where your data is. Whether that's in the public cloud, private cloud, or even on the world's most reliable transaction platform, IBM z. So see for yourself. Go to the Data Science Experience website, take us for a spin. And if you happen to be ready right now, our recently created Data Science Elite Team can help you get started and run experiments alongside you with no charge. Thank you very much. >> Thank you very much Daniel. It seems like a great time to get started. And thanks to Siva for taking us through it. Rob and I will be back in just a moment to add some perspective right after this. All right, once again joined by Rob Thomas. And Rob obviously we got a lot of information here. >> Yes, we've covered a lot of ground. >> This is intense. You've got to break it down for me, cause I think we should zoom out and see the big picture. What can better data science deliver to a business? Why is this so important? I mean we've heard it through and through. >> Yeah, well, I heard it a couple times. But it starts with businesses have to embrace a data driven culture. And it is a change. And we need to make data accessible with the right tools in a collaborative culture because we've got diverse skill sets in every organization. But data driven companies succeed when data science tools are in the hands of everyone. And I think that's a new thought. I think most companies think just get your data scientist some tools, you'll be fine. This is about tools in the hands of everyone. I think the panel did a great job of describing how we get to data science for all.
Building a data culture, making it a part of your everyday operations, and the highlights of what Daniel just showed us, that's some pretty cool features for how organizations can get to this, which is you can see IBM's Data Science Experience, how that supports all teams. You saw data analysts, data scientists, application developers, IT staff, all working together. Second, you saw how we support all tools. And your choice of tools. So the most popular data science libraries integrated into one platform. And we saw some new capabilities that help companies avoid lock-in, where you can import existing models created from specialist tools like SPSS or others. And then deploy them and manage them inside of Data Science Experience. That's pretty interesting. And you see we continue to build on the best of open tools. Partnering with companies like H2O, Hortonworks, and others. Third, you can see how you use all data no matter where it lives. That's a key challenge every organization's going to face. Private, public, federating all data sources. We announced new integration with the Hortonworks data platform where we deploy machine learning models where your data resides. That's been a key theme. Analytics where the data is. And lastly, supporting all types of deployments. Deploy them in your Hadoop cluster. Deploy them in your Integrated Analytic System. Or deploy them in z, just to name a few. A lot of different options here. But look, don't believe anything I say. Go try it for yourself. Data Science Experience, anybody can use it. Go to datascience.ibm.com and look, if you want to start right now, we just created a team that we call Data Science Elite. These are the best data scientists in the world that will come sit down with you and co-create solutions, models, and prove out a proof of concept. >> Good stuff. Thank you Rob. So you might be asking what does an organization look like that embraces data science for all? And how could it transform your role? I'm going to head back to the office and check it out. Let's start with the perspective of the line of business. What's changed? Well, now you're starting to explore new business models. You've uncovered opportunities for new revenue sources in all that hidden data. And being disrupted is no longer keeping you up at night. As a data science leader, you're beginning to collaborate with a line of business to better understand and translate the objectives into the models that are being built. Your data scientists are also starting to collaborate with the less technical team members and analysts who are working closest to the business problem. And as a data scientist, you stop feeling like you're falling behind. Open source tools are keeping you current. You're also starting to operationalize the work that you do. And you get to do more of what you love. Explore data, build models, put your models into production, and create business impact. All in all, it's not a bad scenario. Thanks. All right. We are back and coming up next, oh this is a special time right now. Cause we got a great guest speaker. New York Magazine called him the spreadsheet psychic and number crunching prodigy who went from correctly forecasting baseball games to correctly forecasting presidential elections. He even invented a proprietary algorithm called PECOTA for predicting future performance by baseball players and teams. And his New York Times bestselling book, The Signal and the Noise was named by Amazon.com as the number one best non-fiction book of 2012.
He's currently the Editor in Chief of the award winning website, FiveThirtyEight and appears on ESPN as an on air commentator. Big round of applause. My pleasure to welcome Nate Silver. >> Thank you. We met backstage. >> Yes. >> It feels weird to re-shake your hand, but you know, for the audience. >> I had to give the intense firm grip. >> Definitely. >> The ninja grip. So you and I have crossed paths kind of digitally in the past, which is really interesting. I started my career at ESPN. And I started as a production assistant, then later was back on air covering sports technology. And I go to you to talk about sports because-- >> Yeah. >> Wow, has ESPN upped their game in terms of understanding the importance of data and analytics. And what it brings. Not just to MLB, but across the board. >> No, it's really infused into the way they present the broadcast. You'll have win probability on the bottom line. And they'll incorporate FiveThirtyEight metrics into how they cover college football for example. So, ESPN ... Sports is maybe the perfect, if you're a data scientist, like the perfect kind of test case. And the reason being that sports consists of problems that have rules. And have structure. And when problems have rules and structure, then it's a lot easier to work with. So it's a great way to kind of improve your skills as a data scientist. Of course, there are also important real world problems that are more open ended, and those present different types of challenges. But it's such a natural fit. The teams. Think about the teams playing the World Series tonight. The Dodgers and the Astros are both like very data driven, especially Houston. Golden State Warriors, the NBA Champions, extremely data driven. New England Patriots, relative to an NFL team, it's shifted a little bit, the NFL bar is lower. But the Patriots are certainly very analytical in how they make decisions. So, you can't talk about sports without talking about analytics. >> And I was going to save the baseball question for later. Cause we are moments away from game seven. >> Yeah. >> Is everyone else watching game seven? It's been an incredible series. Probably one of the best of all time. >> Yeah, I mean-- >> You have a prediction here? >> You can mention that too. So I don't have a prediction. FiveThirtyEight has the Dodgers with a 60% chance of winning. >> [Katie] LA Fans. >> So you have two teams that are about equal. But the Dodgers pitching staff is in better shape at the moment. The end of a seven game series. And they're at home. >> But the statistics behind the two teams is pretty incredible. >> Yeah. It's like the first World Series in I think 56 years or something where you have two 100 win teams facing one another. There has been a lot of parity in baseball for a lot of years. Not that many overall offensive juggernauts. But this year, and last year with the Cubs and the Indians too really. But this year, you have really spectacular teams in the World Series. It kind of is a showcase of modern baseball. Lots of home runs. Lots of strikeouts. >> [Katie] Lots of extra innings. >> Lots of extra innings. Good defense. Lots of pitching changes. So if you love the modern baseball game, it's been about the best example that you've had. If you like a little bit more contact, and fewer strikeouts, maybe not so much. But it's been a spectacular and very exciting World Series. >> It's amazing to talk MLB. It's huge with analysis, I mean, hands down. But across the board, if you can provide a few examples.
Because there are so many front offices putting such a heavy intensity on the analysis side. And where the teams are going. And if you could provide any specific examples of teams that have really blown your mind. Especially over the last year or two. Because every year it gets more exciting if you will. >> I mean, so a big thing in baseball is defensive shifts. So if you watch tonight, you'll probably see a couple of plays where if you're used to watching baseball, a guy makes really solid contact. And there's a fielder there that you don't think should be there. But that's really very data driven where you analyze where this guy hits the ball. That part's not so hard. But also there's game theory involved. Because you have to adjust for the fact that he knows where you're positioning the defenders. He's trying therefore to make adjustments to his own swing and so that's been a major innovation in how baseball is played. You know, how bullpens are used too. Across all sports pretty much, teams have realized the importance of rest. And of fatigue. And that you can be the best pitcher in the world, but guess what? After four or five innings, you're probably not as good as a guy who has a fresh arm necessarily. So I mean, it really is like, these are not subtle things anymore. It's not just oh, on base percentage is valuable. It really affects kind of every strategic decision in baseball. The NBA, if you watch an NBA game tonight, see how many three point shots are taken. That's in part because of data. And teams realizing hey, three points is worth more than two, once you're more than about five feet from the basket, the shooting percentage gets really flat. And so it's revolutionary, right? Like teams that will shoot almost half their shots from the three point range nowadays. Larry Bird, who wound up being one of the greatest three point shooters of all time, took only eight three pointers his first year in the NBA. It's quite noticeable if you watch baseball or basketball in particular. >> Not to focus too much on sports. One final question. In terms of Major League Soccer, and now the NFL, we're having the analysis and wearables that can now showcase, if they wanted to, on screen: heart rate, breathing, and how much exertion. How much data is too much data? And when does it ruin the sport? >> So, I don't think, I mean, again, it goes sport by sport a little bit. I think in basketball you actually have a more exciting game. I think the game is more open now. You have more three pointers. You have guys getting higher assist totals. But you know, I don't know. I'm not one of those people who thinks look, if you love baseball or basketball, and you go in to work for the Astros, the Yankees or the Knicks, they probably need some help, right? You really have to be passionate about that sport. Because it's all based on what questions am I asking? As I'm a fan or I guess an employee of the team. Or a player watching the game. And there isn't really any substitute I don't think for the insight and intuition that a curious human has to kind of ask the right questions. So we can talk at great length about what tools do you then apply when you have those questions, but that still comes from people. I don't think machine learning could help with what questions do I want to ask of the data. It might help you get the answers.
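[Editor's note: the three-point logic Nate sketches reduces to a one-line expected value comparison. The shooting percentages below are illustrative round numbers, not actual NBA data.]

    # Points per shot: if accuracy is roughly flat beyond five feet,
    # the three-pointer wins on expected value.
    mid_range_pct, three_pt_pct = 0.40, 0.36
    print("EV of a long two:", 2 * mid_range_pct)   # 0.80 points per attempt
    print("EV of a three:  ", 3 * three_pt_pct)     # 1.08 points per attempt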
>> If you have a mid-fielder in a soccer game though, not exerting, only 80%, and you're seeing that on a screen as a fan, and you're saying could that person get fired at the end of the day? One day, with the data? >> So we've found that, in soccer in particular, some of the better players are actually more still. So Leo Messi, maybe the best player in the world, doesn't move as much as other soccer players do. And the reason being that A) he kind of knows how to position himself in the first place. B) he realizes that you make a run, and you're out of position. That's quite fatiguing. And particularly soccer, like basketball, is a sport where it's incredibly fatiguing. And so, sometimes the guys who conserve their energy, that kind of old school mentality, you have to hustle at every moment. That is not helpful to the team if you're hustling on an irrelevant play. And therefore, on a critical play, can't get back on defense, for example. >> Sports aside, data is moving exponentially, as we've been speaking about today. Tech, healthcare, every different industry. Is there any particular one that's a favorite of yours to cover? And I imagine they're all different as well. >> I mean, I do like sports. We cover a lot of politics too. Which is different. I mean in politics I think people aren't intuitively as data driven as they might be in sports for example. It's impressive to follow the breakthroughs in artificial intelligence. It started out just as kind of playing games and playing chess and poker and Go and things like that. But you really have seen a lot of breakthroughs in the last couple of years. But yeah, it's kind of infused into everything really. >> You're known for your work in politics though. Especially presidential campaigns. >> Yeah. >> This year, in particular. Was it insanely challenging? What was the most notable thing that came out of any of your predictions? >> I mean, in some ways, looking at the polling was the easiest lens to look at it. So I think there's kind of a myth that last year's result was a big shock and it wasn't really. If you did the modeling in the right way, then you realized that number one, polls have a margin of error. And so when a candidate has a three point lead, that's not particularly safe. Number two, the outcome between different states is correlated. Meaning that it's not that much of a surprise that Clinton lost Wisconsin and Michigan and Pennsylvania and Ohio. You know I'm from Michigan. Have friends from all those states. Kind of the same types of people in those states. Those outcomes are all correlated. So what people thought was a big upset for the polls I think was an example of how data science done carefully and correctly where you understand probabilities, understand correlations. Our model gave Trump a 30% chance of winning. Other models gave him a 1% chance. And so that was interesting in that it showed that number one, that modeling strategies and skill do matter quite a lot. When you have someone saying 30% versus 1%. I mean, that's a very very big spread. And number two, that these aren't like solved problems necessarily. Although again, the problem with elections is that you only have one election every four years. So I can be very confident that I have a better model. Even one year of data doesn't really prove very much. Even five or 10 years doesn't really prove very much.
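[Editor's note: a small simulation of the correlation point Nate makes. With identical 3-point leads in four states, independent polling errors make a sweep by the trailing candidate look vanishingly rare, while one shared national error makes it far more plausible. The numbers are illustrative, not FiveThirtyEight's actual model.]

    import numpy as np

    rng = np.random.default_rng(538)
    n_sims, n_states = 100_000, 4      # e.g. WI, MI, PA, OH
    lead, sigma = 3.0, 4.0             # 3-point lead, 4-point error per state

    # Independent errors: each state misses on its own.
    indep = rng.normal(lead, sigma, (n_sims, n_states))
    # Correlated errors: a shared national miss plus smaller state-level noise
    # (components sized so the total error per state is still about 4 points).
    shared = rng.normal(0, sigma * 0.8, (n_sims, 1))
    corr = lead + shared + rng.normal(0, sigma * 0.6, (n_sims, n_states))

    for name, sims in [("independent", indep), ("correlated", corr)]:
        upset = (sims < 0).all(axis=1).mean()  # trailing candidate sweeps all four
        print(f"{name}: P(sweep of all {n_states} states) = {upset:.3%}")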
And so, being aware of the limitations to some extent intrinsically in elections when you only get one kind of new training example every four years, there's not really any way around that. There are ways to be more robust to sparse data environments. But if you're identifying different types of business problems to solve, figuring out what's a solvable problem where I can add value with data science is a really key part of what you're doing. >> You're such a leader in this space. In data and analysis. It would be interesting to kind of peek back the curtain, understand how you operate but also how large is your team? How you're putting together information. How quickly you're putting it out. Cause I think in this right now world where everybody wants things instantly-- >> Yeah. >> There's also, you want to be first too in the world of journalism. But you don't want to be inaccurate because that's your credibility. >> We talked about this before, right? I think on average, speed is a little bit overrated in journalism. >> [Katie] I think it's a big problem in journalism. >> Yeah. >> Especially in the tech world. You have to be first. You have to be first. And it's just pumping out, pumping out. And there's got to be more time spent on stories if I can speak subjectively. >> Yeah, for sure. But at the same time, we are reacting to the news. And so we have people that come in, we hire most of our people actually from journalism. >> [Katie] How many people do you have on your team? >> About 35. But, if you get someone who comes in from an academic track for example, they might be surprised at how fast journalism is. That even though we might be slower than the average website, the fact that there's a tragic event in New York, are there things we have to say about that? A candidate drops out of the presidential race, are there things we have to say about that? In periods ranging from minutes to days as opposed to kind of weeks to months to years in the academic world. The corporate world moves faster. What is a little different about journalism is that you are expected to have more precision where people notice when you make a mistake. In corporations, you have maybe less transparency. If you make 10 investments and seven of them turn out well, then you'll get a lot of profit from that, right? In journalism, it's a little different. If you make kind of ten predictions or say ten things, and seven of them are very accurate and three of them aren't, you'll still get criticized a lot for the three. Just because that's kind of the way that journalism is. And so the kind of combination of needing, not having that much tolerance for mistakes, but also needing to be fast. That is tricky. And I criticize other journalists sometimes including for not being data driven enough, but the best excuse any journalist has, this is happening really fast and it's my job to kind of figure out in real time what's going on and provide useful information to the readers. And that's really difficult. Especially in a world where literally, I'll probably get off the stage and check my phone and who knows what President Trump will have tweeted or what things will have happened. But it really is a kind of 24/7. >> Well because it's 24/7 with FiveThirtyEight, one of the most well known sites for data, are you feeling micromanagey on your people? Because you do have to hit this balance. You can't have something come out four or five days later. >> Yeah, I'm not -- >> Are you overseeing everything?
>> I'm not by nature a micromanager. And so you try to hire well. You try and let people make mistakes. And the flip side of this is that a news organization that never had any mistakes, never had any corrections, that's rare, right? You have to have some tolerance for error, because you are trying to decide things in real time and figure things out. I think transparency's a big part of that. Say here's what we think, and here's why we think it. If we have a model, it's not just the final number; here's a lot of detail about how that's calculated. In some cases we release the code and the raw data. Sometimes we don't, because there's a proprietary advantage. But quite often we're saying: we want you to trust us, and it's so important that you trust us, here's the model. Go play around with it yourself. Here's the data. And that's also, I think, an important value. >> That speaks to open source. And your perspective on that in general. >> Yeah, I mean, look, I'm a big fan of open source. I worry that sometimes the trends are a little bit away from open source. But by the way, one thing that happens when you share your data, or you share your thinking at least in lieu of the data, and you can definitely do both, is that readers will catch embarrassing mistakes that you made. And even having that open-sourceness within your team helps. I mean, we have editors and copy editors who often save you from really embarrassing mistakes. And it's not necessarily people who have training in data science. I would guess that of our 35 people, maybe only five to 10 have a formal background in what you would call data science. >> [Katie] I think that speaks to the theme here. >> Yeah. >> [Katie] That everybody's kind of got to be data literate. >> But yeah, it is like you have a good intuition. You have a good BS detector, basically. And you have a good intuition for, hey, this looks a little bit out of line to me. And sometimes that can be based on domain knowledge, right? We have one of our copy editors, she's a big college football fan. And we had an algorithm we released that tries to predict what the human selection committee will do, and she was like, why is LSU rated so high? Because I know that LSU sucks this year. And we looked at it, and she was right. There was a bug where it had forgotten to account for their last game, where they lost to Troy or something, and so -- >> That also speaks to the human element as well. >> It does. In general, as a rule, if you're designing a regression based model, it's different from machine learning, where you kind of build in the tolerance for error. But if you're trying to do something more precise, then so much of it is just debugging. It's saying, that looks wrong to me, and I'm going to investigate that. And sometimes it's not wrong. Sometimes your model actually has an insight that you didn't have yourself. But fairly often, it is wrong. And I think what you learn is, hey, if there's something that bothers me, I want to go investigate that now and debug that now. Because the last thing you want is for the answer you're putting out there in the world to hinge on a mistake that you made. Because you never know: if you have, so to speak, 1,000 lines of code, each doing something different, you never know when you'll hit a weird edge case where this one decision you made winds up being the difference between your having a good forecast and a bad one.
In a defensible position and an indefensible one. So we definitely are quite diligent and careful. But it's also knowing, hey, where is an approximation good enough and where do I need more precision? Because you could also drive yourself crazy in the other direction, where, you know, it doesn't matter if the answer is 91.2 versus 90. And so you can kind of go 91.2, 91.3, 91.4, and it's A) false precision and B) not a good use of your time. So that's where I do still spend a lot of time: thinking about which problems are "solvable," or approachable, with data and which ones aren't. And when they're not, by the way, you're still allowed to report on them. We are a news organization, so we do traditional reporting as well. And then figuring out when you need precision versus when being pointed in the right direction is good enough. >> I would love to get inside your brain and see how you operate on just, like, an everyday walk to Walgreens. It's like, oh, if I cross the street in .2-- >> It's not, I mean-- >> Is it like maddening in there? >> No, not really. I mean, I'm like-- >> This is an honest question. >> If I'm looking for airfares, I'm a little more careful. But no, part of it's like you don't want to waste time on unimportant decisions, right? I will sometimes, if I can't decide what to eat at a restaurant, flip a coin. If the chicken and the pasta both sound really good-- >> That's not high tech, Nate. We want better. >> But that's the point, right? Both the chicken and the pasta are going to be really darn good, right? So I'm not going to waste my time trying to figure it out. I'm just going to have an arbitrary way to decide. >> Seriously though, in business, how have organizations in the last three to five years evolved with this data boom? How are you seeing it from a consultant's point of view? Do you think it's an exciting time? Do you think it's a you-must-act-now time? >> I mean, we do know that you definitely see a lot of talent among the younger generation now. So FiveThirtyEight has been at ESPN for four years now. And man, the quality of the interns we get has improved so much in four years. The quality of the young hires that we make straight out of college has improved so much in four years. So you definitely do see a younger generation for which this is just part of their bloodstream and part of their DNA. And also, particular fields that we're interested in. So we're interested in people who have both a data and a journalism background. We're interested in people who have a visualization and a coding background. A lot of what we do is very much interactive graphics and so forth. And so we do see those skill sets coming into play a lot more. And so the shortage of talent that I think had frankly been a problem for a long time, I'm optimistic about. It's a little anecdotal, but based on the young people in our office, you can tell that there are so many more programs that are teaching students the right set of skills that maybe weren't taught as much a few years ago. >> But when you're seeing these big organizations, ESPN as a perfect example, moving more towards data and analytics than ever before. >> Yeah. >> You would say that's obviously true. >> Oh, for sure. >> If you're not moving in that direction, you're going to fall behind quickly. >> Yeah, and the thing is, if you read my book, or I guess people have a copy of the book.
In some ways it's saying, hey, there are a lot of ways to screw up when you're using data. And we've built bad models. We've had models that were bad and got good results, good models that got bad results, and everything else. But the point is that the reason to be out in front of the problem is so you give yourself more runway to make errors and mistakes, and to learn what works and what doesn't and which people to put on the problem. I sometimes do worry that a company says, oh, we need data, and everyone agrees on that now, we need data science. Then they have some big test case, and they have a failure. And maybe they have a failure because they didn't know how to use it well enough. But you learn from that and iterate on that. And so by the time that you're on the third generation of a problem that you're trying to solve, and you're watching everyone else make the mistake that you made five years ago, I mean, that's really powerful. So that means getting invested in it now, both on the technology and the human capital side, is important. >> Final question for you as we run out of time. 2018 and beyond, what is your biggest project in terms of data gathering that you're working on? >> There's a midterm election coming up. That's a big thing for us. We're also doing a lot of work with NBA data. So for four years now, the NBA has been collecting player tracking data. They have 3D cameras in every arena, so they can actually quantify, for example, how fast a fast break is, or literally where a player is and where the ball is, for every NBA game, for the past four or five years. And there hasn't really been an overall metric of player value that's taken advantage of that. The teams do it, but in the NBA, the teams are a little bit ahead of journalists and analysts. So we're trying to have a really, truly next generation stat. It's a lot of data. Sometimes I now oversee things more than doing them all myself, and so you're parsing through many, many, many lines of code. But yeah, we hope to have that out at some point in the next few months. >> Anything you've personally been passionate about that you've wanted to work on and kind of solve? >> I mean, the NBA thing. I am a pretty big basketball fan. >> You can do better than that. Come on, I want something real personal that you're like, I've got to crunch the numbers. >> You know, we tried to figure out where the best burrito in America was a few years ago. >> I'm going to end it there. >> Okay. >> Nate, thank you so much for joining us. It's been an absolute pleasure. Thank you. >> Cool, thank you. >> I thought we were going to chat World Series, you know. Burritos, important. I want to thank everybody here in our audience. Let's give him a big round of applause. >> [Nate] Thank you everyone. >> Perfect way to end the day. And for a replay of today's program, just head on over to ibm.com/dsforall. I'm Katie Linendoll, and this has been Data Science for All: It's a Whole New Game. Test one, two. One, two, three. Hi guys, I just want to quickly let you know, as you're exiting, a few heads-up. Downstairs right now there's going to be a meet and greet with Nate, and we're going to be doing that with clients and customers who are interested. So I would recommend, before the game starts and you lose Nate, head on downstairs. And also, the gallery is open until eight p.m. with demos and activations. And tomorrow, make sure to come back too, because we have exciting stuff.
I'll be joining you as your host, and we're kicking off at nine a.m. So bye, everybody. Thank you so much. >> [Announcer] Ladies and gentlemen, thank you for attending this evening's webcast. If you are not attending our Cloud and Cognitive Summit tomorrow, we ask that you recycle your name badge at the registration desk. Thank you. Also, please note there are two exits at the back of the room, on either side of the room. Have a good evening. Ladies and gentlemen, the meet and greet will be on stage. Thank you.