Breaking Analysis: We Have the Data…What Private Tech Companies Don’t Tell you About Their Business
>> From The Cube Studios in Palo Alto and Boston, bringing you data driven insights from The Cube at ETR. This is "Breaking Analysis" with Dave Vellante. >> The reverse momentum in tech stocks caused by rising interest rates, less attractive discounted cash flow models, and more tepid forward guidance can be easily measured by public market valuations. And while there's lots of discussion about the impact on private companies and cash runway and 409A valuations, measuring the performance of non-public companies isn't as easy. IPOs have dried up and public statements by private companies, of course, they accentuate the good and they kind of hide the bad. Real data, unless you're an insider, is hard to find. Hello and welcome to this week's "Wikibon Cube Insights" powered by ETR. In this "Breaking Analysis", we unlock some of the secrets that non-public, emerging tech companies may or may not be sharing. And we do this by introducing you to a capability from ETR that we've not exposed you to over the past couple of years. It's called the Emerging Technologies Survey, and it is packed with sentiment data and performance data based on surveys of more than a thousand CIOs and IT buyers covering more than 400 companies. And we've invited back our colleague, Erik Bradley of ETR, to help explain the survey and the data that we're going to cover today. Erik, this survey is something that I've not personally spent much time on, but I'm blown away at the data. It's really unique and detailed. First of all, welcome. Good to see you again. >> Great to see you too, Dave, and I'm really happy to be talking about the ETS or the Emerging Technology Survey. Even our own clients and constituents probably don't spend as much time in here as they should. >> Yeah, because there's so much in the mainstream, but let's pull up a slide to bring out the survey composition. Tell us about the study. How often do you run it? What's the background and the methodology? >> Yeah, you were just spot on the way you were talking about the private tech companies out there. So what we did is we decided to take all the vendors that we track that are not yet public and move 'em over to the ETS. And there isn't a lot of information out there. If you're not in Silicon (indistinct), you're not going to get this stuff. So PitchBook and TechCrunch are two out there that give some data on these guys. But what we really wanted to do was go out to our community. We have 6,000 ITDMs in our community. We wanted to ask them, "Are you aware of these companies? And if so, are you allocating any resources to them? Are you planning to evaluate them," and really just kind of figure out what we can do. So this particular survey, as you can see, 1,000-plus responses, over 450 vendors that we track. And essentially what we're trying to do here is talk about your evaluation and awareness of these companies and also your utilization. And also if you're not utilizing 'em, then we can also figure out your sales conversion or churn. So this is interesting, not only for the ITDMs themselves to figure out what their peers are evaluating and what they should put in POCs against the big guys when contracts come up, but it's also really interesting for the tech vendors themselves to see how they're performing. >> And you can see 2/3 of the respondents are director level or above. You got 28% C-suite. There is of course a North America bias, 70, 75% is North America. But these smaller companies, you know, that's where they start doing business. So, okay.
We're going to do a couple of things here today. First, we're going to give you the big picture across the sectors that ETR covers within the ETS survey. And then we're going to look at the high and low sentiment for the larger private companies. And then we're going to do the same for the smaller private companies, the ones that don't have as much mindshare. And then I'm going to put those two groups together and we're going to look at two dimensions, actually three dimensions: first, which companies are being evaluated the most; second, which companies are getting the most usage and adoption of their offerings; and then third, which companies are seeing the highest churn rates, which of course is a silent killer of companies. And then finally, we're going to look at the sentiment and mindshare for two key areas that we like to cover often here on "Breaking Analysis", security and data. And data comprises database, including data warehousing, and then big data analytics is the second part of data. And then machine learning and AI is the third section within data that we're going to look at. Now, one other thing before we get into it. ETR very often will include open source offerings in the mix, even though they're not companies, like TensorFlow or Kubernetes, for example. And we'll call that out during this discussion. The reason this is done is for context, because everyone is using open source. It is the heart of innovation and many business models are superglued to an open source offering. Take MariaDB, for example. There's the foundation with the open source code, and then, of course, the company that sells services around the offering. Okay, so let's first look at the highest and lowest sentiment among these private firms, the ones that have the highest mindshare. So they're naturally going to be somewhat larger. And we do this on two dimensions, sentiment on the vertical axis and mindshare on the horizontal axis, and note the open source tools: Kubernetes, Postgres, Kafka, TensorFlow, Jenkins, Grafana, et cetera. So Erik, please explain what we're looking at here, how it's derived and what the data tells us. >> Certainly, so there is a lot here, so we're going to break it down first of all by explaining just what mindshare and net sentiment is. You explained the axes. We have so many evaluation metrics, but we need to aggregate them into one so that way we can rank against each other. Net sentiment is really the aggregation of all the positives, subtracting out the negatives. So the net sentiment is a very quick way of looking at where these companies stand versus their peers in their sectors and sub sectors. Mindshare is basically the awareness of them, which is good for very early stage companies. And you'll see some names on here that have obviously been around for a very long time. And they're clearly the bigger ones, further out on the axis. Kubernetes, for instance, as you mentioned, is open source. It's the de facto standard for all container orchestration, and it should be that far up and to the right, because that's what everyone's using. In fact, the open source leaders are so prevalent in the Emerging Technology Survey that we break them out later in our analysis, 'cause it's really not fair to include them and compare them to the actual companies that are providing the support and the security around that open source technology. But no survey, no analysis, no research would be complete without including this open source tech.
So what we're looking at here, if I can just get away from the open source names, we see other things like Databricks and OneTrust. They're repeating as top net sentiment performers here. And then also the design vendors. People don't spend a lot of time on 'em, but Miro and Figma. This is their third survey in a row where they're just dominating that sentiment overall. And Adobe should probably take note of that because they're really coming after them. But Databricks, we all know, probably would've been a public company by now if the market hadn't turned, but you can see just how dominant they are in a survey of nothing but private companies. And we'll see that again when we talk about the database later. >> And I'll just add, so you see Automation Anywhere on there, the big UiPath competitor, a company that was not able to get to the public markets. They've been trying. Snyk, Peter McKay's company, they've raised a bunch of money, big security player. They're doing some really interesting things in developer security, helping developers secure the data flow. H2O.ai and Dataiku, AI companies. We saw them at the Snowflake Summit. Redis Labs, Netskope in security. So a lot of names that we know that ultimately we think are probably going to be hitting the public market. Okay, here's the same view for private companies with less mindshare, Erik. Take us through this one. >> On the previous slide too, real quickly, I wanted to pull out SecurityScorecard, and we'll get back into it. But this is a newcomer that I couldn't believe how strong their data was, but we'll bring that up in a second. Now, when we go to the ones with lower mindshare, it's interesting to talk about open source, right? Kubernetes was all the way on the top right. Everyone uses containers. Here we see Istio up there. Not everyone is using service mesh as much. And that's why Istio is in the smaller breakout. But still, when you talk about net sentiment, it's the leader, it's the highest one there is. So really interesting to point out. Then we see other names like Collibra on the data side really performing well. And again, as always, security is very well represented here. We have Aqua, Wiz, Armis, which is a standout in this survey this time around. They do IoT security. I hadn't even heard of them until I started digging into the data here. And I couldn't believe how well they were doing. And then of course you have AnyScale, which is doing second best in this, and the best name in the survey, Hugging Face, which is a machine learning AI tool. Also doing really well on net sentiment, but they're not as far along on that axis of mindshare just yet. So these are, again, emerging companies that might not be as well represented in the enterprise as they will be in a couple of years. >> Hugging Face sounds like something you do with your two year old. Like you said, you see high performers. AnyScale does machine learning, and you mentioned them. They came out of Berkeley. Collibra Governance, InfluxData is on there. InfluxDB's a time series database. And yeah, of course, Alex, if you bring that back up, you get a big group of red dots, right? That's the bad zone, I guess, with Sisense, which does viz, and Yellowbrick Data, an MPP database. How should we interpret the red dots, Erik? I mean, is it necessarily a bad thing? Could it be misinterpreted? What's your take on that? >> Sure, well, let me just explain the definition of it first from a data science perspective, right? We're a data company first.
So the gray dots that you're seeing that aren't named, that's the mean, that's the average. So in order for you to be on this chart, you have to be at least one standard deviation above or below that average. So that gray is where we're saying, "Hey, this is where the lump of average comes in. This is where everyone normally stands." So you either have to be an outperformer or an underperformer to even show up in this analysis. So by definition, yes, the red dots are bad. You're at least one standard deviation below the average of your peers. It's not where you want to be. And if you're on the lower left, not only are you not performing well from a utilization or an actual usage rate, but people don't even know who you are. So that's a problem, obviously. And the VCs and the PEs out there that are backing these companies, they're the ones who mostly are interested in this data. >> Yeah. Oh, that's a great explanation. Thank you for that. No, nice benchmarking there, and yeah, you don't want to be in the red. All right, let's get into the next segment here. Here we're going to look at evaluation rates, adoption and the all important churn. First, new evaluations. Let's bring up that slide. And Erik, take us through this. >> So essentially, I just want to explain what evaluation means: people will cite that they either plan to evaluate the company or they're currently evaluating. So that means we're aware of 'em and we are choosing to do a POC of them. And then we'll see later how that turns into utilization, which is what a company wants to see, awareness, evaluation, and then actually utilizing them. That's sort of the life cycle for these emerging companies. So what we're seeing here, again, are very high evaluation rates. H2O, we mentioned. SecurityScorecard jumped up again. Chargebee, Snyk, Salt Security, Armis. A lot of security names are up here, Aqua, Netskope, which, gosh, has been around forever. I still can't believe it's in an Emerging Technology Survey. But so many of these names fall in data and security again, which is why we decided to pick those out, Dave. And on the lower side, Vena, Acton, those unfortunately took the dubious award of the lowest evaluations in our survey, but I prefer to focus on the positive. So SecurityScorecard, again, a real standout in this one. They're in the security assessment space, basically. They'll come in and assess for you how your security hygiene is. And it's an area of real interest right now amongst our ITDM community. >> Yeah, I mean, I think those, and then Arctic Wolf is up there too. They're doing managed services. You had mentioned Netskope. Yeah, okay. All right, let's look now at adoption. These are the companies whose offerings are being used the most and are above that standard deviation in the green. Take us through this, Erik. >> Sure, yet again, what we're looking at is, okay, we went from awareness, we went to evaluation. Now it's about utilization, which means a survey respondent's going to state "Yes, we evaluated and we plan to utilize it" or "It's already in our enterprise and we're actually allocating further resources to it." Not surprising, again, a lot of open source. The reason why is it's free. So it's really easy to grow your utilization on something that's free. But as you and I both know, as Red Hat proved, there's a lot of money to be made once the open source is adopted, right? You need the governance, you need the security, you need the support wrapped around it.
So here we're seeing Kubernetes, Postgres, Apache Kafka, Jenkins, Grafana. These are all open source based names. But if we're looking at names that are non open source, we're going to see Databricks, Automation Anywhere, Rubrik all have the highest mindshare. So these are the names, not surprisingly, all names that probably should have been public by now. Everyone's expecting an IPO imminently. These are the names that have the highest mindshare. If we talk about the highest utilization rates, again, Miro and Figma pop up, and I know they're not household names, but they are just dominant in this survey. These are applications that are meant for design software and, again, they're going after an Autodesk or a CAD or Adobe type of thing. It is just dominant how high the utilization rates are here, which again is something Adobe should be paying attention to. And then you'll see a little bit lower, but also interesting, we see Collibra again, we see Hugging Face again. And these are names that are obviously in the data governance, ML, AI side. So we're seeing a ton of data, a ton of security and Rubrik was interesting in this one, too, high utilization and high mindshare. We know how pervasive they are in the enterprise already. >> Erik, Alex, keep that up for a second, if you would. So yeah, you mentioned Rubrik. Cohesity's not on there. They're sort of the big one. We're going to talk about them in a moment. Puppet is interesting to me because you remember the early days of that sort of space, you had Puppet and Chef and then you had Ansible. Red Hat bought Ansible and then Ansible really took off. So it's interesting to see Puppet on there as well. Okay. So now let's look at the churn because this one is where you don't want to be. It's, of course, all red 'cause churn is bad. Take us through this, Erik. >> Yeah, definitely don't want to be here and I don't love to dwell on the negative. So we won't spend as much time. But to your point, there's one thing I want to point out that think it's important. So you see Rubrik in the same spot, but Rubrik has so many citations in our survey that it actually would make sense that they're both being high utilization and churn just because they're so well represented. They have such a high overall representation in our survey. And the reason I call that out is Cohesity. Cohesity has an extremely high churn rate here about 17% and unlike Rubrik, they were not on the utilization side. So Rubrik is seeing both, Cohesity is not. It's not being utilized, but it's seeing a high churn. So that's the way you can look at this data and say, "Hm." Same thing with Puppet. You noticed that it was on the other slide. It's also on this one. So basically what it means is a lot of people are giving Puppet a shot, but it's starting to churn, which means it's not as sticky as we would like. One that was surprising on here for me was Tanium. It's kind of jumbled in there. It's hard to see in the middle, but Tanium, I was very surprised to see as high of a churn because what I do hear from our end user community is that people that use it, like it. It really kind of spreads into not only vulnerability management, but also that endpoint detection and response side. So I was surprised by that one, mostly to see Tanium in here. Mural, again, was another one of those application design softwares that's seeing a very high churn as well. >> So you're saying if you're in both... Alex, bring that back up if you would. 
So if you're in both, like MariaDB is, for example, I think, yeah, they're in both. They're green in the previous one and red here, and that's not as bad. You mentioned Rubrik is going to be in both. Cohesity is a bit of a concern. Cohesity just brought on Sanjay Poonen. So this could be a go to market issue, right? I mean, 'cause Cohesity has got a great product and they got really happy customers. So they're just maybe having to figure out, okay, what's the right ideal customer profile, and Sanjay Poonen, I guarantee, is going to have that company cranking. I mean, they had been doing very well on the surveys and had fallen off a bit. The other interesting thing: in the previous survey I saw Cvent, which is an event platform. The only reason I pay attention to that is 'cause we actually have an event platform. We don't sell it separately. We bundle it as part of our offerings. And you see Hopin on here. Hopin raised a billion dollars during the pandemic. And we were like, "Wow, that's going to blow up." And so you see Hopin on the churn and you didn't see 'em in the previous chart, but that's sort of interesting. Like you said, let's not dwell on the negative, but you really don't want to see it. You know, churn is a real big concern. Okay, now we're going to drill down into two sectors, security and data, where data comprises three areas: database and data warehousing, machine learning and AI, and big data analytics. So first let's take a look at the security sector. Now this is interesting because not only is it a sector drill down, but it also gives an indicator of how much money the firm has raised, which is the size of that bubble. And it tells us if a company is punching above its weight and efficiently using its venture capital. Erik, take us through this slide. Explain the dots, the size of the dots. Set this up please. >> Yeah. So again, the axes are still the same, net sentiment and mindshare, but what we've done this time is we've taken publicly available information on how much capital a company has raised, and that'll be the size of the circle you see around the name. And then whether it's green or red is basically saying, relative to the amount of money they've raised, how are they doing in our data? So when you see a Netskope, which has been around forever, raised a lot of money, that's why you're going to see them leading more towards red, 'cause it's just been around forever and you kind of would expect it. Versus a name like SecurityScorecard, which has only raised a little bit of money and is actually performing just as well, if not better, than a name like Netskope. OneTrust is doing absolutely incredible right now. BeyondTrust. We've seen the issues with Okta, right? So those are two names that play in that space that obviously are probably getting some looks about what's going on right now. Wiz, we've all heard about, right? They raised a ton of money. It's doing well on net sentiment, but the mindshare isn't as high as you'd want, which is why you're going to see a little bit of that red, versus a name like Aqua, which is doing container and application security. It hasn't raised as much money, but it's really neck and neck with a name like Wiz. So that is why, on a relative basis, you'll see that more green. As we all know, information security is never going away. But as we'll get to later in the program, Dave, I'm not sure, in this current market environment, if people are as willing to do POCs and switch away from their security provider, right.
There's a little bit of tepidness out there, a little trepidation. So right now we're seeing overall a slight pause, a slight cooling in overall evaluations on the security side versus historical levels a year ago. >> Now let's stay on here for a second. So a couple things I want to point out. So it's interesting. Now Snyk has raised over, I think, $800 million, but you can see them, they're high on the vertical and the horizontal, but now compare that to Lacework. It's hard to see, but they're kind of buried in the middle there. That's the biggest dot in this whole thing. I think I'm interpreting this correctly. They've raised over a billion dollars. It's a Mike Speiser company. He was the founding investor in Snowflake. So people watch that very closely, but that's an example of where they're not punching above their weight. They recently had a layoff and they got to fine tune things, but I'm still confident they're going to do well. 'Cause they're approaching security as a data problem, and people are probably having trouble getting their arms around that. And then again, I see Arctic Wolf. They're not red, they're not green, but they've raised a fair amount of money, and they're showing up to the right at a decent level there. And a couple of the other ones that you mentioned, Netskope. Yeah, they've raised a lot of money, but they're actually performing where you want. What you don't want is where Lacework is, right? They've got some work to do to really take advantage of the money that they raised last November and prior to that. >> Yeah, if you're seeing that more neutral color, like you're calling out with an Arctic Wolf, that means relative to their peers, this is where they should be. It's when you're seeing that red on a Lacework where we all know, wow, you raised a ton of money and your mindshare isn't where it should be. Your net sentiment is not where it should be comparatively. And then you see these great standouts, like Salt Security and SecurityScorecard and Abnormal. You know they haven't raised that much money yet, but their net sentiment's higher and their mindshare's doing well. So basically, in a nutshell, if you're a PE or a VC and you see a small green circle, then you're doing well; it means you made a good investment. >> Some of these guys, I don't know, but you see these small green circles. Those are the ones you want to start digging into and maybe help them catch a wave. Okay, let's get into the data discussion. And again, three areas, database slash data warehousing, big data analytics and ML/AI. First, we're going to look at the database sector. So Alex, thank you for bringing that up. Alright, take us through this, Erik. Actually, let me just say PostgreSQL. I got to ask you about this. It shows some funding, but that actually could be a mix of EDB, the company that commercializes Postgres, and Postgres the open source database, which is a transaction system and kind of an open source Oracle. You see MariaDB, an open source database, but the company has raised over $200 million and filed an S-4. So Erik, it looks like this might be a little bit of a mashup of companies and open source products. Help us understand this. >> Yeah, it's tough when you start dealing with the open source side, and I'll be honest with you, there is a little bit of a mashup here. There are certain names here that are a hundred percent for-profit companies.
And then there are others that are obviously open source based. Like, Redis is open source, but Redis Labs is the one trying to monetize the support around it. So you're a hundred percent accurate on this slide. I think one of the things here that's important to note, though, is just how important open source is to data. If you're going to be going to any of these areas, it's going to be open source based to begin with. And Neo4j is one I want to call out here. It's not one everyone's familiar with, but it's basically a graph database, which is a name that we're seeing on the net sentiment side actually really, really high. When you think about it, it's the third overall net sentiment for a niche database play. It's not as big on the mindshare 'cause its use cases aren't as common, but it's the third biggest play on net sentiment, which I found really interesting on this slide. >> And again, so MariaDB, as I said, they filed an S-4, I think $50 million in revenue, that might even be ARR. So they're not huge, but they're getting there. And by the way, MariaDB, if you don't know, was the company that was formed the day that Oracle bought Sun, in which they got MySQL, and MariaDB has done a really good job of replacing a lot of MySQL instances. Oracle has responded with MySQL HeatWave, which was kind of the Oracle version of MySQL. So there's some interesting battles going on there. If you think about the LAMP stack, the M in the LAMP stack was MySQL. And so now it's all MariaDB replacing that MySQL for a large part. And then you see, again, the red, you know, you got to have some concerns there. Aerospike's been around for a long time. SingleStore changed their name a couple of years ago. Yellowbrick Data, Firebolt was kind of going after Snowflake for a while, but yeah, you want to get out of that red zone. So they got some work to do. >> And Dave, real quick, for the people that aren't aware, I just want to let them know that we can cut this data with the public company data as well. So we can cross over this with that, because some of these names are competing with the larger public company names as well. So we can go ahead and cross reference, like, a MariaDB with a Mongo, for instance, or something of that nature. So it's not in this slide, but at another point we can certainly explain on a relative basis how these private names are doing compared to the other ones as well. >> All right, let's take a quick look at analytics. Alex, bring that up if you would. Go ahead, Erik. >> Yeah, I mean, essentially here, I can't see it on my screen, my apologies. I just kind of went blank on that. So gimme one second to catch up. >> So I could set it up while you're doing that. You got Grafana up and to the right. I mean, this is huge, right? >> Got it, thank you. I lost my screen there for a second. Yep. Again, open source name Grafana, absolutely up and to the right. But as we know, Grafana Labs is actually picking up a lot of speed based on Grafana, of course. And I think we might actually hear some noise from them coming this year. The names that are actually a little bit more disappointing that I want to call out are names like ThoughtSpot. It's been around forever. Their mindshare, of course, is second best here, but based on the amount of time they've been around and the amount of money they've raised, it's not actually outperforming the way it should be. We're seeing Moogsoft obviously make some waves. That's very high net sentiment for that company.
It's, you know, what, third, fourth position overall in this entire area. Other names like Fivetran and Matillion are doing well. Fivetran, even though it's got a high net sentiment, again, it's raised so much money that we would've expected a little bit more at this point. I know you know this space extremely well, but basically what we're looking at here, and to the bottom left, you're going to see some names with a lot of red, large circles that really just aren't performing that well. InfluxData, however, second highest net sentiment. And it's really pretty early on at this stage, and the feedback we're getting on this name is the use cases are great, the efficacy's great. And I think it's one to watch out for. >> InfluxData, time series database. The other interesting thing I just noticed here, you got Tamr on here, which is that little small green one. Those are the ones we were saying before, look for those guys. They might be some of the interesting companies out there. And then Observe, Jeremy Burton's company. They do observability on top of Snowflake, not green, but kind of in that gray. So that's kind of cool. Monte Carlo is another one, they're sort of slightly green. They are doing some really interesting things in data and data mesh. So yeah, okay. So I can spend all day on this stuff, Erik, phenomenal data. I got to get back and really dig in. Let's end with machine learning and AI. Now this chart is similar in its dimensions, of course, except for the money raised. We're not showing that as the size of the bubble, but AI is so hot. We wanted to cover that here. Erik, explain this please. Why TensorFlow is highlighted, and walk us through this chart. >> Yeah, it's funny yet again, right? Another open source name, TensorFlow, being up there. And I just want to explain, we do break out machine learning and AI as its own sector. A lot of this of course really is intertwined with the data side, but it is its own area. And one of the things I think that's most important here to break out is Databricks. We started to cover Databricks in machine learning, AI. That company has grown into much, much more than that. So I do want to state to you, Dave, and also the audience out there, that moving forward, we're going to be moving Databricks out of only the ML/AI into other sectors. So we can kind of value them against their peers a little bit better. But in this instance, you could just see how dominant they are in this area. And one thing that's not here, but I do want to point out, is that we have the ability to break this down by industry vertical, organization size. And when I break this down into Fortune 500 and Fortune 1000, both Databricks and TensorFlow are even better than you see here. So it's quite interesting to see that the names that are succeeding are also succeeding with the largest organizations in the world. And as we know, large organizations mean large budgets. So this is one area that I just thought was really interesting to point out, that as we break the data down by vertical, these two names still are the outstanding players. >> I just also want to call out H2O.ai. They're getting a lot of buzz in the marketplace and I'm seeing them a lot more. Anaconda, another one. Dataiku consistently popping up. DataRobot is also interesting because of all the kerfuffle that's going on there. The Cube guy, Cube alum, Chris Lynch stepped down as executive chairman.
All this stuff came out about how the executives were taking money off the table and didn't allow the employees to participate in that money-raising deal. So that's pissed a lot of people off. And so they're now going through some kind of uncomfortable things, which is unfortunate because DataRobot, I noticed, we haven't covered them that much in "Breaking Analysis", but I've noticed them oftentimes, Erik, in the surveys doing really well. So you would think that company has a lot of potential. But yeah, it's an important space that we're going to continue to watch. Let me ask you, Erik, can you contextualize this from a time series standpoint? I mean, how has this changed over time? >> Yeah, again, not shown here, but it's in the data. I'm sorry, go ahead. >> No, I'm sorry. What I meant, I should have interjected. In other words, you would think in a downturn that these emerging companies would be less interesting to buyers 'cause they're more risky. What have you seen? >> Yeah, and it was interesting. Before we went live, you and I were having this conversation about "Is the downturn stopping people from evaluating these private companies or not," right? In a larger sense, that's really what we're doing here. How are these private companies doing when it comes down to the actual practitioners? The people with the budget, the people with the decision making. And so what I did is, we have historical data as you know, I went back to the Emerging Technology Survey we did in November of '21, right at the crest, right before the market started to really fall and everything kind of started to fall apart there. And what I noticed is, on the security side, very much so, we're seeing fewer evaluations than we were in November '21. So I broke it down. On cloud security, net sentiment went from 21% to 16% from November '21. That's a pretty big drop. And again, that sentiment is our one aggregate metric for overall positivity, meaning utilization and actual evaluation of the name. Again, in database, we saw it drop a little bit from 19% to 13%. However, in analytics we actually saw it stay steady. So it's pretty interesting that yes, cloud security and security in general is always going to be important. But right now we're seeing less overall net sentiment in that space. But within analytics, we're seeing it stay steady with growing mindshare. And also, to your point earlier, in machine learning, AI, we're seeing steady net sentiment, and mindshare has grown a whopping 25% to 30%. So despite the downturn, we're seeing more awareness of these companies in analytics and machine learning and a steady, actual utilization of them. I can't say the same in security and database. They're actually shrinking a little bit since the end of last year. >> You know it's interesting, we were on a round table, Erik does these round tables with CISOs and CIOs, and I remember one time you had asked the question, "How do you think about some of these emerging tech companies?" And one of the executives said, "I always include somebody in the bottom left of the Gartner Magic Quadrant in my RFPs." I think he said, "That's how I found," I don't know, it was Zscaler or something like that, years before anybody ever knew of them, "because they're going to help me get to the next level." So it's interesting to see, Erik, in these sectors, how they're holding up in many cases. >> Yeah. It's a very important part for the actual IT practitioners themselves. There's always contracts coming up and you always have to worry about your next round of negotiations.
And that's one of the roles these guys play. You have to do a POC when contracts come up, but it's also their job to stay on top of the new technology. You can't fall behind. Like, everyone's a software company. Now everyone's a tech company, no matter what you're doing. So these guys have to stay on top of it. And that's what this ETS can do. You can go in here and look and say, "All right, I'm going to evaluate their technology," and it could be twofold. It might be that you're ready to upgrade your technology and they're actually pushing the envelope, or it simply might be, "I'm using them as a negotiation ploy." So when I go back to the big guy who I have full intentions of writing that contract to, at least I have some negotiation leverage. >> Erik, we got to leave it there. I could spend all day. I'm going to definitely dig into this on my own time. Thank you for introducing this, really appreciate your time today. >> I always enjoy it, Dave, and I hope everyone out there has a great holiday weekend. Enjoy the rest of the summer. And, you know, I love to talk data. So anytime you want, just point the camera on me and I'll start talking data. >> You got it. I also want to thank the team at ETR, not only Erik, but Darren Bramen, a data scientist who really helped prepare this data, and the entire team over at ETR. I cannot tell you how much additional data there is. We are just scratching the surface in this "Breaking Analysis". So great job, guys. I want to thank Alex Myerson, who's on production and manages the podcast. Ken Shifman as well, who's just coming back from VMware Explore. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE. He does some great editing for us. Thank you, all of you guys. Remember these episodes, they're all available as podcasts, wherever you listen. All you got to do is just search "Breaking Analysis" podcast. I publish each week on wikibon.com and siliconangle.com. Or you can email me to get in touch at david.vellante@siliconangle.com. You can DM me at @dvellante or comment on my LinkedIn posts, and please do check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for Erik Bradley and The Cube Insights powered by ETR. Thanks for watching. Be well. And we'll see you next time on "Breaking Analysis". (upbeat music)
Natasha | DigitalBits VIP Gala Dinner Monaco
(upbeat music) >> Hello, everyone. Welcome back to theCUBE's extended coverage. I'm John Furrier, host of theCUBE. We are here in Monaco at the Yacht Club, part of the VIP Gala with Prince Albert, DigitalBits, theCUBE. theCUBE and Prince Albert celebrating Monaco leaning into crypto. I'm here with Natasha Mahfar, who's our guest. She just came on theCUBE. Great story. Great to see you. Thanks for coming on. >> Thank you so much for having me. >> Tell the folks what you do real quick. >> Sure. So I actually started my career in Silicon Valley, like you have. And I had the idea of creating a startup in mental health that was voice based only. So it was peer to peer support groups via voice. So I created this startup, pretended to be a student at Stanford and built out a whole team, and unfortunately, at that time, no one was in the space of mental health and voice. Now, as you know, it's a $30 billion industry that's one of the biggest in Silicon Valley. So my career really started from there. And due to that startup, I got involved in the World XR Forum. Now, the World XR Forum is kind of like a mini Davos, but a little bit more exclusive, where we host entrepreneurs, people in blockchain, crypto, and we have a five day event covering all sorts of topics. So- >> When you host them, you mean like host them and they hang out and sleep over? It's a hotel? Is it an event? A workshop? >> There's workshops. We arrange hotels. We pretty much arrange everything that there is. >> It's a group get together. >> It's a group get together. Pretty much like Davos. >> And so Natasha, I wanted to talk to you about what we're passionate about, which is theCUBE bringing people up to have a voice and giving them a voice. Give people a platform. You don't have to be famous. If you have something to say and share, we found that right now in this environment with media, we go out to an event, we stream as many stories, but we also have the virtual version of our studio. And I could tell you, I've found that internationally now as we bring people together, there are so many great stories. >> Absolutely. >> Out there that need to be told. And the bottleneck isn't the media, it's the fact that it's open now. >> Yes. >> So why aren't the stories coming out? So our mission is to get the stories. >> Wow. >> Scale stories. The more stories that are scaled, the more people can feel it. More people are impacted by it, and it changes the world. It gets people serendipity with data, 'cause, you know, you shared some data about what you're working on. >> Yeah, of course. It's all about data these days. And the fact that you're doing it so openly is great because there is a need for that today, so. >> What do you see right now in the market for media? I mean, we got emerging markets, a lot of misinformation. Trust is a big problem. >> Right. >> Bullying, harassing. Smear campaigns. What's news, what's not news. I mean, how do you get your news? I mean, how do people figure out what's going on? >> No, absolutely. And this is such a pure format and a way of doing it. How did you come up with the idea, and how did you start? >> Well, I started... I realized after the Web 2.0, when social media started taking over and ruining the democratization of blogging, podcasting, which I started in 2004, one of the first podcasts in Silicon Valley. >> Wow. >> I saw the network of that. I saw the value that people had when normal people, they call it user generated content, shared information.
And I discovered something amazing, that a nobody like me can have a really top podcast. >> Well, you're definitely not a nobody, but... >> Well, I was back then. And nobody knew me back then. But what it is is that even... If you put your voice out there, people will connect to it. And if you have the ability to bring other people in, you start to see a social dynamic. And what social media ruined, Facebook, Twitter, not so much Twitter 'cause Twitter's more smeary, but it's still got to open the API, LinkedIn, they're all terrible. They're all gardens. They don't really bring people together, so I think that stalled for about almost eight years or nine years. Now, with crypto and decentralization, you start to see the same thing come back. Democratization, level the playing field, remove the middleman, disintermediate the middle bottlenecks. So with media, we found that live streaming and going to events was what the community wants. And then interviewing people, and getting their ideas out there. Not promotional, not getting paid to say stuff. Yeah, they get the plug in for the company that they're working on, that's good for everybody. But more, share something that you're passionate about, data. And it works. And people like it. And we've been doing it for 12 years, and it creates a great brand of openness, community, and network effect. So we scaled up the brand to be- >> And it seems like you're international now. I mean, we're sitting in Monte Carlo, so I don't think it gets better than that. >> Well, in 2016, we started going international. 2017, we started doing stuff in Europe. 2018, we did the crypto, Middle East. And we also did London, a lot of different events. We had B2B Enterprise and Crypto Blooming. 2019, we were like, "Let's go global with staff and whatnot." >> Wow. >> And the pandemic hits. >> I know. >> And that really kind of allowed us to pivot and turned us into a virtual hybrid. And that's why we're into the metaverse, as we see the value of a physical face to face event where intimacy's there, but why aren't my friends connected first party? >> Right. How much would you say the company has grown from the time that you kind of pivoted? >> Well, we've grown in a different direction with new capabilities because the old way is over. >> Right. >> Every event right now, this event here, is in person. People are talking. They get connections. But every person that's connecting has a social graph behind them that's online too, and immediately available. And with Instagram, direct messaging, Telegram, Signal, all there. >> It's brilliant. Honestly, it was a brilliant idea and a brilliant pivot. >> Thank you for interviewing me. >> Yeah, of course. (Natasha and John laugh) >> Any other questions? >> That should do it. >> Okay. Are you going to have fun tonight? >> Absolutely. >> What is your take on the Monaco scene here? What's it like? >> You know, I think it's a really interesting scene. I think there's a lot of potential because this is such an international place, so it draws a very eclectic crowd, and I think there's a lot that could be done here. And you have a lot of people from Europe that are starting to get into this whole crypto thing, leaving kind of the traditional banks and finance behind. So I think the potential is very strong. >> Very progressive. Well, Natasha, thank you for sharing. >> Thank you so much. >> Here on theCUBE.
We're the extended edition of theCUBE here in Monaco with Prince Albert and DigitalBits' Al Burgio, a great market here for them. And just an amazing time. And thanks for watching. Natasha, thanks for coming on. Thanks for watching theCUBE. We'll be back with more after this break. (upbeat music)
Breaking Analysis: Snowflake Summit 2022...All About Apps & Monetization
>> From theCUBE studios in Palo Alto and Boston, bringing you data driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. >> Snowflake Summit 2022 underscored that the ecosystem excitement which was once forming around Hadoop is being reborn, escalated and coalescing around Snowflake's data cloud. What was once seen as simply a cloud data warehouse, and good marketing with the data cloud, is evolving rapidly with new workloads, a vertical industry focus, data applications, monetization, and more. The question is, will the promise of data be fulfilled this time around, or is it same wine, new bottle? Hello, and welcome to this week's Wikibon CUBE Insights powered by ETR. In this "Breaking Analysis," we'll talk about the event, the announcements that Snowflake made that are of greatest interest, the major themes of the show, what was hype and what was real, the competition, and some concerns that remain in many parts of the ecosystem and pockets of customers. First let's look at the overall event. It was held at Caesars Forum. Not my favorite venue, but I'll tell you it was packed. Fire Marshal full, as we sometimes say. Nearly 10,000 people attended the event. Here's Snowflake's CMO Denise Persson on theCUBE describing how this event has evolved. >> Yeah, two, three years ago, we were about 1,800 people at a Hilton in San Francisco. We had about 40 partners attending. This week we're close to 10,000 attendees here. Almost 10,000 people online as well, and over 200 partners here on the show floor. >> Now, those numbers from 2019 remind me of the early days of Hadoop World, which was put on by Cloudera, but then Cloudera handed off the event to O'Reilly, as this article that we've inserted, if you bring back that slide, would say. The headline almost got it right. Hadoop World was a failure, but it didn't have to be. Snowflake has filled the void created by O'Reilly when it first killed Hadoop World, and killed the name and then killed Strata. Now, ironically, the momentum and excitement from Hadoop's early days, it probably could have stayed with Cloudera, but the beginning of the end was when they gave the conference over to O'Reilly. We can't imagine Frank Slootman handing the keys to the kingdom to a third party. Serious business was done at this event. I'm talking substantive deals. Salespeople from a host sponsor and the ecosystems that support these events, they love physical. They really don't like virtual because physical belly to belly means relationship building, pipeline, and deals. And that was blatantly obvious at this show. And in fairness, that's true of all theCUBE events that we've done this year, but this one was more vibrant because of its attendance and the action in the ecosystem. Ecosystem is a hallmark of a cloud company, and that's what Snowflake is. We asked Frank Slootman on theCUBE, was this ecosystem evolution by design or did Snowflake just kind of stumble into it? Here's what he said. >> Well, when you are a data cloud, you have data, people want to do things with that data. They don't want to just run data operations, populate dashboards, run reports. Pretty soon they want to build applications, and after they build applications, they want to build businesses on it. So it goes on and on and on. So it drives your development to enable more and more functionality on that data cloud. Didn't start out that way, you know, we were very, very much focused on data operations.
Then it becomes application development, and then it becomes, hey, we're developing whole businesses on this platform. So similar to what happened to Facebook in many ways. >> So it sounds like it was maybe a little bit of both. The Facebook analogy is interesting because Facebook is a walled garden, as is Snowflake, but when you come into that garden, you have assurances that things are going to work in a very specific way because a set of standards and protocols is being enforced by a steward, i.e. Snowflake. This means things run better inside of Snowflake than if you try to do all the integration yourself. Now, maybe over time, an open source version of that will come out, but if you wait for that, you're going to be left behind. That said, Snowflake has made moves to make its platform more accommodating to open source tooling in many of its announcements this week. Now, I'm not going to do a deep dive on the announcements. Matt Sulkins from Monte Carlo wrote a decent summary of the keynotes, and a number of analysts like Sanjeev Mohan, Tony Baer and others are posting some deeper analysis on these innovations, and so we'll point to those. I'll say a few things though. Unistore extends the type of data that can live in the Snowflake data cloud. It's enabled by a new feature called hybrid tables, a new table type in Snowflake. One of the big knocks against Snowflake was it couldn't handle transaction data. Several database companies are creating this notion of a hybrid where both analytic and transactional workloads can live in the same data store. Oracle's doing this, for example, with MySQL HeatWave, and there are many others. We saw Mongo earlier this month add an analytics capability to its transaction system. Mongo also added SQL, which was kind of interesting. Here's what Constellation Research analyst Doug Henschen said about Snowflake's moves into transaction data. Play the clip. >> Well, with Unistore, they're reaching out and trying to bring transactional data in. Hey, don't limit this to analytical information, and there's other ways to do that like CDC and streaming, but they're very closely tying that again to that marketplace, with the idea of bring your data over here and you can monetize it. Don't just leave it in that transactional database. So another reach to a broader play across a big community that they're building. >> And you're also seeing Snowflake expand its workload types in its unique way and, through Snowpark and its Streamlit acquisition, enabling Python so that native apps can be built in the data cloud and benefit from all that structure and the features that Snowflake has built in. Hence that Facebook analogy, or maybe the App Store, the Apple App Store, as I proposed as well. Python support also widens the aperture for machine intelligence workloads. We asked Snowflake senior VP of product, Christian Kleinerman, which announcements he thought were the most impactful. And despite the "who's your favorite child" nature of the question, he did answer. Here's what he said. >> I think the native applications is the one that looks like, eh, I don't know about it on the surface, but it has the biggest potential to change everything. It could create an entire ecosystem of solutions, within a company or across companies, that I don't know that we know what's possible. >> Snowflake also announced support for Apache Iceberg, which is a new open table format standard that's emerging.
So you're seeing Snowflake respond to these concerns about its lack of openness, and they're building optionality into their cloud. They also showed some cost optimization tools, both from Snowflake itself and from the ecosystem, notably Capital One, which launched a software business on top of Snowflake focused on optimizing cost and eventually rolling out data management capabilities, and all kinds of features that Snowflake announced at the show around governance, cross cloud, what we call super cloud, a new security workload, and they reemphasized their ability to read non-native on-prem data into Snowflake through partnerships with Dell and Pure, and a lot more. Let's hear from some of the analysts that came on theCUBE this week at Snowflake Summit to see what they said about the announcements and their takeaways from the event. This is Dave Menninger, Sanjeev Mohan, and Tony Baer, roll the clip. >> Our research shows that the majority of organizations, the majority of people, do not have access to analytics. And so a couple of the things they've announced I think address those or help to address those issues very directly. So Snowpark and support for Python and other languages is a way for organizations to embed analytics into different business processes. And so I think that'll be really beneficial to try and get analytics into more people's hands. And I also think that the native applications as part of the marketplace is another way to get applications into people's hands rather than just analytical tools. Because most people in the organization are not analysts. They're doing some line of business function. They're HR managers, they're marketing people, they're sales people, they're finance people, right? They're not sitting there mucking around in the data, they're doing a job and they need analytics in that job. >> Primarily, I think it is to counteract this whole notion that once you move data into Snowflake, it's a proprietary format. So I think that's how it started, but it's usually beneficial to the customers, to the users, because now if you have a large amount of data in parquet files you can leave it on S3, but then, by using the Apache Iceberg table format in Snowflake, you get all the benefits of Snowflake's optimizer. So for example, you get the micro partitioning, you get the metadata. And in a single query, you can join, you can do a select from a Snowflake table union a select from an Iceberg table, and you can do stored procedures, user defined functions. So I think what they've done is extremely interesting. Iceberg by itself still does not have multi-table transactional capabilities. So if I'm running a workload, I might be touching 10 different tables. So if I use Apache Iceberg in a raw format, they don't have it, but Snowflake does. So the way I see it is Snowflake is adding more and more capabilities right into the database. So for example, they've gone ahead and added security and privacy. So you can now create policies and do even cell level masking, dynamic masking, but most organizations have more than Snowflake. So what we are starting to see all around here is that there's a whole series of data catalog companies, a bunch of companies that are doing dynamic data masking, security and governance, data observability, which is not a space Snowflake has gone into. So there's a whole ecosystem of companies that is mushrooming. Although, you know, so they're using the native capabilities of Snowflake, but they are at a level higher.
So if you have a data lake and a cloud data warehouse and you have other, like, relational databases, you can run these cross platform capabilities in that layer. So that way, you know, Snowflake's done a great job of enabling that ecosystem. >> I think it's like the last mile, essentially. In other words, it's like, okay, you have folks that are very comfortable with Tableau, but you do have developers who don't want to have to shell out to a separate tool. And so this is where Snowflake is essentially working to address that constituency. To Sanjeev's point, and I think part of it, this kind of plays into it, is what makes this different from the Hadoop era is the fact that all these capabilities, you know, a lot of vendors are taking it very seriously to put this native. Now, obviously Snowflake acquired Streamlit. So we can expect that the Streamlit capabilities are going to be native. >> I want to share a little bit about the higher level thinking at Snowflake, here's a chart from Frank Slootman's keynote. It's his version of the modern data stack, if you will. Now, Snowflake of course, was built on the public cloud. If there were no AWS, there would be no Snowflake. Now, they're all about bringing data and live data and expanding the types of data, including structured, we just heard about that, unstructured, geospatial, and the list is going to continue on and on. Eventually I think it's going to bleed into the edge if we can figure out what to do with that edge data. Executing on new workloads is a big deal. They started with data sharing and they recently added security and they've essentially created a PaaS layer. We call it a SuperPaaS layer, if you will, to attract application developers. Snowflake has a developer-focused event coming up in November and they've extended the marketplace with 1300 native apps listings. And at the top, that's the holy grail, monetization. We always talk about building data products and we saw a lot of that at this event, very, very impressive and unique. Now here's the thing. There's a lot of talk in the press, on Wall Street and in the broader community about consumption-based pricing and concerns over Snowflake's visibility and its forecast and how analytics may be discretionary. But if you're a company building apps in Snowflake and monetizing like Capital One intends to do, and you're now selling in the marketplace, that is not discretionary, unless of course your costs are greater than your revenue for that service, in which case it's going to fail anyway. But the point is we're entering a new era where data apps and data products are beginning to be built and Snowflake is attempting to make the data cloud the de facto place where you're going to build them. In our view they're well ahead in that journey. Okay, let's talk about some of the bigger themes that we heard at the event. Bringing apps to the data instead of moving the data to the apps, this was a constant refrain and one that certainly makes sense from a physics point of view. But having a single source of data that is discoverable, sharable and governed with increasingly robust ecosystem options, it doesn't have to be moved. Sometimes it may have to be moved if you're going across regions, but that's unique and a differentiator for Snowflake in our view. I mean, I've yet to see a data ecosystem that is as rich and growing as fast as the Snowflake ecosystem.
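One more practical aside on Sanjeev's earlier point before moving on. The combination he described, one query spanning a native Snowflake table and an Iceberg table, with cell level masking enforced by the platform, looks roughly like the sketch below. The table and policy names are invented, and Iceberg table support was still in preview at the time of the event, so the exact DDL may differ; this is an illustration of the pattern, not production code.

```python
# Illustrative sketch only: invented names, and Iceberg table DDL was still in
# preview at the time, so details may differ.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)
cur = conn.cursor()

# Cell level protection: mask emails for everyone outside an analyst role.
cur.execute("""
    CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING)
    RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val ELSE '*** masked ***' END
""")
cur.execute(
    "ALTER TABLE customers_native MODIFY COLUMN email SET MASKING POLICY email_mask"
)

# One query across a native table and an Iceberg table, both going through
# Snowflake's optimizer.
cur.execute("""
    SELECT customer_id, email, lifetime_value FROM customers_native
    UNION ALL
    SELECT customer_id, email, lifetime_value FROM customers_iceberg
""")
for row in cur.fetchmany(10):
    print(row)
conn.close()
```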
Monetization, we talked about that, industry clouds, financial services, healthcare, retail, and media, all front and center at the event. My understanding is that Frank Slootman was a major force behind this shift, this development and go to market focus on verticals. It's really an attempt, and he talked about this in his keynote, to align with the customer mission, ultimately align with their objectives, which not surprisingly are increasingly monetizing with data as a differentiating ingredient. We heard a ton about data mesh, there were numerous presentations about the topic. And I'll say this, if you map the seven pillars Snowflake talks about, Benoit Dageville talked about this in his keynote, but if you map those into Zhamak Dehghani's data mesh framework and the four principles, they align better than most of the data mesh washing that I've seen. The seven pillars, all data, all workloads, global architecture, self-managed, programmable, marketplace and governance. Those are the seven pillars that he talked about in his keynote. All data, well, maybe with hybrid tables that becomes more of a reality. Global architecture means the data is globally distributed. It's not necessarily physically in one place. Self-managed is key. Self-service infrastructure is one of Zhamak's four principles. And then inherent governance. Zhamak talks about computational, what I'll call automated, governance, built in. And with all the talk about monetization, that aligns with the second principle, which is data as product. So while it's not a pure hit, and to its credit, by the way, Snowflake doesn't use data mesh in its messaging anymore. But by the way, its customers do, several customers talked about it. Geico, JPMC, and a number of other customers and partners are using the term and using it pretty closely to the concepts put forth by Zhamak Dehghani. But back to the point, they essentially, Snowflake that is, is building a proprietary system that substantially addresses some, if not many, of the goals of data mesh. Okay, back to the list, supercloud, that's our term. We saw lots of examples of clouds on top of clouds that are architected to span multiple clouds, not just run on individual clouds as separate services. And this includes Snowflake's data cloud itself but a number of ecosystem partners that are headed in a very similar direction. Snowflake still talks about data sharing, but now it uses the term collaboration in its high level messaging, which is, I think, smart. Data sharing is kind of a geeky term. And also this is an attempt by Snowflake to differentiate from everyone else that's saying, hey, we do data sharing too. And finally Snowflake doesn't say data marketplace anymore. It's now marketplace, accounting for its application market. Okay, let's take a quick look at the competitive landscape via this ETR X-Y graph. The vertical axis measures net score or spending momentum and the x-axis is penetration, pervasiveness in the data center. That's what ETR calls overlap. Snowflake continues to lead on the vertical axis. They guided conservatively last quarter, remember, so even though that lofty height is well down from its earlier levels, I wouldn't be surprised if it ticks down again a bit in the July survey, which will be in the field shortly. Databricks is a key competitor, obviously with strong spending momentum, as you can see. We didn't draw it here but we usually draw that 40% line or red line at 40%, anything above that is considered elevated.
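For anyone newer to the ETR methodology, net score is directionally simple: the share of survey respondents spending more on a platform minus the share spending less. ETR's published definition is more precise about the buckets, so take the sketch below as a back-of-envelope approximation with made-up numbers, just to show why a reading above that 40% line counts as elevated.

```python
# Back-of-envelope net score: share of accounts adding or increasing spend
# minus share decreasing or replacing. Survey counts below are invented.
def net_score(adopting, increasing, flat, decreasing, replacing):
    total = adopting + increasing + flat + decreasing + replacing
    positive = (adopting + increasing) / total
    negative = (decreasing + replacing) / total
    return round(100 * (positive - negative), 1)

print(net_score(adopting=40, increasing=120, flat=60, decreasing=15, replacing=5))
# -> 58.3, comfortably above the 40% "red line" for elevated spending momentum
```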
So you can see Databricks is quite elevated. But it doesn't have the market presence of Snowflake. It didn't get to IPO during the bubble and it doesn't have nearly as deep and capable a go-to-market machinery. Now, they're getting better and they're getting some attention in the market, nonetheless. But as a private company, just naturally, more people are aware of Snowflake. Some analysts, Tony Baer in particular, believe Mongo and Snowflake are on a bit of a collision course long term. I actually can see his point. You know, I mean, they're both platforms, they're both about data. It's a long ways off, but you can see them sort of on a similar path. They talk about kind of similar aspirations and visions even though they're in quite different markets today, but they're definitely participating in a similar TAM. The cloud players are probably the biggest, or definitely the biggest, partners and probably the biggest competitors to Snowflake. And then there's always Oracle. Doesn't have the spending velocity of the others, but it's got strong market presence. It owns a cloud and it knows a thing about data and it definitely is a go-to market machine. Okay, we're going to end on some of the things that we heard in the ecosystem. 'Cause look, we've heard before how a particular technology, enterprise data warehouse, data hubs, MDM, data lakes, Hadoop, et cetera, was going to solve all of our data problems, and of course they didn't. And in fact, sometimes they create more problems that allow vendors to push more incremental technology to solve the problems that they created. Like tools and platforms to clean up the no-schema-on-write nature of data lakes or data swamps. But here are some of the things that I heard firsthand from some customers and partners. First thing is, they said to me that they're having a hard time keeping up sometimes with the pace of Snowflake. It reminds me of AWS in the 2014, 2015 timeframe. You remember that fire hose of announcements, which causes increased complexity for customers and partners. I talked to several customers that said, well, yeah, this is all well and good but I still need skilled people to understand all these tools that I'm integrating in the ecosystem, the catalogs, the machine learning, observability. A number of customers said, I just can't use one governance tool, I need multiple governance tools and a lot of other technologies as well, and they're concerned that that's going to drive up their cost and their complexity. I heard other concerns from the ecosystem that it used to be sort of clear as to where they could add value, you know, when Snowflake was just a better data warehouse. But to point number one, they're either concerned that they'll be left behind or they're concerned that they'll be subsumed. Look, I mean, just like we tell AWS customers and partners, you got to move fast, you got to keep innovating. If you don't, you're going to be left behind. If you're a customer, you're going to be left behind by your competitor, or if you're a partner, somebody else is going to get there, or AWS is going to solve the problem for you. Okay, and there were a number of skeptical practitioners, really thoughtful and experienced data pros, that suggested that they've seen this movie before. Hence the same wine, new bottle. Well, this time around I certainly hope not, given all the energy and investment that is going into this ecosystem. And the fact is Snowflake is unquestionably making it easier to put data to work.
They built on AWS so you didn't have to worry about provisioning compute and storage and networking and scaling. Snowflake is optimizing its platform to take advantage of things like Graviton so you don't have to, and they're doing some of their own optimization tools. The ecosystem is building optimization tools, so that's all good. And our firm belief is the less expensive it is, the more data will get brought into the data cloud. And they're building a data platform on which their ecosystem can build and run data applications, aka data products, without having to worry about all the hard work that needs to get done to make data discoverable, shareable, and governed. And unlike the last 10 years, you don't have to be a zookeeper and integrate all the animals in the Hadoop zoo. Okay, that's it for today, thanks for watching. Thanks to my colleague, Stephanie Chan, who helps research "Breaking Analysis" topics. Sometimes Alex Myerson is on production and manages the podcasts. Kristin Martin and Cheryl Knight help get the word out on social and in our newsletters, and Rob Hof is our editor in chief over at SiliconANGLE, and Hailey does some wonderful editing, thanks to all. Remember, all these episodes are available as podcasts wherever you listen. All you got to do is search Breaking Analysis Podcasts. I publish each week on wikibon.com and siliconangle.com and you can email me at David.Vellante@siliconangle.com or DM me @DVellante. If you got something interesting, I'll respond. If you don't, I'm sorry, I won't. Or comment on my LinkedIn post. Please check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time. (upbeat music)
Jon Loyens, data.world | Snowflake Summit 2022
>>Good morning, everyone. Welcome back to theCUBE's coverage of Snowflake Summit 22, live from Caesars Forum in Las Vegas. Lisa Martin, here with Dave Vellante. This is day three of our coverage. We've had an amazing, amazing time. Great conversations talking with Snowflake executives, partners, customers. We're gonna be digging into data mesh with data.world. Please welcome Jon Loyens, the chief product officer. Great to have you on the program, Jon. >>Thank you so much for, for having me here. I mean, the summit, like you said, has been incredible, so many great people, so such a good time, really, really nice to be back in person with folks. >>It is fabulous to be back in person. The fact that we're on day four for, for them, and the solution showcase is as packed as it is at 10, 11 in the morning, yeah, is saying something. >>Yeah. Usually... >>Chomping at the bit to hear what they're doing and innovating. >>Absolutely. Usually those last days of conferences, everybody starts getting a little tired, but we're not seeing that at all here, especially >>In Vegas. This is impressive. Talk to the audience a little bit about data.world, what you guys do, and talk about the Snowflake relationship. >>Absolutely. data.world is the only true cloud native enterprise data catalog. We've been an incredible Snowflake partner and Snowflake's been an incredible partner to us really since 2018, when we became the first data catalog in the Snowflake Partner Connect experience. You know, Snowflake and the data cloud make it all possible. And it's changed so much in terms of being able to, you know, very easily transition data into the cloud to break down those silos, and to have a platform that enables folks to be incredibly agile with data from an engineering and infrastructure standpoint, data.world is able to provide a layer of discovery and governance that matches that agility and the ability for a lot of different stakeholders to really participate in the process of data management and data governance. >>So data mesh, basically Zhamak Dehghani lays out, first of all, the, the fault domains of existing data and big data initiatives. And she boils it down to the fact that it's just this monolithic architecture with hyper specialized teams that you have to go through and it just slows everything down and it doesn't scale. They don't have domain context. So she came up with four principles, if I may. Domain ownership. So push it out to the businesses. They have the context, they should own the data. The second is data as product. We're certainly hearing a lot about that today, this week. The third is, so that all sounds good, push out the data, great, but it creates two problems, hence self-serve infrastructure. Okay. But her premise is infrastructure should be an operational detail. And then the fourth is computational governance. So you talked about the data catalog, where do you fit in those four principles? >>You know, honestly, we are able to help teams realize the data mesh architecture. And we know that data mesh is really, it's, it's both a process and a culture change, but then when you want to enact a process and a culture change like this, you also need to select the appropriate tools to match the culture that you're trying to build, the process and the architecture that you're trying to build. And the data.world data catalog can really help along all four of those axes. When you start thinking first about, let's say, like, let's take the first one, you know, data as a product, right?
We even, like, very meta of us for a metadata management platform at the end of the day, but very meta of us, when you talk about data as a product, we track adoption and usage of all your data assets within your organization and provide program teams and, you know, offices of the CDO with incredible evented analytics, very detailed, that gives them the right audit trail that enables them to direct very scarce data engineering, data architecture resources, to make sure that their data assets are getting adopted and used properly. On the, on the domain driven side, we are entirely knowledge graph and open standards based, enabling those domains. We have, you know, incredible joint Snowflake customers like Prologis. And we chatted a lot about this in our session here yesterday, where, because of our knowledge graph underpinnings, because of the flexibility of our metadata model, it enables those domains to actually model their assets uniquely from, from group to group, without having to, to relaunch or run different environments. Like, you can do that all within one data catalog platform without having to have separate environments for each of those domains. Federated governance, again, the amount of, like, data exhaust that we create really enables ambient governance and participatory governance as well. We call it agile data governance, really the adoption of agile and open principles applied to governance to make it more inclusive and transparent. And we provide that in a way that can federate across those domains and make it consistent. >>Okay. So you facilitate across that whole spectrum of, of principles. And so what, in the, in the early examples of data mesh that I've studied and actually collaborated with, like with JPMC, who I don't think is using your data catalog, but HelloFresh, who may or may not be, but I mean, there, there are a number of them and I wanna get to that. But what they've done is they've enabled the domains to spin up their own, whatever, data lakes, data warehouses, data hubs, at least in, in concept, most of 'em are data lakes on AWS, but still in concept, they wanna be inclusive and they've created a master data catalog. And then each domain has its sub catalog, which feeds into the master and that's how they get consistency and governance and everything else. Is, is that the right way to think about it? Or do you have a different spin on that? >>Yeah, I, I, you know, I have a slightly different spin on it. I think organizationally it's the right way to think about it. And in the absence of a catalog that can truly have multiple federated metadata models, multiple graphs in one platform, that is really kind of the, the, the only way to do it, right? With data.world, you don't have to do that. You can have one platform, one environment, one instance of data.world that spans all of your domains, enable them to operate independently and then federate across. >>So you just answered my question as to why I should use data.world versus Amazon Glue. >>Oh, absolutely. >>And that's a, that's awesome that you've done. Now, how have you done that? What, what's your secret >>Sauce? The, the secret sauce here is really, all credit to our CTO, one of my closest friends, who was a true student of knowledge graph practices and principles, and really felt that the right way to manage metadata and knowledge about the data analytics ecosystem that companies were building was through federated linked data, right?
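For anyone who hasn't worked with linked data, the open standards approach Jon keeps referring to looks roughly like the sketch below: a data asset described as a graph of triples using public vocabularies such as DCAT and Dublin Core, so any standards-aware catalog or tool can read it. The dataset URI and property values here are invented for illustration; this is not data.world's actual model.

```python
# Sketch of describing a data asset with open vocabularies (DCAT, Dublin Core)
# using rdflib. The dataset URI and values are made up.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

dataset = URIRef("https://example.org/catalog/datasets/daily-shipments")
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Daily shipments")))
g.add((dataset, DCTERMS.publisher, Literal("Logistics domain team")))
g.add((dataset, DCAT.keyword, Literal("supply chain")))

# Because it is just triples over shared vocabularies, another catalog, a
# search index, or a query engine can consume the same description.
print(g.serialize(format="turtle"))
```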
So we use standards and we've built a, a, an open and extensible metadata model that we call costs, that really takes the best parts of existing open standards in the semantics space, things like schema.org, DCAT, Dublin Core, brings them together and models out the most typical enterprise data assets, providing you with an ontology that's ready to go. But because of the graph nature of what we do, it's instantly accessible without having to rebuild environments, without having to do a lot of management against it. It's, it's really quite something. And it's something all of our customers are, are very impressed with and, and, you know, are getting a lot of leverage out of. >>And, and we have a lot of time today, so we're not gonna shortchange this topic. So one last question, then I'll shut up and let you jump in. This is an open standard. It's not open source. >>No, it's open, built on open standards, built on open standards. We also fundamentally believe in extensibility and openness. We do not want to, like, vertically lock you into our platform. So everything that we have is API driven, API available. Your metadata belongs to you. If you need to export your graph, you know, it's instantly available in open, machine readable formats. That's really, we come from the open data community. That was a lot of the founding of data.world. We, we worked a lot with the open data community and we, we fundamentally believe in that. And that's enabled a lot of our customers as well to truly take data.world and not have it be a data catalog application, but really an entire metadata management platform, and extend it even further into their enterprise to, to really catalog all of their assets, but also to build incredible integrations to things like corporate search. You know, having data assets show up in corporate wiki search, along with all the, the descriptive metadata that people need, has been incredibly powerful and an incredible extension of our platform that I'm so happy to see our customers using. >>So it's not exclusive to, to Snowflake. It's not exclusive to AWS. You can bring it anywhere. Azure, GCP? >>Anytime. Yeah. You know, we love Snowflake, look, we're at the Snowflake Summit. And we've always had a great relationship with Snowflake and really leaned in there, because we really believe Snowflake's principles, particularly around cloud and being cloud native and the operating advantages that it affords companies, are really aligned with what we do. And so Snowflake was really the first of the cloud data catalogs, or I should say the cloud data warehouses, that we integrated with, and to see them transition to really building out the data cloud has been awesome. >>Talk about how data.world and Snowflake enable companies like Prologis to be data companies. These days, every company has to be a data company, but they, they have to be able to do so quickly to be competitive and to, to really win. How do you help them, if we, like, up-level the conversation, to really impact the overall business? >>That's a great question, especially right now, as everybody knows. And Prologis is a great example. They're a logistics and supply chain company at the end of the day. And we know how important logistics and supply chain is nowadays, for them and for a lot of our customers.
I think one of the advantages of having a data catalog is the ability to build trust, transparency and inclusivity into their data analytics practice by adopting agile principles, by adopting a data mesh, you're able to extend your data analytics practice to a much broader set of stakeholders and to involve them in the process while the work is getting done. One of the greatest things about agile software development, when it became a thing in the early two thousands was how inclusive it was. And that inclusivity led to a much faster ROI on software projects. And we see the same thing happening in data analytics, people, you know, we have amazing data scientists and data analysts coming up with these insights that could be business changing that could make their company significantly more resilient, especially in the face of economic uncertainty. >>But if you have to sit there and argue with your business stakeholders about the validity of the data, about the, the techniques that were used to do the analysis, and it takes you three months to get people to trust what you've done, that opportunity's passed. So how do we shorten those cycles? How do we bring them closer? And that's, that's really a huge benefit that like Prologis has, has, has realized just tightening that cycle time, building trust, building inclusion, and making sure ultimately humans learn by doing, and if you can be inclusive, it, even, it even increases things like that. We all want to, to, to, to help cuz Lord knows the world needs it. Things like data literacy. Yeah. Right. >>So data.world can inform me as to where on the spectrum of data quality, my data set lives. So I can say, okay, this is usable, shareable, you know, exactly of gold standard versus fix this. Right. Okay. Yep. >>Yep. >>That's yeah. Okay. And you could do that with one data catalog, not a bunch of >>Yeah. And trust trust is really a multifaceted and multi multi-angle idea, right? It's not just necessarily data quality or data observability. And we have incredible partnerships in that space, like our partnership with, with Monte Carlo, where we can ingest all their like amazing observability information and display that in a really like a really consumable way in our data catalog. But it also includes things like the lineage who touch it, who is involved in the process of a, can I get a, a, a question answered quickly about this data? What's it been used for previously? And do I understand that it's so multifaceted that you have to be able to really model and present that in a way that's unique to any given organization, even unique within domains within a single organization. >>If you're not, that means to suggest you're a data quality. No, no supplier. Absolutely. But your partner with them and then that you become the, the master catalog. >>That's brilliant. I love it. Exactly. And you're >>You, you just raised your series C 15 million. >>We did. Yeah. So, you know, really lucky to have incredible investors like Goldman Sachs, who, who led our series C it really, I think, communicates the trust that they have in our vision and what we're doing and the impact that we can have on organization's ability to be agile and resilient around data analytics, >>Enabling customers to have that single source of truth is so critical. You talked about trust. That is absolutely. It's no joke. >>Absolutely. >>That is critical. And there's a tremendous amount of business impact, positive business impact that can come from that. 
What are some of the things that are next for data.world that we're gonna see? >>Oh, you know, I love this. We have such an incredibly innovative team that's so dedicated to this space and the mission of what we're doing. We're out there trying to fundamentally change how people get data analytics work done together. One of the big reasons I founded the company is I, I really truly believe that data analytics needs to be a team sport. It needs to go from, you know, single player mode to team mode, and everything that we've worked on in the last six years has leaned into that. Our architecture being cloud native, we do, we've done over a thousand releases a year that nobody has to manage. You don't have to worry about upgrading your environment. It's a lot of the same story that's made Snowflake so great. We are really excited to have announced in March at our own summit, and we're rolling this suite of features out over the course of the year, a new package of features that we call data.world Eureka, which is a suite of automations and, you know, knowledge driven functionality that really helps you leverage a knowledge graph to make decisions faster and to operationalize your data in, in a DataOps way with significantly less effort. >>Big, big impact there. Jon, thank you so much for joining Dave and me, unpacking what data.world is doing, the data mesh, the opportunities that you're giving to customers in every industry. We appreciate your time and congratulations on the news and the funding. >>Ah, thank you. It's been a, a true pleasure. Thank you for having me on and, and I hope, I hope you guys enjoy the rest of, of the day and, and your other guests that you have. Thank you. >>We will. All right. For our guest and Dave Vellante, I'm Lisa Martin. You're watching theCUBE's third day of coverage of Snowflake Summit 22, live from Vegas. Dave and I will be right back with our next guest. So stick around.
Breaking Analysis: Technology & Architectural Considerations for Data Mesh
>> From theCUBE Studios in Palo Alto and Boston, bringing you data driven insights from theCUBE in ETR, this is Breaking Analysis with Dave Vellante. >> The introduction in socialization of data mesh has caused practitioners, business technology executives, and technologists to pause, and ask some probing questions about the organization of their data teams, their data strategies, future investments, and their current architectural approaches. Some in the technology community have embraced the concept, others have twisted the definition, while still others remain oblivious to the momentum building around data mesh. Here we are in the early days of data mesh adoption. Organizations that have taken the plunge will tell you that aligning stakeholders is a non-trivial effort, but necessary to break through the limitations that monolithic data architectures and highly specialized teams have imposed over frustrated business and domain leaders. However, practical data mesh examples often lie in the eyes of the implementer, and may not strictly adhere to the principles of data mesh. Now, part of the problem is lack of open technologies and standards that can accelerate adoption and reduce friction, and that's what we're going to talk about today. Some of the key technology and architecture questions around data mesh. Hello, and welcome to this week's Wikibon CUBE Insights powered by ETR, and in this Breaking Analysis, we welcome back the founder of data mesh and director of Emerging Technologies at Thoughtworks, Zhamak Dehghani. Hello, Zhamak. Thanks for being here today. >> Hi Dave, thank you for having me back. It's always a delight to connect and have a conversation. Thank you. >> Great, looking forward to it. Okay, so before we get into it in the technology details, I just want to quickly share some data from our friends at ETR. You know, despite the importance of data initiative since the pandemic, CIOs and IT organizations have had to juggle of course, a few other priorities, this is why in the survey data, cyber and cloud computing are rated as two most important priorities. Analytics and machine learning, and AI, which are kind of data topics, still make the top of the list, well ahead of many other categories. And look, a sound data architecture and strategy is fundamental to digital transformations, and much of the past two years, as we've often said, has been like a forced march into digital. So while organizations are moving forward, they really have to think hard about the data architecture decisions that they make, because it's going to impact them, Zhamak, for years to come, isn't it? >> Yes, absolutely. I mean, we are moving really from, slowly moving from reason based logical algorithmic to model based computation and decision making, where we exploit the patterns and signals within the data. So data becomes a very important ingredient, of not only decision making, and analytics and discovering trends, but also the features and applications that we build for the future. So we can't really ignore it, and as we see, some of the existing challenges around getting value from data is not necessarily that no longer is access to computation, is actually access to trustworthy, reliable data at scale. >> Yeah, and you see these domains coming together with the cloud and obviously it has to be secure and trusted, and that's why we're here today talking about data mesh. So let's get into it. 
Zhamak, first, your new book is out, 'Data Mesh: Delivering Data-Driven Value at Scale' just recently published, so congratulations on getting that done, awesome. Now in a recent presentation, you pulled excerpts from the book and we're going to talk through some of the technology and architectural considerations. Just quickly for the audience, four principles of data mesh. Domain driven ownership, data as product, self-served data platform and federated computational governance. So I want to start with self-serve platform and some of the data that you shared recently. You say that, "Data mesh serves autonomous domain oriented teams versus existing platforms, which serve a centralized team." Can you elaborate? >> Sure. I mean the role of the platform is to lower the cognitive load for domain teams, for people who are focusing on the business outcomes, the technologists that are building the applications, to really lower the cognitive load for them, to be able to work with data. Whether they are building analytics, automated decision making, intelligent modeling. They need to be able to get access to data and use it. So the role of the platform, I guess, just stepping back for a moment, is to empower and enable these teams. Data mesh by definition is a scale out model. It's a decentralized model that wants to give autonomy to cross-functional teams. So at its core, it requires a set of tools that work really well in that decentralized model. When we look at the existing platforms, they try to achieve this similar outcome, right? Lower the cognitive load, give the tools to data practitioners to manage data at scale, because today the centralized data teams, really their job isn't directly aligned with any one or two or different, you know, business units and business outcomes in terms of getting value from data. Their job is to manage the data and make the data available for those cross-functional teams or business units to use the data. So the platforms they've been given are really centralized around, or tuned to work with, this team structure, the structure of a centralized team. Although on the surface, it seems, why not? Why can't I use my, you know, cloud storage or computation or data warehouse in a decentralized way? You should be able to, but some changes need to happen to those underlying platforms. As an example, some cloud providers simply have hard limits on the number of, like, storage accounts that you can have, because they never envisaged you would have hundreds of lakes. They envisaged one or two, maybe 10 lakes, right? They envisaged really centralizing data, not decentralizing data. So I think we see a shift in thinking about enabling autonomous independent teams versus a centralized team. >> So just a follow up if I may, we could be here for a while. But so this assumes that you've sorted out the organizational considerations? That you've defined, you know, what a data product is and a sub product. And people will say, of course we use the term monolithic as a pejorative, let's face it. But the data warehouse crowd will say, "Well, that's what data marts did. So we got that covered." But your... The premise of data mesh, if I understand it, is whether it's a data mart or a data warehouse, or a data lake or whatever, a Snowflake warehouse, it's a node on the mesh. Okay. So don't build your organization around the technology, let the technology serve the organization, is that-- >> That's a perfect way of putting it, exactly.
I mean, for a very long time, when we look at decomposition of complexity, we've looked at decomposition of complexity around technology, right? So we have technology and that's maybe a good segue to actually the next item on that list that we looked at. Oh, I need to decompose based on whether I want to have access to raw data and put it on the lake. Whether I want to have access to model data and put it on the warehouse. You know I need to have a team in the middle to move the data around. And then try to figure organization into that model. So data mesh really inverses that, and as you said, is look at the organizational structure first. Then scale boundaries around which your organization and operation can scale. And then the second layer look at the technology and how you decompose it. >> Okay. So let's go to that next point and talk about how you serve and manage autonomous interoperable data products. Where code, data policy you say is treated as one unit. Whereas your contention is existing platforms of course have independent management and dashboards for catalogs or storage, et cetera. Maybe we double click on that a bit. >> Yeah. So if you think about that functional, or technical decomposition, right? Of concerns, that's one way, that's a very valid way of decomposing, complexity and concerns. And then build solutions, independent solutions to address them. That's what we see in the technology landscape today. We will see technologies that are taking care of your management of data, bring your data under some sort of a control and modeling. You'll see technology that moves that data around, will perform various transformations and computations on it. And then you see technology that tries to overlay some level of meaning. Metadata, understandability, discovery was the end policy, right? So that's where your data processing kind of pipeline technologies versus data warehouse, storage, lake technologies, and then the governance come to play. And over time, we decomposed and we compose, right? Deconstruct and reconstruct back this together. But, right now that's where we stand. I think for data mesh really to become a reality, as in independent sources of data and teams can responsibly share data in a way that can be understood right then and there can impose policies, right then when the data gets accessed in that source and in a resilient manner, like in a way that data changes structure of the data or changes to the scheme of the data, doesn't have those downstream down times. We've got to think about this new nucleus or new units of data sharing. And we need to really bring back transformation and governing data and the data itself together around these decentralized nodes on the mesh. So that's another, I guess, deconstruction and reconstruction that needs to happen around the technology to formulate ourselves around the domains. And again the data and the logic of the data itself, the meaning of the data itself. >> Great. Got it. And we're going to talk more about the importance of data sharing and the implications. But the third point deals with how operational, analytical technologies are constructed. You've got an app DevStack, you've got a data stack. You've made the point many times actually that we've contextualized our operational systems, but not our data systems, they remain separate. Maybe you could elaborate on this point. >> Yes. I think this is, again, has a historical background and beginning. 
For a really long time, applications have dealt with features and the logic of running the business and encapsulating the data and the state that they need to run that feature or run that business function. And then, for anything analytically driven, which required access to data across these applications and across the longer dimension of time around different subjects within the organization, this analytical data, we had made a decision that, "Okay, let's leave those applications aside. Let's leave those databases aside. We'll extract the data out and we'll load it, or we'll transform it, and put it under the analytical kind of a data stack, and then downstream from it, we will have analytical data users, the data analysts, the data scientists and the, you know, the portfolio of users that are growing, use that data stack." And that led to this real separation of dual stacks with point to point integration. So applications went down the path of transactional databases or, you know, document stores, but using APIs for communicating, and then we've gone to, you know, lake storage or data warehouse on the other side. And that, again, enforces the silo of data versus app, right? So if we are moving to the world that our ambitions are around, making applications more intelligent, making them data driven, these two worlds need to come closer. As in, ML analytics gets embedded into those applications themselves, and the data sharing, as a very essential ingredient of that, gets embedded and gets closer, becomes closer to those applications. So, if you are looking at this now cross-functional, app and data based team, right? Business team, then the technology stacks can't be so segregated, right? There has to be a continuum of experience from app delivery, to sharing of the data, to using that data, to embedding models back into those applications. And that continuum of experience requires well integrated technologies. I'll give you an example, which actually in some sense, we are somewhat moving to that direction. But if we are talking about data sharing or data modeling, applications use one set of APIs, you know, HTTP compliant, GraphQL or REST APIs. And on the other hand, you have proprietary SQL, like, connect to my database and run SQL. Like, those are two very different models of representing and accessing data. So we kind of have to harmonize or integrate those two worlds a bit more closely to achieve those domain oriented cross-functional teams. >> Yeah. We are going to talk about some of the gaps later, and actually you look at them as opportunities, more than barriers. But they are barriers, but they're opportunities for more innovation. Let's go on to the fourth one. The next point, it deals with the roles that the platform serves. Data mesh proposes that domain experts own the data and take responsibility for it end to end and are served by the technology. Kind of, we referenced that before. Whereas your contention is that today, data systems are really designed for specialists. I think you use the term hyper specialists a lot. I love that term. And the generalists are kind of passive bystanders waiting in line for the technical teams to serve them. >> Yes. I mean, if you think about the, again, the intention behind data mesh was creating a responsible data sharing model that scales out. And I challenge any organization that has scaled ambitions around data or usage of data that relies on small pockets of very expensive specialist resources, right?
So we have no choice but upskilling, cross-skilling the majority population of our technologists, we often call them generalists, right? That's a shorthand for people that can really move from one technology to another technology. Sometimes we call them paint drip people, sometimes we call them T-shaped people. But regardless, like, we need to have the ability to really mobilize our generalists. And we had to do that at Thoughtworks. We serve a lot of our clients and, like many other organizations, we are also challenged with hiring specialists. So we have tested the model of having a few specialists really conveying and translating the knowledge to generalists and bringing them forward. And of course, platform is a big enabler of that. Like, what is the language of using the technology? What are the APIs that delight that generalist experience? This doesn't mean no code, low code, that we have to throw away good engineering practices. I think good software engineering practices remain to exist. Of course, they get adapted to the world of data to build resilient, you know, sustainable solutions, but specialty, especially around kind of proprietary technology, is going to be a hard one to scale. >> Okay. I'm definitely going to come back and pick your brain on that one. And, you know, your point about scale out, in the examples, the practical examples of companies that have implemented data mesh that I've talked to, I think in all cases, you know, there's only a handful that I've really gone deep with, but it was their Hadoop instances, their clusters wouldn't scale, they couldn't scale the business around it. So that's really a key point of a common pattern that we've seen now. I think in all cases, you know, they went to, like, the data lake model on AWS. And so that maybe has some violation of the principles, but we'll come back to that. But so let me go on to the next one. Of course, data mesh leans heavily toward this concept of decentralization, to support domain ownership over the centralized approaches. And we certainly see this, the public cloud players, database companies as key actors here with very large install bases, pushing a centralized approach. So I guess my question is, how realistic is this next point where you have decentralized technologies ruling the roost? >> I think if you look at the history of places in our industry where decentralization has succeeded, they heavily relied on standardization of connectivity, you know, across different components of technology. And I think right now you are right. The way we get value from data relies on collection. At the end of the day, collection of data. Whether you have a deep learning, machine learning model that you're training, or you have, you know, reports to generate, regardless, the model is, bring your data to a place that you can collect it, so that we can use it. And that naturally leads to a set of technologies that try to operate as a full stack, integrated, proprietary, with no intention of, you know, opening data for sharing. Now, conversely, if you think about internet itself, web itself, microservices, even at the enterprise level, not at the planetary level, they succeeded as decentralized technologies to a large degree because of their emphasis on openness and sharing, right? API sharing. We don't talk about, in the API world, like, we don't say, you know, "I will build a platform to manage your logical applications." Maybe to a degree, but we actually moved away from that.
We say, "I'll build a platform that opens around applications to manage your APIs, manage your interfaces." Right? Give you access to API. So I think the shift needs to... That definition of decentralized there means really composable, open pieces of the technology that can play nicely with each other, rather than a full stack, all have control of your data yet being somewhat decentralized within the boundary of my platform. That's just simply not going to scale if data needs to come from different platforms, different locations, different geographical locations, it needs to rethink. >> Okay, thank you. And then the final point is, is data mesh favors technologies that are domain agnostic versus those that are domain aware. And I wonder if you could help me square the circle cause it's nuanced and I'm kind of a 100 level student of your work. But you have said for example, that the data teams lack context of the domain and so help us understand what you mean here in this case. >> Sure. Absolutely. So as you said, we want to take... Data mesh tries to give autonomy and decision making power and responsibility to people that have the context of those domains, right? The people that are really familiar with different business domains and naturally the data that that domain needs, or that naturally the data that domains shares. So if the intention of the platform is really to give the power to people with most relevant and timely context, the platform itself naturally becomes as a shared component, becomes domain agnostic to a large degree. Of course those domains can still... The platform is a (chuckles) fairly overloaded world. As in, if you think about it as a set of technology that abstracts complexity and allows building the next level solutions on top, those domains may have their own set of platforms that are very much doing agnostic. But as a generalized shareable set of technologies or tools that allows us share data. So that piece of technology needs to relinquish the knowledge of the context to the domain teams and actually becomes domain agnostic. >> Got it. Okay. Makes sense. All right. Let's shift gears here. Talk about some of the gaps and some of the standards that are needed. You and I have talked about this a little bit before, but this digs deeper. What types of standards are needed? Maybe you could walk us through this graphic, please. >> Sure. So what I'm trying to depict here is that if we imagine a world that data can be shared from many different locations, for a variety of analytical use cases, naturally the boundary of what we call a node on the mesh will encapsulates internally a fair few pieces. It's not just the boundary of that, not on the mesh, is the data itself that it's controlling and updating and maintaining. It's of course a computation and the code that's responsible for that data. And then the policies that continue to govern that data as long as that data exists. So if that's the boundary, then if we shift that focus from implementation details, that we can leave that for later, what becomes really important is the scene or the APIs and interfaces that this node exposes. And I think that's where the work that needs to be done and the standards that are missing. And we want the scene and those interfaces be open because that allows, you know, different organizations with different boundaries of trust to share data. 
Not only to share data by kind of moving that data to yet another location, but to share the data in a way that distributed workloads, distributed analytics, and distributed machine learning models can happen on the data where it is. So if you follow that line of thinking around the decentralization and connection of data, versus collection of data, I think the very, very important piece of it that needs really deep thinking, and I don't claim that I have done that, is how do we share data responsibly and sustainably, right? In a way that is not brittle. If you think about the ways we share data today, one of the very common ways is: I'll give you a JDBC endpoint, or I'll give you an endpoint to your, you know, database of choice. And now, as a user, you have access to the schema of the underlying data and can run various queries, SQL queries, on it. That's very simple and easy to get started with. That's why SQL is an evergreen, you know, standard or semi-standard, pseudo-standard that we all use. But it's also very brittle, because we are dependent on an underlying schema and formatting of the data that's been designed to tell the computer how to store and manage the data. So I think the data sharing APIs of the future really need to think about removing these brittle dependencies, and think about sharing not only the data, but what we call metadata, I suppose. An additional set of characteristics that is always shared along with the data to make the data usage, I suppose, ethical and also friendly for the users. And the other element of that data sharing API is to allow computation to run where the data exists. So if you think about SQL again, as a simple, primitive example of computation, when we select and when we filter and when we join, the computation is happening on that data. So maybe there is a next level of articulating distributed computation on the data that simply trains models, right? Your language primitives change in a way that allows sophisticated analytical workloads to run on the data more responsibly, with policies and access control enforced. So I think that output port that I mentioned is simply about next-generation data sharing, responsible data sharing APIs, suitable for decentralized analytical workloads. >> So I'm not trying to bait you here, but I have a follow-up as well. So schema, for all its good, creates constraints. And no schema on write, that didn't work either, because it was just a free-for-all and it created the data swamps. But now you have technology companies trying to solve that problem. Take Snowflake, for example, you know, enabling data sharing, but it is within its proprietary environment. Certainly Databricks is doing something, trying to come at it from its angle, bringing some of the best of the data warehouse together with the data science. Is your contention that those remain sort of proprietary and de facto standards, and that what we need is more open standards? Maybe you could comment. >> Sure. I think there are two points. One is, as you mentioned, open standards that allow... that actually make the underlying platform invisible. I mean, my litmus test for a technology provider to say, "I'm a data mesh" (laughs) kind of compliant is, "Is your platform invisible?" As in, can I replace it with another and yet get the similar data sharing experience that I need? So part of it is that. Part of it is open standards that are not really proprietary.
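To make that idea of a storage-agnostic data sharing interface a bit more concrete, here is a minimal, hypothetical sketch (an editorial illustration, not from the interview and not any vendor's actual API): a data product exposes an output port whose contract stays the same whether the data sits in a warehouse table or in files, so the underlying platform can be swapped without breaking consumers, which is the "is your platform invisible?" litmus test.

```python
# Hypothetical sketch of a data product "output port": consumers code against
# this contract, never against the underlying table layout or file format.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator


@dataclass
class Record:
    """A single domain record plus the metadata shared along with it."""
    payload: dict          # the data itself
    produced_at: datetime  # freshness metadata
    schema_version: str    # version metadata so consumers can evolve safely


class OutputPort(ABC):
    """Storage-agnostic contract a data product exposes to the mesh."""

    @abstractmethod
    def describe(self) -> dict:
        """Semantic metadata (owner, SLA, schema version), not physical layout."""

    @abstractmethod
    def read(self, since: datetime) -> Iterator[Record]:
        """Stream records newer than `since`, however they are stored internally."""


class WarehouseBackedPort(OutputPort):
    """One possible implementation; a file- or lake-backed port looks identical to consumers."""

    def describe(self) -> dict:
        return {"owner": "orders-domain", "schema_version": "2"}

    def read(self, since: datetime) -> Iterator[Record]:
        # A real implementation would query the warehouse; this fakes one row.
        yield Record({"order_id": 1, "total": 42.0}, datetime.utcnow(), "2")


def consume(port: OutputPort, since: datetime) -> list[dict]:
    """Consumer logic depends only on the port contract, so the platform stays 'invisible'."""
    return [r.payload for r in port.read(since)]
```

The point of the sketch is only the shape of the contract: if the same `read` and `describe` calls work against a different backend, the platform has become an implementation detail.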
The other angle for sharing data across different platforms, so that, you know, we don't get stuck with one technology or another, is around APIs. It is around code that is protecting that internal schema. Where we are on the curve of evolution of technology right now, we are exposing the internal structure of the data, which is designed to optimize certain modes of access, to the end client and application APIs, right? So the APIs that use the data today are very much aware that this database was optimized for machine learning workloads, hence you will deal with a columnar storage format, versus this other API is optimized for a very different, report-type, relational access and is optimized around rows. I think that should become irrelevant in the API sharing of the future. Because as a user, I shouldn't care how this data is internally optimized, right? The language primitive that I'm using should be really agnostic to the machine optimization underneath it. And if we did that, perhaps this war between warehouse or lake or the other will actually become irrelevant. So we're optimizing for the best human experience, as opposed to the best machine experience. We still have to do the machine optimization, but we have to make it invisible, make it an implementation concern. So that's another angle of what should... if we daydream together, the best and most resilient experience in terms of data usage comes with these APIs being agnostic to the internal storage structure. >> Great, thank you for that. We've gotten our ankles wet now on the controversy, so we might as well wade all the way in. I can't let you go without addressing some of this, which you've catalyzed, which, by the way, I see as a sign of progress. So this gentleman, Paul Andrew, is an architect and he gave a presentation, I think last night. And he teased it as, quote, "The theory from Zhamak Dehghani versus the practical experience of a technical architect, AKA me," meaning him. And Zhamak, you were quick to shoot back that data mesh is not theory, it's based on practice, and some practices are experimental, some are more baked, and data mesh really avoids, by design, the specificity of vendor or technology. Perhaps you intended to frame your post as a technology- or vendor-specific implementation. So touche, that was excellent. (Zhamak laughs) Now you don't need me to defend you, but I will anyway. You spent 14-plus years as a software engineer and the better part of a decade consulting with some of the most technically advanced companies in the world. But I'm going to push you a little bit here and say, some of this tension is of your own making, because you purposefully don't talk about technologies and vendors. Sometimes doing so is instructive for us neophytes. So, why don't you ever use specific examples of technology for frames of reference? >> Yes. My role is to push us to the next level. So, you know, everybody picks their fights, picks their battles. My role in this battle is to push us to think beyond what's available today. Of course, that's my public persona. On a day-to-day basis I actually work with clients and existing technology, and at Thoughtworks we gave a case study talk with a colleague of mine, and I intentionally got him to talk about (indistinct), because I don't want to be the one talking about the technology that we use to implement data mesh. And the reason I haven't really embraced, in my conversations, the specific technology:
One is, I feel the technology solutions we're using today are still not ready for the vision. I mean, we have to be in this transitional step. No matter what, we have to be pragmatic, of course, and practical, I suppose, and use the existing vendors that exist, and I wholeheartedly embrace that, but that's just not my role, to show that. I've gone through this transformation once before in my life. When microservices happened, we were building microservices-like architectures with technology that wasn't ready for it. Big web application servers that were designed to run these giant monolithic applications, and now we're trying to run little microservices on them. And the tail was wagging the dog; the environmental complexity of running these services was consuming so much of our effort that we couldn't really pay attention to the business logic, the business value. And that's where we are today. The complexity of integrating existing technologies is really overwhelming, capturing a lot of our attention and cost, money and effort, as opposed to really focusing on the data products themselves. So that's just the role I have, but it doesn't mean that, you know, we have to rebuild the world. We've got to do what we can with what we have in this transitional phase, until the new generation of technologies, I guess, comes around and reshapes our landscape of tools. >> Well, impressive public discipline. Your point about microservices is interesting, because a lot of those early microservices weren't so micro, and for the naysayers, look, past is not prologue. Thoughtworks was really early on in the whole concept of microservices, so I'll be very excited to see how this plays out. But now, there were some other good comments. There was one from a gentleman who said the most interesting aspects of data mesh are organizational. And that's how my colleague Sanjeev Mohan frames data mesh versus data fabric. You know, I'm not sure; I think, as we've sort of scratched the surface today, data mesh is more than that. And I still think data fabric is what NetApp defined as software-defined storage infrastructure that can serve on-prem and public cloud workloads, back in whatever, 2016. But the point you make in the thread that we're showing here is a warning, and you referenced this earlier, that segregating different modes of access will lead to fragmentation, and we don't want to repeat the mistakes of the past. >> Yes, the comments around that... again, going back to that original conversation, we've got this tendency at a macro level to decompose complexity based on technical solutions. And, you know, the conversation could be, "Oh, I do batch and you do stream, and we are different." We create these bifurcations in our decisions based on the technology, where I do events and you do tables, right? So that sort of segregation of modes of access causes accidental complexity that we keep dealing with. Because every time, in this tree, you create a new branch, you create a new set of tools that then somehow need to be point-to-point integrated, and you create new specialization around that. So the fewer branches we have, the better; think really about the continuum of experiences that we need to create and the technologies that simplify that continuum of experience. So one of the things, for example, to give you a past experience:
I was really excited about the papers and the work that came out around Apache Beam, and generally flow-based programming and stream processing, because basically they were saying whether you are doing batch or whether you're doing streaming, it's all one stream. Sometimes the window of time over which you're computing narrows, and sometimes it widens, but at the end of the day you are just doing stream processing. So it is those sorts of notions that simplify and create a continuum of experience that resonate with me personally, more than creating these tribal fights of this type versus that mode of access. So that's why data mesh naturally selects kind of this multimodal access to support end users, right? The persona of end users. >> Okay. So the last topic I want to hit: this whole discussion, the topic of data mesh, it's highly nuanced, it's new, and people are going to shoehorn data mesh into their respective views of the world. And we talked about lakehouses, and there's three buckets. And of course, the gentleman from LinkedIn, with Azure, Microsoft has a data mesh community. So you're going to have to enlist a serious army of enforcers to adjudicate. And I wrote some of this stuff down. I mean, it's interesting. Monte Carlo has a data mesh calculator. Starburst is leaning in. ChaosSearch sees themselves as an enabler. Oracle and Snowflake both use the term data mesh. And then of course you've got big practitioners: JPMC, we've talked to Intuit, Orlando, HelloFresh has been on, Netflix has this event-based sort of streaming implementation. So my question is, how realistic is it that the clarity of your vision can be implemented and not polluted by really rich technology companies and others? (Zhamak laughs) >> Is it even possible, right? Is it even possible? That's why I practice, then. This is why I should practice these things, because I think it's going to be hard. What I'm hopeful about is the socio-technical level; as I mentioned, data mesh is a socio-technical concern, or solution, not just a technology solution. Hopefully that always brings us back to, you know, the reality that vendors try to sell you snake oil that solves all of your problems. (chuckles) All of your data mesh problems. It's just going to cause more problems down the track. So we'll see, time will tell, Dave, and I count on you as one of those (laughs) folks that will continue to share their platform. To go back to the roots: why are we doing this in the first place? I mean, I dedicated a whole part of the book to 'Why?', because, as you said, we get carried away with vendors and technology solutions trying to ride a wave, and in that story we forget the reason for which we're even making this change and spending all of these resources. So hopefully we can always come back to that. >> Yeah. And I think we can. I think you have really given this some deep thought, and as we pointed out, this was based on practical knowledge and experience. And look, we've been trying to solve this data problem for a long, long time. You've not only articulated it well, but you've come up with solutions. So Zhamak, thank you so much. We're going to leave it there, and I'd love to have you back. >> Thank you for the conversation. I really enjoyed it. And thank you for sharing your platform to talk about data mesh. >> Yeah, you bet. All right. And I want to thank my colleague, Stephanie Chan, who helps research topics for us.
Alex Myerson is on production, and Kristen Martin, Cheryl Knight and Rob Hof are on editorial. Remember, all these episodes are available as podcasts wherever you listen; all you've got to do is search "Breaking Analysis Podcast." Check out ETR's website at etr.ai for all the data. And we publish a full report every week on wikibon.com and siliconangle.com. You can reach me by email at david.vellante@siliconangle.com or DM me @dvellante, and hit us up on our LinkedIn post. This is Dave Vellante for theCUBE Insights powered by ETR. Have a great week, stay safe, be well. And we'll see you next time. (bright music)
PUBLIC SECTOR Speed to Insight
>>Hi, this is Cindy Mikey, vice president of industry solutions at Cloudera. Joining me today is Shev, our solution engineer for the public sector. Today we're going to talk about speed to insight: why to use machine learning in the public sector, specifically around fraud, waste and abuse. So, the topics for today: we'll discuss machine learning and why the public sector uses it to target fraud, waste and abuse; the challenges; how we enhance your data and analytical approaches; the data landscape and analytical methods; and then Shev will go over a reference architecture and a case study. By definition, per the Government Accountability Office, fraud is an attempt to obtain something of value through unwelcome misrepresentation, waste is about squandering money or resources, and abuse is about behaving improperly or unreasonably to obtain something of value for your personal benefit. As we look at fraud across all industries, it's a top-of-mind area within the public sector. >>The types of fraud that we see are specifically around cyber crime, accounting fraud, whether from an individual perspective or within organizations, financial statement fraud, and also bribery and corruption. As we look at fraud, it really hits us from all angles, whether it be from external perpetrators or internal perpetrators, and specifically, per the research by PwC, we also see over half of fraud coming through some form of internal or external perpetrators. Again, key topics. Looking at a recent report by the Association of Certified Fraud Examiners, within the public sector, for the U.S. government in 2017, roughly $148 billion was identified as attributable to fraud, waste and abuse. Of that, about $57 billion was focused on reported monetary losses, and another $91 billion on areas where the opportunity or the monetary basis had not yet been measured. >>As we break those areas down, we look at several different topics from an improper payment perspective: within the health system, over $65 billion; within social services, over $51 billion; to procurement fraud; to fraud, waste and abuse that's happening in the grants and loan processes; to payroll fraud; and then other aspects. Again, quite a few different topical areas. So as we look at those broad-stroke areas, where do we see additional focus? What are the actual use cases that agencies are using? What is the data landscape, and what data and analytical methods can we use to actually help curtail and prevent some of the fraud, waste and abuse? As we look at some of the analytical processes and use cases in the public sector, whether it's the taxation areas, social services, public safety, or additional agency methods, we're going to focus specifically on some of the use cases around fraud within the tax area. >>We'll briefly look at some aspects of unemployment insurance fraud and benefit fraud, as well as payment integrity. So fraud has its underpinnings in quite a few different government agencies, with different analytical methods and usage of different data.
So I think one of the key elements is, you know, you can look at your data landscape and the specific data sources that you need, but it's really about bringing together different data sources across a different variety and a different velocity. Data has different dimensions. We'll look at structured types of data, semi-structured data, and behavioral data. And when we look at predictive models, we're typically looking at historical information, but if we're actually trying to prevent fraud before it happens, or while a case may be in flight, which is specifically a use case that Shev is going to talk about later, it's: how do I look at more real-time, streaming information? >>How do I take advantage of data, whether it be financial transactions, asset verification, tax records, or corporate filings? And we can also look at more advanced data sources, where we're looking at investigation-type information. So we're maybe going out and looking at deep learning models around semi-structured or behavioral, unstructured data, whether it be camera analysis and so forth. So quite a variety of data, and the breadth and the opportunity really come about when you can integrate and look at data across all of those data sources; in essence, looking at a more extensive data landscape. So specifically, I want to focus on some of the methods, some of the data sources, and some of the analytical techniques that we're seeing used in government agencies, as well as opportunities to look at new methods. >>So, from an audit planning perspective, or looking at the likelihood of non-compliance, specifically we'll see data sources where we're maybe looking at a constituent's profile; we might actually be investigating the forms that they provided. We might be comparing that data, or leveraging internal data sources, possibly looking at net worth, comparing it against other financial data, and also comparing across other constituent groups. Some of the techniques that we use are basic natural language processing; maybe we're going to do some text mining. We might be doing some probabilistic modeling, where we're actually looking at information within the agency and comparing it against, possibly, tax forms. A lot of times that information has historically been processed on a batch basis, both structured and semi-structured information. And typically the data volumes can be low, but we're also seeing those data volumes increase exponentially, based upon the types of events that we're dealing with and the number of transactions. >>So getting the throughput matters, and Shev is going to specifically talk about that in a moment. The other aspect, as we look at other areas of opportunity, is building on this: how do I actually do compliance? How do I actually conduct audits, or look at potential fraud, and also look at areas of under-reported tax information?
So there you might be pulling in some other types of data sources, whether it's property records, data that's being supplied by the actual constituents or by vendors, social media information, geographical information, or even photos. Techniques that we're seeing used include sentiment analysis and link analysis: how do we actually blend those data sources together with natural language processing? But I think what's important here is also the method and the data velocity, whether it be batch or near real time, again looking at all types of data, whether it's structured, semi-structured or unstructured. And the key and the value behind this is: how do we actually increase the potential revenue, or capture the under-reported revenue? >>How do we actually stop fraudulent payments before they occur? Also, how do we increase the level of compliance, and improve the potential for prosecution of fraud cases? Additionally, other areas of opportunity could be economic planning. How do we perform link analysis? How do we bring in more of those things that we saw in the data landscape: customer or constituent interaction, social media, potentially police records, property records, other tax department database information? And then also comparing one individual to other individuals, looking at people like a specific constituent: are there areas where we're seeing other aspects of fraud potentially occurring? And as we move forward, some of the more advanced techniques that we're seeing around deep learning are computer vision, leveraging geospatial information, social network and entity analysis, and also agent-based modeling techniques, where we're looking at simulation and Monte Carlo-type techniques that we typically see in the financial services industry, and actually applying them to fraud, waste and abuse within the public sector. >>And again, that really lends itself to new opportunities. And on that, I'm going to turn it over to Shev to talk about the reference architecture for these use cases. >>Thanks, Cindy. So I'm going to walk you through an example reference architecture for fraud detection using Cloudera's underlying technology. And before I get into the technical details, I want to talk about how this would be implemented at a much higher level. So with fraud detection, what we're trying to do is identify anomalies, or novel behavior, within our data sets. Now, in order to understand what aspects of our incoming data represent anomalous behavior, we first need to understand what normal behavior is. In essence, once we understand normal behavior, anything that deviates from it can be thought of as an anomaly, right? And in order to understand what normal behavior is, we're going to need to be able to collect, store, and process a very large amount of historical data. And so in comes Cloudera's platform and the reference architecture you see before you. So let's start on the left-hand side of this reference architecture, with the collect phase. >>Fraud detection will always begin with data collection.
We need to collect large amounts of information from systems that could be in the cloud, in the data center, or even on edge devices, and this data needs to be collected so we can create our normal behavior profiles. These normal behavioral profiles would then, in turn, be used to create our predictive models for fraudulent activity. Now, on the data collection side, one of the main challenges that many organizations face in this phase involves using a single technology that can handle data that's coming in all different types of formats, protocols and standards, with different varieties and velocities. Let me give you an example. We could be collecting data from a database that gets updated daily, and maybe that data is being collected in Avro format. >>At the same time, we could be collecting data from an edge device that's streaming in every second, and that data may be coming in JSON or a binary format, right? So this is a data collection challenge that can be solved with Cloudera DataFlow, which is a suite of technologies built on Apache NiFi and MiNiFi, allowing us to ingest all of this data through a drag-and-drop interface. So now we're collecting all of this data that's required to map out normal behavior. The next thing that we need to do is enrich it, transform it, and distribute it to downstream systems for further processing. So let's walk through how that would work. First, let's take enrichment. For enrichment, think of adding additional information to your incoming data, right? Let's take financial transactions, for example, because Cindy mentioned them earlier, right? >>You can store known locations of an individual in an operational database; with Cloudera that would be HBase. And as an individual makes a new transaction, the geolocation in that transaction data can be enriched with previously known locations of that very same individual, and all of that enriched data can later be used downstream for predictive analysis. So the data has been enriched; now it needs to be transformed. We want the data that's coming in, whether Avro, JSON, binary or whatever other format, to be transformed into a single common format, so it can be used downstream for stream processing. Again, this is going to be done through Cloudera DataFlow, which is backed by NiFi, right? The transformed, standardized data is then going to be streamed to Kafka, and Kafka is going to serve as that central repository of syndicated services, or a buffer zone, right? >>So Kafka, you know, pretty much provides you with extremely fast, resilient, and fault-tolerant storage. And it's also going to give you the consumer APIs that you need, which are going to enable a wide variety of applications to leverage that enriched and transformed data within your buffer zone. I'll add that you can also store that data in a distributed file system, to give you the historical context that you're going to need later on for machine learning, right? So the next step in the architecture is to leverage Cloudera SQL Stream Builder, which enables us to write streaming SQL jobs on top of Apache Flink, so we can filter, analyze and understand the data that's in the Kafka buffer zone in real time.
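As a rough illustration of the enrich, transform and publish step described above, here is a minimal Python sketch. The architecture in the walkthrough uses Cloudera DataFlow/NiFi with HBase as the lookup store; the in-memory dictionary, topic name, and field names below are hypothetical stand-ins, assuming a local Kafka broker and the kafka-python client.

```python
# Minimal sketch of the enrich -> transform -> publish step, assuming a local
# Kafka broker on localhost:9092. The dict below stands in for the operational
# store (e.g. HBase) of previously known locations per individual.
import json
from datetime import datetime, timezone
from kafka import KafkaProducer  # pip install kafka-python

# Hypothetical lookup of known locations (lat, lon) per person.
KNOWN_LOCATIONS = {
    "person-123": [(42.36, -71.06), (40.71, -74.01)],
}

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def enrich_and_publish(txn: dict) -> None:
    """Add known-location context to a raw transaction and send it downstream
    in a single common JSON format."""
    event = {
        "person_id": txn["person_id"],
        "amount": txn["amount"],
        "geo": txn["geo"],  # location of this transaction
        "known_locations": KNOWN_LOCATIONS.get(txn["person_id"], []),  # enrichment
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    producer.send("enriched-transactions", value=event)

enrich_and_publish({"person_id": "person-123", "amount": 250.0, "geo": (25.76, -80.19)})
producer.flush()
```

Downstream consumers (the streaming SQL jobs and the model-scoring step) would read from the same topic, which is what makes the buffer zone useful.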
I'll also add that if you have time series data, or if you need OLAP-type cubing, you can leverage Kudu, while EDA, or exploratory data analysis, and visualization can all be enabled through Cloudera's visualization technology. >>All right, so we've filtered, we've analyzed, and we've explored our incoming data. We can now proceed to train our machine learning models, which will detect anomalous behavior in our historically collected data set. To do this, we can use a combination of supervised, unsupervised, and even deep learning techniques with neural networks, and these models can be tested on new incoming streaming data. And once we've obtained the accuracy and performance scores that we want, we can take these models and deploy them into production. Once the models are productionalized, or operationalized, they can be leveraged within our streaming pipeline. So as new data is ingested in real time, NiFi can query these models to detect if the activity is anomalous or fraudulent, and if it is, it can alert downstream users and systems, right? So this, in essence, is how fraudulent activity detection works. >>And this entire pipeline is powered by Cloudera's technology, right? And so the IRS is one of Cloudera's customers that's leveraging our platform today and implementing a very similar architecture to detect fraud, waste and abuse across a very large set of historical tax data. And one of the neat things with the IRS is that they've recently leveraged the partnership between Cloudera and NVIDIA to accelerate their Spark-based analytics and their machine learning, and the results have been nothing short of amazing, right? In fact, we have a quote here from Joe Ansaldi, the technical branch chief for the Research, Analytics and Statistics division group within the IRS: "With zero changes to our fraud detection workflow, we were able to obtain eight times the performance simply by adding GPUs to our mainstream big data servers. This improvement translates to half the cost of ownership for the same workloads." So embedding GPUs into the reference architecture I covered earlier has enabled the IRS to improve their time to insights by as much as eight X, while simultaneously reducing their underlying infrastructure costs by half. Cindy, back to you. >>Shev, thank you. And I hope that you found the analysis and the information that Shev and I have provided useful, to give you some insights on how Cloudera is actually helping with the fraud, waste and abuse challenges within the public sector: specifically, looking at any and all types of data; how the Cloudera platform brings together and analyzes information, whether it be your structured, semi-structured or unstructured data, in both a batch and a real-time perspective; looking at anomalies and being able to do some of those detection methods; and looking at neural network analysis and time series information. As a next step, we'd love to have an additional conversation with you. You can also find some additional information around how Cloudera is working in the federal government by going to cloudera.com, under solutions, public sector. And we welcome scheduling a meeting with you. Again, thank you for joining Shev and me today; we greatly appreciate your time and look forward to future conversations.
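To ground the model-training step in the walkthrough above, here is a minimal, hypothetical sketch of the unsupervised approach mentioned: fit an anomaly detector on historical transaction features, then score new events so downstream systems can be alerted. The feature names, toy data, and alerting choice are illustrative only and are not drawn from the IRS implementation.

```python
# Minimal sketch of unsupervised anomaly detection on historical transactions.
# Feature columns: [amount, km_from_nearest_known_location, hour_of_day].
import numpy as np
from sklearn.ensemble import IsolationForest

history = np.array([
    [120.0, 2.0, 14],
    [ 80.0, 1.0,  9],
    [ 95.0, 3.5, 18],
    [110.0, 0.5, 12],
    [ 60.0, 4.0, 20],
])

# Fit the detector on "normal" history; contamination is the assumed anomaly rate.
model = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
model.fit(history)

def is_anomalous(event: list[float]) -> bool:
    """Return True if the event deviates from learned normal behavior."""
    return model.predict([event])[0] == -1  # -1 => anomaly, 1 => normal

new_event = [5000.0, 900.0, 3]  # large amount, far from known locations, 3 a.m.
if is_anomalous(new_event):
    print("ALERT: possible fraudulent activity", new_event)
```

In the streaming pipeline described above, this scoring call is what the ingest layer would invoke on each new event, alerting downstream users and systems when the prediction comes back anomalous.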
PUBLIC SECTOR V1 | CLOUDERA
>>Good day, everyone. Thank you for joining me. I'm Cindy Mikey, joined by Rick Taylor of Cloudera. We're here to talk about predictive maintenance for the public sector, and how to increase asset service reliability. On today's agenda, we'll talk specifically about how to optimize your equipment maintenance and how to reduce costs and asset failure with data and analytics. We'll go into a little more depth on what types of data and analytical methods we're typically seeing used, and then Rick will go over a case study as well as a reference architecture. By basic definition, predictive maintenance is about determining when an asset should be maintained and what specific maintenance activities need to be performed, based upon an asset's actual condition or state. It's also about predicting and preventing failures, and performing maintenance on your time, on your schedule, to avoid costly unplanned downtime. >>McKinsey has analyzed predictive maintenance costs across multiple industries and has identified an opportunity to reduce overall maintenance costs by roughly 50% with different types of analytical methods. So let's look at those three types of models. First, we've got our traditional method of maintenance, and that's really corrective maintenance: performing maintenance on an asset after the equipment fails. The challenge with that is we end up with unplanned downtime, disruptions in our schedules, as well as reduced quality in the performance of the asset. Then there's preventive maintenance, which is really when we're performing maintenance on a set schedule. The challenge with that is we're typically doing it regardless of the actual condition of the asset, which results in unnecessary downtime and expense. And so we're really now focused on predictive, condition-based maintenance, which leverages predictive maintenance techniques based upon actual conditions and real-time events and processes. Within that, we've seen organizations (again, sourced from McKinsey) achieve a 50% reduction in downtime, as well as an overall 40% reduction in maintenance costs. Again, this is looking at things across multiple industries, but let's look at it in the context of the public sector. Based upon some work by the Department of Energy several years ago, they've really looked at what predictive maintenance means to the public sector: what is the benefit, in terms of increasing return on investment for assets, a reduction in downtime, as well as overall maintenance costs. So corrective or reactive maintenance is really about performing maintenance once there's been a failure; then there's the movement toward preventive maintenance, which is based upon a set schedule; and then predictive, where we're monitoring real-time conditions. And most importantly, we're now actually leveraging IoT and data and analytics to further reduce those overall downtimes. There's a research report by the Department of Energy that goes into more specifics on the opportunity within the public sector. So, Rick, let's talk a little bit about some of the challenges regarding data for predictive maintenance.
>>Some of the challenges include having data silos. Historically, our government organizations, and organizations in the commercial space as well, have multiple data silos that have spun up over time. There are multiple business units, and there's no single view of assets, and oftentimes there's redundant information stored in these silos of information. Couple that with huge increases in data volume, data growing exponentially, along with new types of data that we can ingest: there's social media, there's semi- and unstructured data sources, and the real-time data that we can now collect from the internet of things. And so the challenge is to collect all these assets together and begin to extract intelligence from them and insights, and that in turn then fuels machine learning and what we call artificial intelligence, which enables predictive maintenance. Next slide. >>So let's look specifically at the types of use cases, and Rick and I are going to focus on those use cases: where do we see predictive maintenance coming into play in procurement, facilities, supply chain, operations and logistics. We've got various levels of maturity. So, you know, we're talking about predictive maintenance, we're also talking about using information, whether it be from a connected asset or a vehicle we're monitoring, to also leveraging data from connected warehouses, facilities and buildings, all bringing an opportunity to both increase the quality and effectiveness of the missions within the agencies, to also looking at cost efficiency, as well as looking at risk and safety. And the types of data, you know, that Rick mentioned around the new types of information, some of those data elements that we typically have seen: looking at failure history. >>So when has an asset or a machine or a component within a machine failed in the past? We're also looking at bringing together a maintenance history, looking at a specific machine: are we getting error codes off of a machine or asset, looking at when we've replaced certain components, to looking at how we're actually leveraging the assets. What were the operating conditions, pulling off data from a sensor on that asset. Also looking at the features of an asset, whether it's, you know, engine size, its make and model, where the asset is located, to also looking at who's operated the asset, whether it be their certifications, what's their experience, how are they leveraging the assets, and then also bringing together some of the pattern analysis that we've seen. So what are the operating limits? Are we getting service reliability? Are we getting product recall information from the actual manufacturer? So, Rick, I know the data landscape has really changed. Let's go over looking at some of those components. >>Sure. So this slide depicts some of the inputs that inform a predictive maintenance program. So, as we've talked a little bit about, the silos of information, the ERP system of record, perhaps the spares and the service history. 
So what we want to do is combine that information with sensor data, whether it's facility and equipment sensors, or temperature and humidity, for example. All this stuff is then combined together and used to develop machine learning models that better inform predictive maintenance, because we do need to take into account the environmental factors that may cause additional wear and tear on the asset that we're monitoring. So here are some examples of private sector maintenance use cases that also have broad applicability across the government. For example, one of the busiest airports in Europe is running Cloudera on Azure to capture, secure and correlate sensor data collected from equipment within the airport, the people-moving equipment more specifically: the escalators, the elevators, and the baggage carousels. >>The objective here is to prevent breakdowns and improve airport efficiency and passenger safety. Another example is a container shipping port. In this case, we use IoT data and machine learning to help customers recognize how their cargo handling equipment is performing in different weather conditions, to understand how usage relates to failure rates, and to detect anomalies in transport systems. These all improve efficiency. Another example is Navistar, a leading manufacturer of commercial trucks, buses, and military vehicles. Typically vehicle maintenance, as Cindy mentioned, is based on miles traveled or based on a schedule or time since the last service. But these are only two of the thousands of data points that can signal the need for maintenance, and as it turns out, unscheduled maintenance and vehicle breakdowns account for a large share of the total cost for vehicle owners. So to help fleet owners move from a reactive approach to a more predictive model, Navistar built an IoT-enabled remote diagnostics platform called OnCommand. >>The platform brings in over 70 sensor data feeds for more than 375,000 connected vehicles. These include engine performance, truck speed, acceleration, coolant temperature, and brake wear. This data is then correlated with other Navistar and third-party data sources, including weather, geolocation, vehicle usage, traffic, warranty, and parts inventory information. The platform then uses machine learning and advanced analytics to automatically detect problems early and predict maintenance requirements. So how does the fleet operator use this information? They can monitor truck health and performance from smartphones or tablets and prioritize needed repairs. Also, they can identify the nearest service location that has the relevant parts, the trained technicians, and the available service space. So, sort of wrapping up the benefits, Navistar's helped fleet owners reduce maintenance costs by more than 30%. The same platform is also used to help school buses run safely and on time. For example, one school district with 110 buses that travel over a million miles annually reduced the number of PTOs needed year over year, thanks to predictive insights delivered by this platform. >>So I'd like to take a moment and walk through the data life cycle as depicted in this diagram. Data ingest from the edge may include feeds from the factory floor or things like connected vehicles, whether they're trucks, aircraft, heavy equipment, cargo vessels, et cetera. Next, the data lands on a secure and governed data platform. 
Here it's combined with data from existing systems of record to provide additional insights, and this platform supports multiple analytic functions working together on the same data while maintaining strict security, governance and control measures. Once processed, the data is used to train machine learning models, which are then deployed into production, monitored, and retrained as needed to maintain accuracy. The processed data is also typically placed in a data warehouse and used to support business intelligence, analytics, and dashboards. And in fact, this data life cycle is representative of one of our government customers doing condition-based maintenance across a variety of aircraft. >>And the benefits they've discovered include less unscheduled maintenance and a reduction in mean man-hours to repair, increased maintenance efficiencies, improved aircraft availability, and the ability to avoid cascading component failures, which typically cost more in repair cost and downtime. Also, they're able to better forecast the requirements for replacement parts and consumables, and last, and certainly very importantly, this leads to enhanced safety. This chart overlays the secure open source Cloudera platform used in support of the data life cycle we've been discussing: Cloudera DataFlow for the data ingest, data movement and real-time streaming data query capabilities. So DataFlow gives us the capability to bring data in from the asset of interest, from the internet of things, while the data platform provides a secure, governed data lake and visibility across the full machine learning life cycle, eliminates silos, and streamlines workflows across teams. The platform includes an integrated suite of secure analytic applications, and two that we're specifically calling out here are Cloudera Machine Learning, which supports the collaborative data science and machine learning environment that facilitates machine learning and AI, and the Cloudera Data Warehouse, which supports the analytics and business intelligence, including those dashboards for leadership. Cindy, over to you. >>Rick, thank you. And I hope that Rick and I provided you some insights on how predictive maintenance, condition-based maintenance, is being used and can be used within your respective agency, bringing together data sources that maybe you're having challenges with today, bringing that more real-time information in from a streaming perspective, blending that industrial IoT as well as historical information together to help actually optimize maintenance and reduce costs within each of your agencies. To learn a little bit more about Cloudera and what we're doing from a predictive maintenance perspective, please visit cloudera.com/solutions/public-sector. And we look forward to scheduling a meeting with you, and on that, we appreciate your time today, and thank you very much.
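To make the modeling step Rick describes more concrete, here is a rough, self-contained Python sketch of the idea: combine sensor readings and failure history into a feature table, train a failure-prediction classifier, and score a new reading from a connected asset. The column names, thresholds, synthetic data, and model choice are illustrative assumptions for this write-up, not Cloudera's or any customer's actual implementation.

```python
# Minimal predictive-maintenance sketch: train a classifier that flags assets
# likely to fail soon, using synthetic sensor readings and failure history.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
n = 5000

# Synthetic "historical" data: one row per asset per day (names are assumptions).
readings = pd.DataFrame({
    "vibration_rms": rng.normal(1.0, 0.3, n),
    "coolant_temp_c": rng.normal(80, 8, n),
    "engine_hours": rng.uniform(0, 20000, n),
    "days_since_service": rng.uniform(0, 365, n),
})

# Fabricated label: asset failed within the next 30 days. High vibration, high
# temperature, and overdue service raise the failure risk in this toy data.
risk = (
    0.8 * (readings["vibration_rms"] - 1.0)
    + 0.05 * (readings["coolant_temp_c"] - 80)
    + 0.002 * (readings["days_since_service"] - 180)
)
readings["failed_within_30d"] = (risk + rng.normal(0, 0.3, n) > 0.5).astype(int)

X = readings.drop(columns="failed_within_30d")
y = readings["failed_within_30d"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# Scoring a new reading from a connected asset: anything above a chosen
# probability threshold would be routed to the maintenance queue.
new_reading = pd.DataFrame([{
    "vibration_rms": 1.9, "coolant_temp_c": 96,
    "engine_hours": 14000, "days_since_service": 300,
}])
prob = model.predict_proba(new_reading)[0, 1]
print(f"failure probability: {prob:.2f} -> {'schedule maintenance' if prob > 0.5 else 'ok'}")
```

In a production setting the same pattern would sit behind the deploy/monitor/retrain loop described in the transcript, with the scored output feeding the business intelligence dashboards.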
SUMMARY :
So as we look at fraud, Um, the types of fraud that we see is specifically around cyber crime, So as we look at those areas, what are the areas that we see additional So I think one of the key elements is, you know, you can look at your, the breadth and the opportunity really comes about when you can integrate and Some of the techniques that we use and the value behind this is, um, how do we actually look at increasing Um, also looking at increasing the amount of, uh, the level of compliance, I'm going to turn it over to chef to talk about, uh, the reference architecture for, before I get into the technical details, uh, I want to talk about how this would be implemented at a much higher level. It could be in the data center or even on edge devices, and this data needs to be collected At the same time, we can be collecting data from an edge device that's streaming in every second, So the data has been enrich. So the next step in the architecture is to leverage a cluttered SQL stream builder, obtain the accuracy of the performance, the scores that we want, Um, and one of the neat things with the IRS the analysis, the information that Sheva and I have provided, um, to give you some insights on the analytical methods that we're typically seeing used, um, the associated, doing it regardless of the actual condition of the asset, um, uh, you know, reduction in downtime, um, as well as overall maintenance costs. And so the challenge is to collect all these assets together and begin the types of data, um, you know, that Rick mentioned around, you know, the new types on to also looking at who's operated the asset, uh, you know, whether it be their certifications, So we want, what we want to do is combine that information with So to help fleet So the platform then uses machine learning and advanced analytics to automatically detect problems So data ingest from the edge may include feeds from the factory floor or things like improved aircraft availability, and the ability to avoid cascading And I hope that, uh, Rick and I provided you some insights on how predictive
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Cindy Mikey | PERSON | 0.99+ |
Rick | PERSON | 0.99+ |
Rick Taylor | PERSON | 0.99+ |
Molly | PERSON | 0.99+ |
Nvidia | ORGANIZATION | 0.99+ |
2017 | DATE | 0.99+ |
PWC | ORGANIZATION | 0.99+ |
40% | QUANTITY | 0.99+ |
110 buses | QUANTITY | 0.99+ |
Europe | LOCATION | 0.99+ |
50% | QUANTITY | 0.99+ |
Cindy | PERSON | 0.99+ |
Mike | PERSON | 0.99+ |
Joe | PERSON | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
Today | DATE | 0.99+ |
today | DATE | 0.99+ |
Navistar | ORGANIZATION | 0.99+ |
First | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
more than 30% | QUANTITY | 0.99+ |
over $51 billion | QUANTITY | 0.99+ |
NIFA | ORGANIZATION | 0.99+ |
over $65 billion | QUANTITY | 0.99+ |
IRS | ORGANIZATION | 0.99+ |
over a million miles | QUANTITY | 0.99+ |
first | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
Jason | PERSON | 0.98+ |
Azure | TITLE | 0.98+ |
Brooke | PERSON | 0.98+ |
Avro | PERSON | 0.98+ |
one school district | QUANTITY | 0.98+ |
SQL | TITLE | 0.97+ |
both | QUANTITY | 0.97+ |
$148 billion | QUANTITY | 0.97+ |
Sheva | PERSON | 0.97+ |
three types | QUANTITY | 0.96+ |
each | QUANTITY | 0.95+ |
McKenzie | ORGANIZATION | 0.95+ |
more than 375,000 connected vehicles | QUANTITY | 0.95+ |
Cloudera | TITLE | 0.95+ |
about 57 billion | QUANTITY | 0.95+ |
salty | PERSON | 0.94+ |
several years ago | DATE | 0.94+ |
single technology | QUANTITY | 0.94+ |
eight times | QUANTITY | 0.93+ |
91 billion | QUANTITY | 0.93+ |
eight X | QUANTITY | 0.92+ |
business@cloudera.com | OTHER | 0.92+ |
McKinsey | ORGANIZATION | 0.92+ |
zero changes | QUANTITY | 0.92+ |
Monte Carlo | TITLE | 0.92+ |
caldera | ORGANIZATION | 0.91+ |
couple | QUANTITY | 0.9+ |
over 70 sensor data feeds | QUANTITY | 0.88+ |
Richmond | LOCATION | 0.84+ |
Navistar Navistar | ORGANIZATION | 0.82+ |
single view | QUANTITY | 0.81+ |
17 | OTHER | 0.8+ |
single common format | QUANTITY | 0.8+ |
thousands of data points | QUANTITY | 0.79+ |
Sydney | LOCATION | 0.78+ |
Cindy Maike & Nasheb Ismaily | Cloudera
>>Hi, this is Cindy Maike, vice president of industry solutions at Cloudera. Joining me today is Nasheb Ismaily, our solution engineer for the public sector. Today we're going to talk about speed to insight: why use machine learning in the public sector, specifically around fraud, waste and abuse. So the topics for today: we'll discuss machine learning, why the public sector uses it to target fraud, waste and abuse, the challenges, how we enhance your data and analytical approaches, the data landscape, analytical methods, and Nasheb will go over a reference architecture and a case study. So by definition, per the Government Accountability Office, fraud is an attempt to obtain something of value through willful misrepresentation, waste is about squandering money or resources, and abuse is about behaving improperly or unreasonably to actually obtain something of value for your personal benefit. So as we look at fraud across all industries, it's a top-of-mind area within the public sector. >>The types of fraud that we see are specifically around cyber crime, looking at accounting fraud, whether it be from an individual perspective or within organizations, looking at financial statement fraud, to also looking at bribery and corruption. As we look at fraud, it really hits us from all angles, whether it be from external perpetrators or internal perpetrators, and specifically from the research by PwC, we also see over half of fraud is actually through some form of internal or external perpetrators, again, key topics. So as we also look at a report recently by the Association of Certified Fraud Examiners, within the public sector, the US government, in 2017 it was identified that roughly $148 billion was attributable to fraud, waste and abuse. Specifically, of that, 57 billion was focused on reported monetary losses and another 91 billion on areas where the opportunity or the monetary basis had not yet been measured. >>As we look at breaking those areas down, we look at several different topics from an outpayment perspective. So breaking it down: within the health system, over $65 billion; within social services, over $51 billion; to procurement fraud; to also fraud, waste and abuse that's happening in the grants and the loan process; to payroll fraud; and then other aspects, again, quite a few different topical areas. So as we look at those areas, what are the areas where we see additional focus? There are broad-stroke areas, but what are the actual use cases that our agencies are using, the data landscape, and what data and analytical methods can we use to actually help curtail and prevent some of the fraud, waste and abuse? So as we look at some of the analytical processes and analytical use cases in the public sector, whether it's from, you know, the taxation areas, to looking at social services, to public safety, to also our additional agency methods, we're going to focus specifically on some of the use cases around, you know, fraud within the tax area. >>We'll briefly look at some of the aspects of unemployment insurance fraud, benefit fraud, as well as payment integrity. So fraud has its underpinnings in quite a few different government agencies, and different analytical methods and usage of different data. So I think one of the key elements is, you know, you can look at your data landscape and the specific data sources that you need, but it's really about bringing together different data sources across a different variety, a different velocity. So data has different dimensions. We'll look at unstructured types of data, semi-structured data, behavioral data, as well as, when we look at predictive models, we're typically looking at historical type information. But if we're actually trying to look at preventing fraud before it actually happens, or when a case may be in flight, which is specifically a use case that Nasheb is going to talk about later, it's how do I look at more of that real-time, that streaming information? >>How do I take advantage of data, whether it be, you know, financial transactions; we're looking at asset verification, we're looking at tax records, we're looking at corporate filings. And we can also look at more advanced data sources, where we're looking at investigation type information. So we're maybe going out and we're looking at deep learning type models around, you know, semi-structured or that behavioral, that unstructured data, whether it be camera analysis and so forth. So quite a different variety of data, and the breadth and the opportunity really come about when you can integrate and look at data across all different data sources, so in a sense, looking at a more extensive data landscape. So specifically I want to focus on some of the methods, some of the data sources and some of the analytical techniques that we're seeing being used in the government agencies, as well as opportunities to look at new methods. >>So as we're looking at, you know, audit planning, or looking at the opportunity for the likelihood of non-compliance, specifically we'll see data sources where we're maybe looking at a constituent's profile, we might actually be investigating the forms that they've provided, we might be comparing that data or leveraging internal data sources, possibly looking at net worth, comparing it against other financial data, and also comparison across other constituent groups. Some of the techniques that we use are some of the basic natural language processing; maybe we're going to do some text mining, we might be doing some probabilistic modeling, where we're actually looking at information within the agency and also comparing that against possibly tax forms. A lot of times that information historically has been done on a batch basis, both structured and semi-structured type information, and typically the data volumes can be low, but we're also seeing those data volumes increase exponentially based upon the types of events that we're dealing with, the number of transactions. >>So, getting the throughput matters, and Nasheb is going to specifically talk about that in a moment. The other aspect, as we look at other areas of opportunity, is when we're building upon how do I actually do compliance, how do I actually look at conducting audits or potential fraud, to also looking at areas of underreported tax information. >>So there you might be pulling in some of our other types of data sources, whether it be property records; it could be data that's being supplied by the actual constituents or by vendors; to also pulling in social media information, geographical information, to leveraging photos. Techniques that we're seeing used are possibly some sentiment analysis, link analysis, how do we actually blend those data sources together from a natural language processing standpoint. But I think what's important here is also the method, and looking at the data velocity, whether it be batch, whether it be near real time, again looking at all types of data, whether it's structured, semi-structured or unstructured. And the key and the value behind this is, how do we actually look at increasing the potential revenue, or capturing the underreported revenue? >>How do we actually look at stopping fraudulent payments before they actually occur? Also looking at increasing the level of compliance, and also looking at the potential prosecution of fraud cases. And additionally, other areas of opportunity could be looking at economic planning. How do we actually perform some link analysis? How do we bring some more of those things that we saw in the data landscape on customer, or, you know, constituent interaction, bringing in social media, bringing in potentially police records, property records, other tax department database information? And then also looking at comparing one individual to other individuals, looking at people like a specific constituent: are there areas where we're seeing other aspects of fraud potentially occurring? And also, as we move forward, some of the more advanced techniques that we're seeing around deep learning are looking at computer vision, leveraging geospatial information, looking at social network entity analysis, also looking at agent-based modeling techniques, where we're looking at simulation, Monte Carlo type techniques that we typically see in the financial services industry, and actually applying that to fraud, waste and abuse within the public sector. >>And again, that really lends itself to new opportunities. And on that, I'm going to turn it over to Nasheb to talk about the reference architecture for doing this work. >>Sure. Yeah, thanks, Cindy. So I'm going to walk you through an example reference architecture for fraud detection using Cloudera's underlying technology. And, you know, before I get into the technical details, I want to talk about how this would be implemented at a much higher level. So with fraud detection, what we're trying to do is identify anomalies or anomalous behavior within our datasets. Now, in order to understand what aspects of our incoming data represent anomalous behavior, we first need to understand what normal behavior is. So in essence, once we understand normal behavior, anything that deviates from it can be thought of as an anomaly, right? And in order to understand what normal behavior is, we're going to need to be able to collect, store and process a very large amount of historical data. And so in comes Cloudera's platform and the reference architecture that we've built for you. >>So let's start on the left-hand side of this reference architecture with the collect phase. So fraud detection will always begin with data collection. 
We need to collect large amounts of information from systems that could be in the cloud, in the data center, or even on edge devices, and this data needs to be collected so we can create normal behavior profiles, and these normal behavior profiles would then in turn be used to create our predictive models for fraudulent activity. Now, on the data collection side, one of the main challenges that many organizations face in this phase involves using a single technology that can handle data that's coming in all different types of formats and protocols and standards, with different varieties and velocities. Let me give you an example. We could be collecting data from a database that gets updated daily, and maybe that data is being collected in Avro format. >>At the same time, we can be collecting data from an edge device that's streaming in every second, and that data may be coming in JSON or a binary format, right? So this is a data collection challenge that can be solved with Cloudera DataFlow, which is a suite of technologies built on Apache NiFi and MiNiFi, allowing us to ingest all of this data through a drag-and-drop interface. So now we're collecting all of this data that's required to map out normal behavior. The next thing that we need to do is enrich it, transform it and distribute it to downstream systems for further processing. So let's walk through how that would work. First, let's take enrichment. For enrichment, think of adding additional information to your incoming data, right? Let's take financial transactions, for example, because Cindy mentioned it earlier, right? >>You can store known locations of an individual in an operational database; with Cloudera, that would be HBase. And as an individual makes a new transaction, the geolocation that's in that transaction data can be enriched with previously known locations of that very same individual, and all of that enriched data can be used later downstream for predictive analysis, predictive modeling. So the data has been enriched; now it needs to be transformed. We want the data that's coming in, you know, in Avro and JSON and binary and whatever other format, to be transformed into a single common format so it can be used downstream for stream processing. Again, this is going to be done through Cloudera DataFlow, which is backed by NiFi, right? So the transformed, standardized data is then going to be streamed into Kafka, and Kafka is going to serve as that central repository of syndicated services, or a buffer zone, right? >>So Kafka, you know, pretty much provides you with extremely fast, resilient and fault-tolerant storage, and it's also going to give you the consumer APIs that you need to enable a wide variety of applications to leverage that enriched and transformed data within your buffer zone. I'll add that you can also store that data in a distributed file system, to give you that historical context that you're going to need later on for machine learning, right? So the next step in the architecture is to leverage Cloudera SQL Stream Builder, which enables us to write streaming SQL jobs on top of Apache Flink, so we can filter, analyze and understand the data that's in the Kafka buffer zone in real time. 
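For a concrete picture of the enrichment step just described, here is a rough Python sketch in which each incoming transaction is joined with the account's previously known locations and flagged when the new geolocation is implausibly far from all of them. The in-memory dictionary stands in for the HBase lookup, and the field names and distance threshold are illustrative assumptions, not Cloudera's actual implementation.

```python
# Sketch of stream-side enrichment: join each incoming transaction with the
# account's known locations and flag large geographic jumps as suspicious.
import math

# Stand-in for the HBase "known locations" table: account id -> list of (lat, lon).
known_locations = {
    "acct-1001": [(38.90, -77.03), (39.29, -76.61)],   # DC, Baltimore
    "acct-1002": [(34.05, -118.24)],                    # Los Angeles
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))

def enrich(txn, max_jump_km=500):
    """Attach distance-from-known-locations and a simple suspicion flag."""
    history = known_locations.get(txn["account_id"], [])
    here = (txn["lat"], txn["lon"])
    distances = [haversine_km(here, loc) for loc in history]
    txn["min_distance_km"] = min(distances) if distances else None
    txn["suspicious_location"] = bool(distances) and min(distances) > max_jump_km
    return txn

# Example: a DC-area account suddenly transacting from the west coast.
incoming = {"account_id": "acct-1001", "amount": 950.0, "lat": 47.61, "lon": -122.33}
print(enrich(incoming))
# min_distance_km comes back in the thousands of km, so suspicious_location is True.
```

In the architecture described above, this kind of logic would run inside the NiFi flow or a streaming SQL job rather than as a standalone script; the enriched record would then land in Kafka for downstream scoring.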
I'll also add that if you have time series data, or if you need OLAP-style cubing, you can leverage Kudu, while EDA, or exploratory data analysis, and visualization can all be enabled through Cloudera's visualization technology. >>All right, so we've filtered, we've analyzed, and we've explored our incoming data. We can now proceed to train our machine learning models, which will detect anomalous behavior in our historically collected data set. To do this, we can use a combination of supervised, unsupervised, and even deep learning techniques with neural networks. And these models can be tested on new incoming streaming data, and once we've obtained the accuracy, the performance, the F1 scores that we want, we can then take these models and deploy them into production. And once the models are productionalized or operationalized, they can be leveraged within our streaming pipeline. So as new data is ingested in real time, NiFi can query these models to detect if the activity is anomalous or fraudulent, and if it is, it can alert downstream users and systems, right? So this in essence is how fraudulent activity detection works. And this entire pipeline is powered by Cloudera's technology. Cindy, next slide please. >>Right. And so the IRS is one of Cloudera's customers that's leveraging our platform today and implementing a very similar architecture to detect fraud, waste, and abuse across a very large set of historical tax data. And one of the neat things with the IRS is that they've actually recently leveraged the partnership between Cloudera and Nvidia to accelerate their Spark-based analytics and their machine learning, and the results have been nothing short of amazing, right? And in fact, we have a quote here from Joe Ansaldi, who's the technical branch chief for the Research Analytics and Statistics division group within the IRS: "With zero changes to our fraud detection workflow, we were able to obtain eight times the performance simply by adding GPUs to our mainstream big data servers. This improvement translates to half the cost of ownership for the same workloads." Right? So embedding GPUs into the reference architecture I covered earlier has enabled the IRS to improve their time to insights by as much as eight X while simultaneously reducing their underlying infrastructure costs by half. Cindy, back to you. >>Nasheb, thank you. And I hope that you found the analysis and the information that Nasheb and I have provided gives you some insights on how Cloudera is actually helping with the fraud, waste and abuse challenges within the public sector, specifically looking at any and all types of data, how the Cloudera platform is bringing together and analyzing information, whether it be your structured, your semi-structured or unstructured data, both in a fast or in a real-time perspective, looking at anomalies, being able to do some of those anomaly detection methods, looking at neural network analysis, time series information. So next steps, we'd love to have an additional conversation with you. You can also find some additional information around how Cloudera is working in the federal government by going to cloudera.com/solutions/public-sector. And we welcome scheduling a meeting with you. Again, thank you for joining us today. We greatly appreciate your time and look forward to future conversations. Thank you.
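As a rough illustration of the model training and scoring step described above, the sketch below fits an unsupervised anomaly detector on synthetic historical transactions and then scores new records, which is the kind of call a streaming pipeline could make against a deployed model. The features, contamination rate, and data are illustrative assumptions; the IRS's actual models and workflow are not public at this level of detail.

```python
# Sketch: learn "normal" transaction behavior from history, then score new
# streaming records and flag the ones that deviate from it.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)

# Synthetic historical transactions: [amount_usd, hour_of_day, km_from_home].
normal = np.column_stack([
    rng.lognormal(mean=3.5, sigma=0.6, size=10_000),   # mostly modest amounts
    rng.normal(13, 4, size=10_000).clip(0, 23),        # mostly daytime activity
    rng.exponential(20, size=10_000),                   # usually near home
])

# Fit the detector on history; contamination is the assumed anomaly rate.
detector = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
detector.fit(normal)

def score_event(amount_usd, hour_of_day, km_from_home):
    """Return (is_anomalous, raw_score) for one incoming transaction."""
    x = np.array([[amount_usd, hour_of_day, km_from_home]])
    return detector.predict(x)[0] == -1, float(detector.score_samples(x)[0])

# A typical transaction versus a large, late-night, far-from-home one.
print(score_event(40.0, 14, 5))       # expected to come back as not anomalous
print(score_event(9500.0, 3, 4200))   # expected to be flagged as anomalous
```

Supervised and deep learning variants follow the same shape: train on history, validate on held-out or streaming data, then expose the fitted model to the real-time pipeline for per-event scoring and alerting.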
SUMMARY :
So as we look at fraud and across So as we also look at a report So as we look at those areas, what are the areas that we see additional So I think one of the key elements is, you know, you can look at your, Um, and we can also look at more, uh, advanced data sources So as we're looking at, you know, from a, um, an audit planning or looking and the value behind this is, um, how do we actually look at increasing Um, also looking at increasing the amount of, uh, the level of compliance, um, And on that, I'm going to turn it over to Chevy to talk about, uh, the reference architecture for doing Um, and you know, before I get into the technical details, uh, I want to talk about how this It could be in the data center or even on edge devices, and this data needs to be collected so At the same time, we can be collecting data from an edge device that's streaming in every second, So the data has been enrich. So the next step in the architecture is to leverage, uh, clutter SQL stream builder, obtain the accuracy of the performance, the X one, uh, scores that we want, And in fact, we have a quote here from Joe and salty who's, uh, you know, the technical branch chief for the the analysis, the information that Sheva and I have provided, uh, to give you some insights
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Cindy Mikey | PERSON | 0.99+ |
Nvidia | ORGANIZATION | 0.99+ |
Molly | PERSON | 0.99+ |
Nasheb Ismaily | PERSON | 0.99+ |
PWC | ORGANIZATION | 0.99+ |
Joe | PERSON | 0.99+ |
Cindy | PERSON | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
2017 | DATE | 0.99+ |
Cindy Maike | PERSON | 0.99+ |
Today | DATE | 0.99+ |
over $65 billion | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
NIFA | ORGANIZATION | 0.99+ |
over $51 billion | QUANTITY | 0.99+ |
57 billion | QUANTITY | 0.99+ |
salty | PERSON | 0.99+ |
single | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
Jason | PERSON | 0.98+ |
one | QUANTITY | 0.97+ |
91 billion | QUANTITY | 0.97+ |
IRS | ORGANIZATION | 0.96+ |
Shev | PERSON | 0.95+ |
both | QUANTITY | 0.95+ |
Avro | PERSON | 0.94+ |
Apache | ORGANIZATION | 0.93+ |
eight | QUANTITY | 0.93+ |
$148 billion | QUANTITY | 0.92+ |
zero changes | QUANTITY | 0.91+ |
Richmond | LOCATION | 0.91+ |
Sheva | PERSON | 0.88+ |
single technology | QUANTITY | 0.86+ |
Cloudera | TITLE | 0.85+ |
Monte Carlo | TITLE | 0.84+ |
eight times | QUANTITY | 0.83+ |
cloudera.com | OTHER | 0.79+ |
Kafka | TITLE | 0.77+ |
second | QUANTITY | 0.77+ |
one individual | QUANTITY | 0.76+ |
coffin | PERSON | 0.72+ |
Kafka | PERSON | 0.69+ |
Jace | TITLE | 0.69+ |
SQL | TITLE | 0.68+ |
17 | QUANTITY | 0.68+ |
over half | QUANTITY | 0.63+ |
Chevy | ORGANIZATION | 0.57+ |
elements | QUANTITY | 0.56+ |
half | QUANTITY | 0.56+ |
mini five | COMMERCIAL_ITEM | 0.54+ |
Apache Flink | ORGANIZATION | 0.52+ |
HBase | TITLE | 0.45+ |
F1 Racing at the Edge of Real-Time Data: Omer Asad, HPE & Matt Cadieux, Red Bull Racing
>>Edge computing is predict, projected to be a multi-trillion dollar business. You know, it's hard to really pinpoint the size of this market. Let alone fathom the potential of bringing software, compute, storage, AI, and automation to the edge and connecting all that to clouds and on-prem systems. But what, you know, what is the edge? Is it factories? Is it oil rigs, airplanes, windmills, shipping containers, buildings, homes, race cars. Well, yes and so much more. And what about the data for decades? We've talked about the data explosion. I mean, it's mind boggling, but guess what, we're gonna look back in 10 years and laugh. What we thought was a lot of data in 2020, perhaps the best way to think about edge is not as a place, but when is the most logical opportunity to process the data and maybe it's the first opportunity to do so where it can be decrypted and analyzed at very low latencies that that defines the edge. And so by locating compute as close as possible to the sources of data, to reduce latency and maximize your ability to get insights and return them to users quickly, maybe that's where the value lies. Hello everyone. And welcome to this cube conversation. My name is Dave Vellante and with me to noodle on these topics is Omar Assad, VP, and GM of primary storage and data management services at HPE. Hello, Omer. Welcome to the program. >>Hey Steve. Thank you so much. Pleasure to be here. >>Yeah. Great to see you again. So how do you see the edge in the broader market shaping up? >>Uh, David? I think that's a super important, important question. I think your ideas are quite aligned with how we think about it. Uh, I personally think, you know, as enterprises are accelerating their sort of digitization and asset collection and data collection, uh, they're typically, especially in a distributed enterprise, they're trying to get to their customers. They're trying to minimize the latency to their customers. So especially if you look across industries manufacturing, which is distributed factories all over the place, they are going through a lot of factory transformations where they're digitizing their factories. That means a lot more data is being now being generated within their factories. A lot of robot automation is going on that requires a lot of compute power to go out to those particular factories, which is going to generate their data out there. We've got insurance companies, banks that are creating and interviewing and gathering more customers out at the edge for that. >>They need a lot more distributed processing out at the edge. What this is requiring is what we've seen is across analysts. A common consensus is that more than 50% of an enterprise is data, especially if they operate globally around the world is going to be generated out at the edge. What does that mean? More data is new data is generated at the edge, but needs to be stored. It needs to be processed data. What is not required needs to be thrown away or classified as not important. And then it needs to be moved for Dr. Purposes either to a central data center or just to another site. So overall in order to give the best possible experience for manufacturing, retail, uh, you know, especially in distributed enterprises, people are generating more and more data centric assets out at the edge. And that's what we see in the industry. >>Yeah. We're definitely aligned on that. There's some great points. And so now, okay. 
You think about all this diversity, what's the right architecture for these deploying multi-site deployments, robo edge. How do you look at that? >>Oh, excellent question. So now it's sort of, you know, obviously you want every customer that we talk to wants SimpliVity, uh, in, in, and, and, and, and no pun intended because SimpliVity is reasoned with a simplistic edge centric architecture, right? So because let's, let's take a few examples. You've got large global retailers, uh, they have hundreds of global retail stores around the world that is generating data that is producing data. Then you've got insurance companies, then you've got banks. So when you look at a distributed enterprise, how do you deploy in a very simple and easy to deploy manner, easy to lifecycle, easy to mobilize and easy to lifecycle equipment out at the edge. What are some of the challenges that these customers deal with these customers? You don't want to send a lot of ID staff out there because that adds costs. You don't want to have islands of data and islands of storage and promote sites, because that adds a lot of States outside of the data center that needs to be protected. >>And then last but not the least, how do you push lifecycle based applications, new applications out at the edge in a very simple to deploy better. And how do you protect all this data at the edge? So the right architecture in my opinion, needs to be extremely simple to deploy. So storage, compute and networking, uh, out towards the edge in a hyperconverged environment. So that's, we agree upon that. It's a very simple to deploy model, but then comes, how do you deploy applications on top of that? How do you manage these applications on top of that? How do you back up these applications back towards the data center, all of this keeping in mind that it has to be as zero touch as possible. We at HBS believe that it needs to be extremely simple. Just give me two cables, a network cable, a power cable, tied it up, connected to the network, push it state from the data center and back up at state from the ed back into the data center. Extremely simple. >>It's gotta be simple because you've got so many challenges. You've got physics that you have to deal your latency to deal with. You got RPO and RTO. What happens if something goes wrong, you've gotta be able to recover quickly. So, so that's great. Thank you for that. Now you guys have hard news. W what is new from HPE in this space >>From a, from a, from a, from a deployment perspective, you know, HPE SimpliVity is just gaining like it's exploding, like crazy, especially as distributed enterprises adopt it as it's standardized edge architecture, right? It's an HCI box has got stories, computer networking, all in one. But now what we have done is not only you can deploy applications all from your standard V-Center interface, from a data center, what have you have now added is the ability to backup to the cloud, right? From the edge. You can also back up all the way back to your core data center. All of the backup policies are fully automated and implemented in the, in the distributed file system. That is the heart and soul of, of the SimpliVity installation. In addition to that, the customers now do not have to buy any third-party software into backup is fully integrated in the architecture and it's van efficient. >>In addition to that, now you can backup straight to the client. You can backup to a central, uh, high-end backup repository, which is in your data center. 
And last but not least, we have a lot of customers that are pushing the limit in their application transformation. So not only do we previously were, were one-on-one them leaving VMware deployments out at the edge sites. Now revolver also added both stateful and stateless container orchestration, as well as data protection capabilities for containerized applications out at the edge. So we have a lot, we have a lot of customers that are now deploying containers, rapid manufacturing containers to process data out at remote sites. And that allows us to not only protect those stateful applications, but back them up, back into the central data center. >>I saw in that chart, it was a light on no egress fees. That's a pain point for a lot of CEOs that I talked to. They grit their teeth at those entities. So, so you can't comment on that or >>Excellent, excellent question. I'm so glad you brought that up and sort of at that point, uh, uh, pick that up. So, uh, along with SimpliVity, you know, we have the whole green Lake as a service offering as well. Right? So what that means, Dave, is that we can literally provide our customers edge as a service. And when you compliment that with, with Aruba wired wireless infrastructure, that goes at the edge, the hyperconverged infrastructure, as part of SimpliVity, that goes at the edge, you know, one of the things that was missing with cloud backups is the every time you backup to the cloud, which is a great thing, by the way, anytime you restore from the cloud, there is that breastfeed, right? So as a result of that, as part of the GreenLake offering, we have cloud backup service natively now offered as part of HPE, which is included in your HPE SimpliVity edge as a service offering. So now not only can you backup into the cloud from your edge sites, but you can also restore back without any egress fees from HBS data protection service. Either you can restore it back onto your data center, you can restore it back towards the edge site and because the infrastructure is so easy to deploy centrally lifecycle manage, it's very mobile. So if you want to deploy and recover to a different site, you could also do that. >>Nice. Hey, uh, can you, Omar, can you double click a little bit on some of the use cases that customers are choosing SimpliVity for, particularly at the edge, and maybe talk about why they're choosing HPE? >>What are the major use cases that we see? Dave is obviously, uh, easy to deploy and easy to manage in a standardized form factor, right? A lot of these customers, like for example, we have large retailer across the us with hundreds of stores across us. Right now you cannot send service staff to each of these stores. These data centers are their data center is essentially just a closet for these guys, right? So now how do you have a standardized deployment? So standardized deployment from the data center, which you can literally push out and you can connect a network cable and a power cable, and you're up and running, and then automated backup elimination of backup and state and BR from the edge sites and into the data center. So that's one of the big use cases to rapidly deploy new stores, bring them up in a standardized configuration, both from a hardware and a software perspective, and the ability to backup and recover that instantly. >>That's one large use case. The second use case that we see actually refers to a comment that you made in your opener. Dave was where a lot of these customers are generating a lot of the data at the edge. 
This is robotics automation that is going to up in manufacturing sites. These is racing teams that are out at the edge of doing post-processing of their cars data. Uh, at the same time, there is disaster recovery use cases where you have, uh, you know, campsites and local, uh, you know, uh, agencies that go out there for humanity's benefit. And they move from one site to the other. It's a very, very mobile architecture that they need. So those, those are just a few cases where we were deployed. There was a lot of data collection, and there's a lot of mobility involved in these environments. So you need to be quick to set up quick, to up quick, to recover, and essentially you're up to your next, next move. >>You seem pretty pumped up about this, uh, this new innovation and why not. >>It is, it is, uh, you know, especially because, you know, it is, it has been taught through with edge in mind and edge has to be mobile. It has to be simple. And especially as, you know, we have lived through this pandemic, which, which I hope we see the tail end of it in at least 2021, or at least 2022. They, you know, one of the most common use cases that we saw, and this was an accidental discovery. A lot of the retail sites could not go out to service their stores because, you know, mobility is limited in these, in these strange times that we live in. So from a central center, you're able to deploy applications, you're able to recover applications. And, and a lot of our customers said, Hey, I don't have enough space in my data center to back up. Do you have another option? So then we rolled out this update release to SimpliVity verse from the edge site. You can now directly back up to our backup service, which is offered on a consumption basis to the customers, and they can recover that anywhere they want. >>Fantastic Omer, thanks so much for coming on the program today. >>It's a pleasure, Dave. Thank you. >>All right. Awesome to see you. Now, let's hear from red bull racing and HPE customer, that's actually using SimpliVity at the edge. Countdown really begins when the checkered flag drops on a Sunday. It's always about this race to manufacture >>The next designs to make it more adapt to the next circuit to run those. Of course, if we can't manufacture the next component in time, all that will be wasted. >>Okay. We're back with Matt kudu, who is the CIO of red bull racing? Matt, it's good to see you again. >>Great to say, >>Hey, we're going to dig into a real-world example of using data at the edge and in near real time to gain insights that really lead to competitive advantage. But, but first Matt, tell us a little bit about red bull racing and your role there. >>Sure. So I'm the CIO at red bull racing and that red bull race. And we're based in Milton Keynes in the UK. And the main job job for us is to design a race car, to manufacture the race car, and then to race it around the world. So as CIO, we need to develop the ITT group needs to develop the applications is the design, manufacturing racing. We also need to supply all the underlying infrastructure and also manage security. So it's really interesting environment. That's all about speed. So this season we have 23 races and we need to tear the car apart and rebuild it to a unique configuration for every individual race. And we're also designing and making components targeted for races. So 20 a movable deadlines, um, this big evolving prototype to manage with our car. 
Um, but we're also improving all of our tools and methods and software that we use to design and make and race the car. >>So we have a big can do attitude of the company around continuous improvement. And the expectations are that we continuously make the car faster. That we're, that we're winning races, that we improve our methods in the factory and our tools. And, um, so for, I take it's really unique and that we can be part of that journey and provide a better service. It's also a big challenge to provide that service and to give the business the agility, agility, and needs. So my job is, is really to make sure we have the right staff, the right partners, the right technical platforms. So we can live up to expectations >>That tear down and rebuild for 23 races. Is that because each track has its own unique signature that you have to tune to, or are there other factors involved there? >>Yeah, exactly. Every track has a different shape. Some have lots of strengths. Some have lots of curves and lots are in between. Um, the track surface is very different and the impact that has some tires, um, the temperature and the climate is very different. Some are hilly, some, a big curves that affect the dynamics of the power. So all that in order to win, you need to micromanage everything and optimize it for any given race track. >>Talk about some of the key drivers in your business and some of the key apps that give you a competitive advantage to help you win races. >>Yeah. So in our business, everything is all about speed. So the car obviously needs to be fast, but also all of our business operations needed to be fast. We need to be able to design a car and it's all done in the virtual world, but the, the virtual simulations and designs need to correlate to what happens in the real world. So all of that requires a lot of expertise to develop the simulation is the algorithms and have all the underlying infrastructure that runs it quickly and reliably. Um, in manufacturing, um, we have cost caps and financial controls by regulation. We need to be super efficient and control material and resources. So ERP and MES systems are running and helping us do that. And at the race track itself in speed, we have hundreds of decisions to make on a Friday and Saturday as we're fine tuning the final configuration of the car. And here again, we rely on simulations and analytics to help do that. And then during the race, we have split seconds, literally seconds to alter our race strategy if an event happens. So if there's an accident, um, and the safety car comes out, or the weather changes, we revise our tactics and we're running Monte Carlo for example. And he is an experienced engineers with simulations to make a data-driven decision and hopefully a better one and faster than our competitors, all of that needs it. Um, so work at a very high level. >>It's interesting. I mean, as a lay person, historically we know when I think about technology and car racing, of course, I think about the mechanical aspects of a self-propelled vehicle, the electronics and the light, but not necessarily the data, but the data's always been there. Hasn't it? I mean, maybe in the form of like tribal knowledge, if somebody who knows the track and where the Hills are and experience and gut feel, but today you're digitizing it and you're, you're processing it and close to real time. >>It's amazing. I think exactly right. Yeah. 
The car's instrumented with sensors, we post-process, and we're doing video and image analysis, and we're looking at our car and our competitors' cars. So there's a huge amount of very complicated models that we're using to optimize our performance and to continuously improve our car. Yeah. The data and the applications that can leverage it are really key, and that's a critical success factor for us. >>So let's talk about your data center at the track, if you will. I mean, if I can call it that. Paint a picture for us, what does that look like? >>So we have to send a lot of equipment to the track, at the edge. And even though we have a really great wide area network link back to the factory and there's cloud resources, a lot of the tracks are very old. You don't have hardened infrastructure, you don't have ducts that protect cabling, for example, and you could lose connectivity to remote locations. So the applications we need to operate the car and to make really critical decisions, all that needs to be at the edge where the car operates. So historically we had three racks of equipment, legacy infrastructure, and it was really hard to manage, to make changes. It was too inflexible. There were multiple panes of glass, and it was too slow. It didn't run our applications quickly. It was also too heavy and took up too much space when you're cramped into a garage with lots of environmental constraints. >>So we'd introduced hyperconvergence into the factory and seen a lot of great benefits. And when it came time to refresh our infrastructure at the track, we stepped back and said, there's a lot smarter way of operating. We can get rid of all the slow and inflexible, expensive legacy and introduce hyperconvergence. And we saw really excellent benefits for doing that. We saw a three X speed-up for a lot of our applications. So here, where we're post-processing data and we have to make decisions about race strategy, time is of the essence, and a three X reduction in processing time really matters. We also were able to go from three racks of equipment down to two racks of equipment, and the storage efficiency of the HPE SimpliVity platform, with 20 to one ratios, allowed us to eliminate a rack. And that actually saved a hundred thousand dollars a year in freight costs by shipping less equipment. Things like backup: mistakes happen. >>Sometimes the user makes a mistake. So for example, a race engineer could load the wrong data map into one of our simulations, and we could restore that VDI through SimpliVity backup in 90 seconds. And this enables engineers to focus on the car and make better decisions without having downtime. And we send two IT guys to every race. They're managing 60 users, a really diverse environment, juggling a lot of balls, and having a simple management platform like HPE SimpliVity allows them to be very effective and to work quickly. So all of those benefits were a huge step forward relative to the legacy infrastructure that we used to run at the edge. >>Yeah. So you had the nice Petri dish in the factory. So it sounds like your goals, obviously your number one KPI, is speed, to help shave seconds off the time, but also cost and just the simplicity of setting up the infrastructure. >>Yeah. It's speed, speed, speed. So we want applications to absolutely fly, you know, get to actionable results quicker, get answers from our simulations quicker.
The other area where speed's really critical is that our applications are also evolving prototypes: the models are always getting bigger, the simulations are getting bigger, and they need more and more resource. Being able to spin up resource and provision things without being a bottleneck is a big challenge, and SimpliVity gives us the means of doing that. >>So did you consider any other options, or was it because you had the factory knowledge that HCI was, you know, very clearly the option? What did you look at? >>Yeah, so we have over five years of experience in the factory, and we eliminated all of our legacy infrastructure five years ago. And the benefits I've described at the track, we saw those in the factory. At the track we have a three-year operational life cycle for our equipment. 2017 was the last year we had legacy, and as we were building for 2018 it was obvious that hyperconverged was the right technology to introduce. And we'd had years of experience in the factory already. And the benefits that we see with hyperconverged actually mattered even more at the edge, because our operations are so much more pressurized and time is even more of the essence. And so speeding everything up at the really pointy end of our business was really critical. It was an obvious choice. >>Why SimpliVity? Why did you choose HPE SimpliVity? >>Yeah. So when we first heard about hyperconverged, way back in the factory, we had a legacy infrastructure: overly complicated, too slow, too inflexible, too expensive. And we stepped back and said, there has to be a smarter way of operating. We went out and challenged our technology partners. We learned about hyperconvergence; we didn't know if the hype was real or not. So we underwent some POCs and benchmarking, and the POCs were really impressive. With all these speed and agility benefits we saw, HPE for our use cases was the clear winner in the benchmarks. So based on that, we made an initial investment in the factory. We moved about 150 VMs and 150 VDI into it. And then, as we've seen all the benefits, we've successfully invested, and we now have an estate in the factory of about 800 VMs and about 400 VDI. So it's been a great platform, and it's allowed us to really push boundaries and give the business the service it expects. >>So was the time in which you were able to go from data to insight to recommendation, or edict, was that compressed? You kind of indicated that. >>So we offload telemetry from the car and we post-process it, and that reprocessing time is very time consuming. And, you know, we went from nine, eight minutes for some of the simulations down to just two minutes. So we saw big, big reductions in time, and ultimately that meant an engineer could understand what the car was doing during a practice session, recommend a tweak to the configuration or setup of it, and just get more actionable insight quicker. And it ultimately helps get a better car quicker. >>Such a great example. How are you guys feeling about the season, Matt? What's the team's sentiment? >>Yeah, I think we're optimistic. We have a new driver lineup. We have Max Verstappen, who carries on with the team, and Sergio Perez joins the team. So we're really excited about this year, and we want to go and win races.
Great, Matt, good luck this season and going forward, and thanks so much for coming back on theCUBE. Really appreciate it. And it's my pleasure. Great talking to you again. Okay. Now we're going to bring back Omer for a quick summary. So keep it right there. >>Without having solutions from HPE, we can't drive those five senses, CFD, aerodynamics; that would undermine the simulations. Being software defined, we can bring new apps into play; we can bring new VMs, storage, networking, and all of that can be highly optimized. It is a hugely beneficial partnership for us. We're able to be at the cutting edge of technology in a highly stressed environment. There is no bigger challenge than Formula One. >>Okay. We're back with Omer. Hey, what did you think about that interview with Matt? >>Great. I have to tell you I'm a big Formula One fan, and they are one of my favorite customers. So, you know, obviously one of the biggest use cases, as you saw for Red Bull Racing, is trackside deployments. There are now 22 races in a season. These guys are jumping from one city to the next; they've got to pack up, move to the next city, and set up the infrastructure very, very quickly. An average Formula One car is running a thousand plus sensors that are generating a ton of data trackside that needs to be collected very quickly. It needs to be processed very quickly, and then sometimes, believe it or not, snapshots of this data need to be sent back to the Red Bull factory, back at the data center. What does this all need? It needs reliability. >>It needs compute power in a very short form factor. And it needs agility: quick to set up, quick to go, quick to recover. And then in post-processing, they need to have CPU density so they can pack more VMs out at the edge to be able to do that processing. And we accomplished that for the Red Bull Racing guys with basically two SimpliVity nodes that are running trackside and moving with them from one race to the next race, to the next race. And every time those SimpliVity nodes connect back up to the data center, they're backing up to their data center. They're sending snapshots of data back to the data center, essentially making their job a whole lot easier, where they can focus on racing and not on troubleshooting virtual machines. >>Red Bull Racing and HPE SimpliVity. Great example. It's agile, it's cost efficient, and it shows a real impact. Thank you very much. I really appreciate those summary comments. Thank you, Dave. Really appreciate it. All right. And thank you for watching. This is Dave Volante. >>You.
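The trackside workflow Cadieux and Asad describe above boils down to a batch job: pull telemetry off the car, crunch it, and hand an engineer something actionable before the next session. The sketch below is a minimal, hypothetical illustration of that pattern in Python; it is not Red Bull's or HPE's actual code, and the channel count, sampling rate, stint length, and aggregations are all invented for the example.

```python
import time
import numpy as np

# Hypothetical telemetry batch: ~1,000 sensor channels sampled at 100 Hz
# over a 10-minute stint (all numbers invented for illustration).
N_CHANNELS = 1000
SAMPLE_HZ = 100
STINT_SECONDS = 10 * 60

def load_stint(rng: np.random.Generator) -> np.ndarray:
    """Stand-in for pulling a stint's telemetry off trackside storage."""
    return rng.normal(size=(N_CHANNELS, STINT_SECONDS * SAMPLE_HZ)).astype(np.float32)

def post_process(telemetry: np.ndarray) -> dict:
    """Reduce raw channels to the aggregates an engineer would review."""
    return {
        "per_channel_mean": telemetry.mean(axis=1),
        "per_channel_peak": np.abs(telemetry).max(axis=1),
        "noisiest_channels": np.argsort(telemetry.std(axis=1))[-10:],
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    start = time.perf_counter()
    summary = post_process(load_stint(rng))
    elapsed = time.perf_counter() - start
    # The interview's point: cutting this wall-clock time (for example from
    # roughly eight minutes to two on faster infrastructure) is what lets a
    # setup tweak be recommended while the session is still running.
    print(f"processed {N_CHANNELS} channels in {elapsed:.1f}s")
    print("noisiest channels:", summary["noisiest_channels"])
```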
SUMMARY :
as close as possible to the sources of data, to reduce latency and maximize your ability to get Pleasure to be here. So how do you see the edge in the broader market shaping up? A lot of robot automation is going on that requires a lot of compute power to go out to More data is new data is generated at the edge, but needs to be stored. How do you look at that? a lot of States outside of the data center that needs to be protected. We at HBS believe that it needs to be extremely simple. You've got physics that you have to deal your latency to deal with. In addition to that, the customers now do not have to buy any third-party In addition to that, now you can backup straight to the client. So, so you can't comment on that or So as a result of that, as part of the GreenLake offering, we have cloud backup service natively are choosing SimpliVity for, particularly at the edge, and maybe talk about why from the data center, which you can literally push out and you can connect a network cable at the same time, there is disaster recovery use cases where you have, uh, out to service their stores because, you know, mobility is limited in these, in these strange times that we always about this race to manufacture The next designs to make it more adapt to the next circuit to run those. it's good to see you again. insights that really lead to competitive advantage. So this season we have 23 races and we So my job is, is really to make sure we have the right staff, that you have to tune to, or are there other factors involved there? So all that in order to win, you need to micromanage everything and optimize it for Talk about some of the key drivers in your business and some of the key apps that So all of that requires a lot of expertise to develop the simulation is the algorithms I mean, maybe in the form of like tribal So there's a huge amount of, um, very complicated models that So let's talk about your data center at the track, if you will. So the applications we need to operate the car and to make really Time is of the essence in a three X reduction in processing So for example, a race engineer could load the wrong but also costs just the simplicity of setting up the infrastructure. So we want applications absolutely fly, So did you consider any other options or was it because you had the factory knowledge? And the benefits that we see with hyper-converged actually mattered even more at the edge And, and all these, you know, speed and agility benefits, we saw an HP So we saw big, big reductions in time and all, How are you guys feeling about the season, Matt? we have a new driver Great talking to you again. We're able to be at Hey, what did you think about that interview with Matt? and then sometimes believe it or not, snapshots of this data needs to be sent to the red bull And we accomplished that for, for the red bull racing guys in And thank you for watching.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Sergio | PERSON | 0.99+ |
Matt | PERSON | 0.99+ |
David | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
two racks | QUANTITY | 0.99+ |
Steve | PERSON | 0.99+ |
Dave Volante | PERSON | 0.99+ |
2020 | DATE | 0.99+ |
Omar | PERSON | 0.99+ |
Omar Assad | PERSON | 0.99+ |
2018 | DATE | 0.99+ |
Matt Cadieux | PERSON | 0.99+ |
20 | QUANTITY | 0.99+ |
Red Bull Racing | ORGANIZATION | 0.99+ |
HBS | ORGANIZATION | 0.99+ |
Milton Keynes | LOCATION | 0.99+ |
2017 | DATE | 0.99+ |
23 races | QUANTITY | 0.99+ |
60 users | QUANTITY | 0.99+ |
22 races | QUANTITY | 0.99+ |
three-year | QUANTITY | 0.99+ |
90 seconds | QUANTITY | 0.99+ |
eight minutes | QUANTITY | 0.99+ |
Omer Asad | PERSON | 0.99+ |
UK | LOCATION | 0.99+ |
two cables | QUANTITY | 0.99+ |
One car | QUANTITY | 0.99+ |
more than 50% | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
nine | QUANTITY | 0.99+ |
each track | QUANTITY | 0.99+ |
ITT | ORGANIZATION | 0.99+ |
SimpliVity | TITLE | 0.99+ |
last year | DATE | 0.99+ |
two minutes | QUANTITY | 0.99+ |
Virgin | ORGANIZATION | 0.99+ |
HPE SimpliVity | TITLE | 0.99+ |
three racks | QUANTITY | 0.99+ |
Matt kudu | PERSON | 0.99+ |
one | QUANTITY | 0.99+ |
hundreds of stores | QUANTITY | 0.99+ |
five senses | QUANTITY | 0.99+ |
hundreds | QUANTITY | 0.99+ |
about 800 VMs | QUANTITY | 0.99+ |
both | QUANTITY | 0.98+ |
green Lake | ORGANIZATION | 0.98+ |
about 400 VDI | QUANTITY | 0.98+ |
10 years | QUANTITY | 0.98+ |
second use case | QUANTITY | 0.98+ |
one city | QUANTITY | 0.98+ |
Aruba | ORGANIZATION | 0.98+ |
one site | QUANTITY | 0.98+ |
five years ago | DATE | 0.98+ |
F1 Racing | ORGANIZATION | 0.98+ |
today | DATE | 0.98+ |
SimpliVity | ORGANIZATION | 0.98+ |
this year | DATE | 0.98+ |
150 VDI | QUANTITY | 0.98+ |
about 150 VMs | QUANTITY | 0.98+ |
Sunday | DATE | 0.98+ |
red bull | ORGANIZATION | 0.97+ |
first | QUANTITY | 0.97+ |
Omer | PERSON | 0.97+ |
multi-trillion dollar | QUANTITY | 0.97+ |
over five years | QUANTITY | 0.97+ |
one large use case | QUANTITY | 0.97+ |
first opportunity | QUANTITY | 0.97+ |
HPE | ORGANIZATION | 0.97+ |
each | QUANTITY | 0.96+ |
decades | QUANTITY | 0.96+ |
one ratios | QUANTITY | 0.96+ |
HP | ORGANIZATION | 0.96+ |
one race | QUANTITY | 0.95+ |
GreenLake | ORGANIZATION | 0.94+ |
Omer Asad, HPE ft Matt Cadieux, Red Bull Racing full v1 (UNLISTED)
(upbeat music) >> Edge computing is projected to be a multi-trillion dollar business. It's hard to really pinpoint the size of this market let alone fathom the potential of bringing software, compute, storage, AI and automation to the edge and connecting all that to clouds and on-prem systems. But what is the edge? Is it factories? Is it oil rigs, airplanes, windmills, shipping containers, buildings, homes, race cars. Well, yes and so much more. And what about the data? For decades we've talked about the data explosion. I mean, it's a mind-boggling but guess what we're going to look back in 10 years and laugh what we thought was a lot of data in 2020. Perhaps the best way to think about Edge is not as a place but when is the most logical opportunity to process the data and maybe it's the first opportunity to do so where it can be decrypted and analyzed at very low latencies. That defines the edge. And so by locating compute as close as possible to the sources of data to reduce latency and maximize your ability to get insights and return them to users quickly, maybe that's where the value lies. Hello everyone and welcome to this CUBE conversation. My name is Dave Vellante and with me to noodle on these topics is Omer Asad, VP and GM of Primary Storage and Data Management Services at HPE. Hello Omer, welcome to the program. >> Thanks Dave. Thank you so much. Pleasure to be here. >> Yeah. Great to see you again. So how do you see the edge in the broader market shaping up? >> Dave, I think that's a super important question. I think your ideas are quite aligned with how we think about it. I personally think enterprises are accelerating their sort of digitization and asset collection and data collection, they're typically especially in a distributed enterprise, they're trying to get to their customers. They're trying to minimize the latency to their customers. So especially if you look across industries manufacturing which has distributed factories all over the place they are going through a lot of factory transformations where they're digitizing their factories. That means a lot more data is now being generated within their factories. A lot of robot automation is going on, that requires a lot of compute power to go out to those particular factories which is going to generate their data out there. We've got insurance companies, banks, that are creating and interviewing and gathering more customers out at the edge for that. They need a lot more distributed processing out at the edge. What this is requiring is what we've seen is across analysts. A common consensus is this that more than 50% of an enterprises data especially if they operate globally around the world is going to be generated out at the edge. What does that mean? New data is generated at the edge what needs to be stored. It needs to be processed data. Data which is not required needs to be thrown away or classified as not important. And then it needs to be moved for DR purposes either to a central data center or just to another site. So overall in order to give the best possible experience for manufacturing, retail, especially in distributed enterprises, people are generating more and more data centric assets out at the edge. And that's what we see in the industry. >> Yeah. We're definitely aligned on that. There's some great points and so now, okay. You think about all this diversity what's the right architecture for these multi-site deployments, ROBO, edge? How do you look at that? >> Oh, excellent question, Dave. 
Every customer that we talked to wants SimpliVity, and no pun intended, because SimpliVity is built around a simple, edge-centric architecture, right? Let's take a few examples. You've got large global retailers; they have hundreds of global retail stores around the world that are generating data, that are producing data. Then you've got insurance companies, then you've got banks. So when you look at a distributed enterprise, how do you deploy equipment out at the edge in a manner that is very simple and easy to deploy, easy to lifecycle, and easy to mobilize? What are some of the challenges that these customers deal with? These customers, you don't want to send a lot of IT staff out there because that adds cost. You don't want to have islands of data and islands of storage in remote sites, because that adds a lot of state outside of the data center that needs to be protected. And then last but not least, how do you push lifecycle-based applications, new applications, out at the edge in a very simple to deploy manner? And how do you protect all this data at the edge? So the right architecture, in my opinion, needs to be extremely simple to deploy: storage, compute and networking out towards the edge in a hyperconverged environment. So we agree upon that. It's a very simple to deploy model, but then comes: how do you deploy applications on top of that? How do you manage these applications on top of that? How do you back up these applications back towards the data center? All of this keeping in mind that it has to be as zero touch as possible. We at HPE believe that it needs to be extremely simple: just give me two cables, a network cable and a power cable, fire it up, connect it to the network, push its state from the data center, and back up its state from the edge back into the data center. Extremely simple. >> It's got to be simple 'cause you've got so many challenges. You've got physics that you have to deal with, you have latency to deal with. You got RPO and RTO. What happens if something goes wrong? You've got to be able to recover quickly. So that's great. Thank you for that. Now you guys have some news. What is new from HPE in this space? >> Excellent question, great. So from a deployment perspective, HPE SimpliVity is just gaining, like it's exploding like crazy, especially as distributed enterprises adopt it as their standardized edge architecture, right? It's an HCI box that's got storage, compute and networking all in one. But now what we have done is, not only can you deploy applications all from your standard vCenter interface from a data center, but what we have now added is the ability to back up to the cloud right from the edge. You can also back up all the way back to your core data center. All of the backup policies are fully automated and implemented in the distributed file system that is the heart and soul of the SimpliVity installation. In addition to that, the customers now do not have to buy any third-party software. Backup is fully integrated in the architecture, and it's very efficient. In addition to that, now you can back up straight to the cloud. You can back up to a central high-end backup repository which is in your data center. And last but not least, we have a lot of customers that are pushing the limit in their application transformation.
So not only do we have, as we did previously, VMware deployments out at the edge sites; we've now also added both stateful and stateless container orchestration, as well as data protection capabilities for containerized applications out at the edge. So we have a lot of customers that are now deploying containers, rapidly spinning up containers to process data out at remote sites. And that allows us to not only protect those stateful applications but back them up into the central data center. >> I saw in that chart there was a line, no egress fees. That's a pain point for a lot of CIOs that I talked to. They grit their teeth at those fees. So you can't comment on that, or? >> Excellent question. I'm so glad you brought that up and picked up on that point. So along with SimpliVity, we have the whole GreenLake as-a-service offering as well, right? So what that means, Dave, is that we can literally provide our customers edge as a service. And when you complement that with Aruba wired and wireless infrastructure that goes at the edge, and the hyperconverged infrastructure as part of SimpliVity that goes at the edge. One of the things that was missing with cloud backups is that every time you back up to the cloud, which is a great thing by the way, anytime you restore from the cloud there is that egress fee, right? So as a result of that, as part of the GreenLake offering we have a cloud backup service natively now offered as part of HPE, which is included in your HPE SimpliVity edge as a service offering. So now not only can you back up into the cloud from your edge sites, but you can also restore back without any egress fees from HPE's data protection service. Either you can restore it back onto your data center, or you can restore it back towards the edge site, and because the infrastructure is so easy to deploy and centrally lifecycle manage, it's very mobile. So if you want to deploy and recover to a different site, you could also do that. >> Nice. Hey, Omer, can you double click a little bit on some of the use cases that customers are choosing SimpliVity for, particularly at the edge, and maybe talk about why they're choosing HPE? >> Excellent question. So one of the major use cases that we see, Dave, is obviously easy to deploy and easy to manage in a standardized form factor, right? A lot of these customers, like for example, we have a large retailer across the US with hundreds of stores, right? Now you cannot send service staff to each of these stores, and their data center is essentially just a closet for these guys, right? So how do you have a standardized deployment? A standardized deployment from the data center, which you can literally push out, and you can connect a network cable and a power cable and you're up and running; and then automated backup, elimination of backup state, and DR from the edge sites into the data center. So that's one of the big use cases: to rapidly deploy new stores, bring them up in a standardized configuration both from a hardware and a software perspective, and the ability to back up and recover that instantly. That's one large use case. The second use case that we see actually refers to a comment that you made in your opener, Dave, which is that a lot of these customers are generating a lot of the data at the edge. This is robotics automation that is going on in manufacturing sites. These are racing teams that are out at the edge doing post-processing of their car's data.
At the same time there are disaster recovery use cases, where you have camp sites and local agencies that go out there for humanity's benefit, and they move from one site to the other. It's a very, very mobile architecture that they need. So those are just a few cases where we were deployed. There was a lot of data collection and there was a lot of mobility involved in these environments, so you need to be quick to set up, quick to back up, quick to recover, and essentially you're on to your next move. >> You seem pretty pumped up about this new innovation, and why not? >> It is, especially because it has been thought through with edge in mind, and edge has to be mobile. It has to be simple. And especially as we have lived through this pandemic, which I hope we see the tail end of in 2021, or 2022 at the latest. One of the most common use cases that we saw, and this was an accidental discovery: a lot of the retail sites could not go out to service their stores because mobility is limited in these strange times that we live in. So from a central data center you're able to deploy applications. You're able to recover applications. And a lot of our customers said, hey, I don't have enough space in my data center to back up. Do you have another option? So then we rolled out this update release to SimpliVity where, from the edge site, you can now directly back up to our backup service, which is offered on a consumption basis to the customers, and they can recover that anywhere they want. >> Fantastic. Omer, thanks so much for coming on the program today. >> It's a pleasure, Dave. Thank you. >> All right. Awesome to see you. Now, let's hear from Red Bull Racing, an HPE customer that's actually using SimpliVity at the edge. (engine revving) >> Narrator: Formula One is a constant race against time, chasing tenths of seconds. (upbeat music) >> Okay. We're back with Matt Cadieux, who is the CIO of Red Bull Racing. Matt, it's good to see you again. >> Great to see you Dave. >> Hey, we're going to dig into a real world example of using data at the edge in near real time to gain insights that really lead to competitive advantage. But first, Matt, tell us a little bit about Red Bull Racing and your role there. >> Sure. So I'm the CIO at Red Bull Racing, and at Red Bull Racing we're based in Milton Keynes in the UK. And the main job for us is to design a race car, to manufacture the race car, and then to race it around the world. So as CIO, the IT group needs to develop the applications used for design, manufacturing, and racing. We also need to supply all the underlying infrastructure and also manage security. So it's a really interesting environment that's all about speed. So this season we have 23 races, and we need to tear the car apart and rebuild it to a unique configuration for every individual race. And we're also designing and making components targeted for races. So 23 immovable deadlines, this big evolving prototype to manage with our car, but we're also improving all of our tools and methods and software that we use to design and make and race the car. So we have a big can-do attitude in the company around continuous improvement. And the expectations are that we continue to, say, make the car faster, that we're winning races, that we improve our methods in the factory and our tools. And so for IT it's really unique in that we can be part of that journey and provide a better service. It's also a big challenge to provide that service and to give the business the agility it needs.
So my job is really to make sure we have the right staff, the right partners, the right technical platforms, so we can live up to expectations. >> And Matt, that tear down and rebuild for 23 races, is that because each track has its own unique signature that you have to tune to, or are there other factors involved? >> Yeah, exactly. Every track has a different shape. Some have lots of straights, some have lots of curves, and lots are in between. The track surface is very different, and the impact that has on tires, the temperature, and the climate is very different. Some are hilly, some have big curbs that affect the dynamics of the car. So with all that, in order to win you need to micromanage everything and optimize it for any given race track. >> COVID has of course been brutal for sports. What's the status of your season? >> So this season we knew that COVID was here, and we're doing 23 races knowing we have COVID to manage. And as a premium sporting team we've formed bubbles, we've put health and safety and social distancing into our environment, and we're able to operate by doing things in a safe manner. We have some special exemptions in the UK. So for example, when people return from overseas they do not have to quarantine for two weeks, but they get tested multiple times a week and we know they're safe. So we're racing, we're dealing with all the hassle that COVID gives us, and we are really hoping for a return to normality sooner instead of later, where we can get fans back at the track and really go racing and have the spectacle where everyone enjoys it. >> Yeah. That's awesome. So important for the fans but also all the employees around that ecosystem. Talk about some of the key drivers in your business and some of the key apps that give you competitive advantage to help you win races. >> Yeah. So in our business, everything is all about speed. So the car obviously needs to be fast, but also all of our business operations need to be fast. We need to be able to design a car, and it's all done in the virtual world, but the virtual simulations and designs need to correlate to what happens in the real world. So all of that requires a lot of expertise to develop the simulations, the algorithms, and have all the underlying infrastructure that runs it quickly and reliably. In manufacturing we have cost caps and financial controls by regulation. We need to be super efficient and control material and resources. So ERP and MES systems are running and helping us do that. And at the race track itself, in terms of speed, we have hundreds of decisions to make on a Friday and Saturday as we're fine tuning the final configuration of the car. And here again, we rely on simulations and analytics to help do that. And then during the race we have split seconds, literally seconds, to alter our race strategy if an event happens. So if there's an accident and the safety car comes out, or the weather changes, we revise our tactics, and we're running Monte Carlo, for example, and using experienced engineers with simulations to make a data-driven decision, and hopefully a better one and faster than our competitors. All of that needs IT to work at a very high level. >> Yeah, it's interesting. I mean, as a lay person, historically when I think about technology in car racing, of course I think about the mechanical aspects of a self-propelled vehicle, the electronics and the like, but not necessarily the data. But the data's always been there, hasn't it?
I mean, maybe in the form of like tribal knowledge, if you are somebody who knows the track and where the hills are, and experience and gut feel. But today you're digitizing it and you're processing it close to real time. It's amazing. >> I think that's exactly right. Yeah. The car's instrumented with sensors, we post-process, and we are doing video and image analysis, and we're looking at our car and our competitors' cars. So there's a huge amount of very complicated models that we're using to optimize our performance and to continuously improve our car. Yeah. The data and the applications that leverage it are really key, and that's a critical success factor for us. >> So let's talk about your data center at the track, if you will. I mean, if I can call it that. Paint a picture for us, what does that look like? >> So we have to send a lot of equipment to the track, at the edge. And even though we have a really great wide area network link back to the factory and there's cloud resources, a lot of the tracks are very old. You don't have hardened infrastructure, you don't have ducts that protect cabling, for example, and you can lose connectivity to remote locations. So the applications we need to operate the car and to make really critical decisions, all that needs to be at the edge where the car operates. So historically we had three racks of equipment, legacy infrastructure, and it was really hard to manage, to make changes; it was too inflexible. There were multiple panes of glass, and it was too slow. It didn't run our applications quickly. It was also too heavy and took up too much space when you're cramped into a garage with lots of environmental constraints. So we'd introduced hyperconvergence into the factory and seen a lot of great benefits. And when it came time to refresh our infrastructure at the track, we stepped back and said, there's a lot smarter way of operating. We can get rid of all the slow and inflexible, expensive legacy and introduce hyperconvergence. And we saw really excellent benefits for doing that. We saw a three X speed-up for a lot of our applications. So here, where we're post-processing data and we have to make decisions about race strategy, time is of the essence, and the three X reduction in processing time really matters. We also were able to go from three racks of equipment down to two racks of equipment, and the storage efficiency of the HPE SimpliVity platform, with 20 to one ratios, allowed us to eliminate a rack. And that actually saved $100,000 a year in freight costs by shipping less equipment. Things like backup: mistakes happen. Sometimes the user makes a mistake. So for example a race engineer could load the wrong data map into one of our simulations, and we could restore that VDI through SimpliVity backup in 90 seconds. And this enables engineers to focus on the car, to make better decisions without having downtime. And we send two IT guys to every race. They're managing 60 users, a really diverse environment, juggling a lot of balls, and having a simple management platform like HPE SimpliVity allows them to be very effective and to work quickly. So all of those benefits were a huge step forward relative to the legacy infrastructure that we used to run at the edge. >> Yeah. So you had the nice Petri dish in the factory, so it sounds like your goals are obviously, number one KPI is speed, to help shave seconds off the time, but also cost, just the simplicity of setting up the infrastructure is-- >> That's exactly right. It's speed, speed, speed.
So we want applications to absolutely fly, get to actionable results quicker, get answers from our simulations quicker. The other area where speed's really critical is that our applications are also evolving prototypes: the models are getting bigger, the simulations are getting bigger, and they need more and more resource. And being able to spin up resource and provision things without being a bottleneck is a big challenge, and SimpliVity gives us the means of doing that. >> So did you consider any other options, or was it because you had the factory knowledge that HCI was very clearly the option? What did you look at? >> Yeah, so we have over five years of experience in the factory and we eliminated all of our legacy infrastructure five years ago. And the benefits I've described at the track, we saw those in the factory. At the track we have a three-year operational life cycle for our equipment. 2017 was the last year we had legacy, and as we were building for 2018, it was obvious that hyper-converged was the right technology to introduce. And we'd had years of experience in the factory already. And the benefits that we see with hyper-converged actually mattered even more at the edge because our operations are so much more pressurized. Time is even more of the essence. And so speeding everything up at the really pointy end of our business was really critical. It was an obvious choice. >> Why SimpliVity? Why'd you choose HPE SimpliVity? >> Yeah. So when we first heard about hyper-converged, way back in the factory, we had a legacy infrastructure: overly complicated, too slow, too inflexible, too expensive. And we stepped back and said there has to be a smarter way of operating. We went out and challenged our technology partners, we learned about hyperconvergence, and wondered whether the hype was real or not. So we underwent some POCs and benchmarking, and the POCs were really impressive. With all these speed and agility benefits we saw, HPE for our use cases was the clear winner in the benchmarks. So based on that we made an initial investment in the factory. We moved about 150 VMs and 150 VDIs into it. And then, as we've seen all the benefits, we've successfully invested, and we now have an estate in the factory of about 800 VMs and about 400 VDIs. So it's been a great platform and it's allowed us to really push boundaries and give the business the service it expects. >> Awesome, fun stories. Just coming back to the metrics for a minute. So you're running Monte Carlo simulations in real time and sort of near real-time. And so essentially that's, if I understand it, that's what-ifs and it's the probability of the outcome. And then somebody's got to make a call, the human's got to say, okay, do this, right? Was the time in which you were able to go from data to insight to recommendation, or edict, was that compressed? You kind of indicated that.
Yeah, that was accelerated. And so in that use case, what we're trying to do is predict the future, and you're saying, well, before any event happens, you're doing what-ifs, and if it were to happen, what would you probabilistically do? So that simulation we've been running for a while, but it gets better and better as we get more knowledge, and we were able to accelerate that with SimpliVity. But there's other use cases too. So we also have telemetry from the car and we post-process it. And that reprocessing time is very time consuming. And we went from nine, eight minutes for some of the simulations down to just two minutes. So we saw big, big reductions in time. And ultimately that meant an engineer could understand what the car was doing in a practice session, recommend a tweak to the configuration or setup of it, and just get more actionable insight quicker. And it ultimately helps get a better car quicker. >> Such a great example. How are you guys feeling about the season, Matt? What's the team's sentiment? >> I think we're optimistic. We think from our simulations that we have a great car, and we have a new driver lineup. We have Max Verstappen, who carries on with the team, and Sergio Perez joins the team. So we're really excited about this year and we want to go and win races. And I think with COVID people are just itching also to get back to a little degree of normality, and going racing again, even though there's no fans, gets us into a degree of normality. >> That's great, Matt. Good luck this season and going forward, and thanks so much for coming back in theCUBE. Really appreciate it. >> It's my pleasure. Great talking to you again. >> Okay. Now we're going to bring back Omer for a quick summary. So keep it right there. >> Narrator: That's where the data comes face to face with the real world. >> Narrator: Working with Hewlett Packard Enterprise is a hugely beneficial partnership for us. We're able to be at the cutting edge of technology in a highly technical, highly stressed environment. There is no bigger challenge than Formula One. (upbeat music) >> Being in the car and driving it on the limit, that is the best thing out there. >> Narrator: It's that innovation and creativity that ultimately achieves winning. >> Okay. We're back with Omer. Hey, what did you think about that interview with Matt? >> Great. I have to tell you, I'm a big Formula One fan and they are one of my favorite customers. So obviously one of the biggest use cases, as you saw for Red Bull Racing, is trackside deployments. There are now 22 races in a season. These guys are jumping from one city to the next; they've got to pack up, move to the next city, set up the infrastructure very, very quickly. An average Formula One car is running a thousand plus sensors that are generating a ton of data trackside that needs to be collected very quickly. It needs to be processed very quickly, and then sometimes, believe it or not, snapshots of this data need to be sent back to the Red Bull factory, back at the data center. What does this all need? It needs reliability. It needs compute power in a very short form factor. And it needs agility: quick to set up, quick to go, quick to recover. And then in post-processing they need to have CPU density so they can pack more VMs out at the edge to be able to do that processing. And we accomplished that for the Red Bull Racing guys with basically two SimpliVity nodes that are running trackside and moving with them from one race to the next race to the next race. And every time those SimpliVity nodes connect back up to the data center, they're backing up to their data center. They're sending snapshots of data back to the data center, essentially making their job a whole lot easier, where they can focus on racing and not on troubleshooting virtual machines. >> Red Bull Racing and HPE SimpliVity. Great example. It's agile, it's cost efficient, and it shows a real impact. Thank you very much, Omer. I really appreciate those summary comments. >> Thank you, Dave. Really appreciate it. >> All right. And thank you for watching. This is Dave Volante for theCUBE.
(upbeat music)
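Cadieux mentions running Monte Carlo during the race to decide, in seconds, how to react to a safety car or a weather change; Vellante calls it "what-ifs and the probability of the outcome." As a rough illustration of that pattern only, with every lap time, pit-loss figure, and probability below invented rather than taken from Red Bull data, a strategy what-if can be reduced to averaging simulated outcomes over many random trials:

```python
import random

# Toy what-if: should we pit now or stay out a few more laps?
# Every number here is a made-up assumption, used only to show the pattern.
LAPS_LEFT = 20
PIT_LOSS_S = 21.0          # time lost in the pit lane under green-flag running
PIT_LOSS_SC_S = 11.0       # cheaper stop if a safety car slows the field
P_SAFETY_CAR = 0.25        # assumed chance of a safety car in the next laps
OLD_TYRE_PENALTY_S = 0.35  # assumed per-lap penalty for staying out on worn tyres

def race_time(pit_now: bool, rng: random.Random) -> float:
    base = 90.0 * LAPS_LEFT                      # nominal lap time times laps remaining
    safety_car = rng.random() < P_SAFETY_CAR
    if pit_now:
        return base + PIT_LOSS_S
    # Stay out, accept tyre degradation, and hope for a cheap stop later.
    degradation = OLD_TYRE_PENALTY_S * 5
    stop = PIT_LOSS_SC_S if safety_car else PIT_LOSS_S
    return base + degradation + stop

def expected(pit_now: bool, trials: int = 100_000) -> float:
    rng = random.Random(42)
    return sum(race_time(pit_now, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    pit, stay = expected(True), expected(False)
    print(f"pit now : {pit:.2f}s  |  stay out: {stay:.2f}s")
    print("recommendation:", "pit now" if pit < stay else "stay out")
```

The real models are obviously far richer than this; the point of the sketch is only that the answer is an expectation over many simulated races, which is why the raw compute speed discussed in the interview translates directly into faster strategy calls.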
SUMMARY :
and connecting all that to Pleasure to be here. So how do you see the edge in And then it needs to be moved for DR How do you look at that? and easy to deploy It's got to be simple and implemented in the So you can't comment on that or? and because the infrastructure is so easy on some of the use cases and the ability to backup You seem pretty pumped up about A lot of the retail sites on the program today. It's a pleasure, Dave. SimpliVity at the edge. a constant race against time Matt, it's good to see you again. in to a real world example and then to race it around the world. So all that in order to win What's the status of your season? and have the spectacle So important for the fans So the car obviously needs to be fast and close to real time. and to continuously improve our car. data center at the track, So the applications we Petri dish in the factory and being able to spin up the factory knowledge? And the benefits that we see and the PLCs were really impressive. Was the time in which you And so that we were able to about the season, Matt? and Sergio Cross joins the team. and thanks so much for Great talking to you again. going to bring back Omer comes face to face with the real world. We're able to be at the that is the best thing out there. and creativity to ultimately that interview with Matt? So obviously one of the biggest use cases and it shows a real impact. Thank you, Dave. And thank you for watching.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Matt Cadieux | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Sergio Cross | PERSON | 0.99+ |
2017 | DATE | 0.99+ |
2018 | DATE | 0.99+ |
Red Bull Racing | ORGANIZATION | 0.99+ |
Matt | PERSON | 0.99+ |
2020 | DATE | 0.99+ |
Milton Keynes | LOCATION | 0.99+ |
two weeks | QUANTITY | 0.99+ |
three-year | QUANTITY | 0.99+ |
20 | QUANTITY | 0.99+ |
Red Bull Racing | ORGANIZATION | 0.99+ |
Omer Asad | PERSON | 0.99+ |
Dave Volante | PERSON | 0.99+ |
US | LOCATION | 0.99+ |
Omer | PERSON | 0.99+ |
Red Bull | ORGANIZATION | 0.99+ |
UK | LOCATION | 0.99+ |
two racks | QUANTITY | 0.99+ |
23 races | QUANTITY | 0.99+ |
Max Verstapenn | PERSON | 0.99+ |
90 seconds | QUANTITY | 0.99+ |
60 users | QUANTITY | 0.99+ |
22 races | QUANTITY | 0.99+ |
eight minutes | QUANTITY | 0.99+ |
more than 50% | QUANTITY | 0.99+ |
each track | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
one race | QUANTITY | 0.99+ |
two minutes | QUANTITY | 0.99+ |
two cables | QUANTITY | 0.99+ |
nine | QUANTITY | 0.99+ |
Hewlett Packard Enterprise | ORGANIZATION | 0.99+ |
150 VDIs | QUANTITY | 0.99+ |
SimpliVity | TITLE | 0.99+ |
Pharma Bubbles | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
five years ago | DATE | 0.99+ |
first opportunity | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
One | QUANTITY | 0.99+ |
about 800 VMs | QUANTITY | 0.99+ |
three racks | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
one site | QUANTITY | 0.98+ |
HPE | ORGANIZATION | 0.98+ |
Monte Carlo | TITLE | 0.98+ |
about 400 VDIs | QUANTITY | 0.98+ |
Primary Storage and Data Management Services | ORGANIZATION | 0.98+ |
hundreds of stores | QUANTITY | 0.98+ |
Red bull Racing | ORGANIZATION | 0.98+ |
both | QUANTITY | 0.98+ |
thousand plus sensors | QUANTITY | 0.98+ |
tens of seconds | QUANTITY | 0.98+ |
second use case | QUANTITY | 0.98+ |
multi-trillion dollar | QUANTITY | 0.98+ |
over five years | QUANTITY | 0.98+ |
today | DATE | 0.97+ |
GreenLake | ORGANIZATION | 0.97+ |
one city | QUANTITY | 0.97+ |
10 years | QUANTITY | 0.96+ |
HPE SimpliVity | TITLE | 0.96+ |
COVID | OTHER | 0.96+ |
hundreds of global retail stores | QUANTITY | 0.96+ |
about 150 VMs | QUANTITY | 0.96+ |
Matt Cadieux, CIO Red Bull Racing v2
(mellow music) >> Okay, we're back with Matt Cadieux who is the CIO Red Bull Racing. Matt, it's good to see you again. >> Yeah, great to see you, Dave. >> Hey, we're going to dig into a real world example of using data at the edge and in near real-time to gain insights that really lead to competitive advantage. But first Matt, tell us a little bit about Red Bull Racing and your role there. >> Sure, so I'm the CIO at Red Bull Racing. And at Red Bull Racing we're based in Milton Keynes in the UK. And the main job for us is to design a race car, to manufacture the race car, and then to race it around the world. So as CIO, we need to develop, the IT team needs to develop the applications used for the design, manufacturing, and racing. We also need to supply all the underlying infrastructure, and also manage security. So it's a really interesting environment that's all about speed. So this season we have 23 races, and we need to tear the car apart, and rebuild it to a unique configuration for every individual race. And we're also designing and making components targeted for races. So 23 immovable deadlines, this big evolving prototype to manage with our car. But we're also improving all of our tools and methods and software that we use to design and make and race the car. So we have a big can-do attitude in the company, around continuous improvement. And the expectations are that we continue to make the car faster, that we're winning races, that we improve our methods in the factory and our tools. And so for IT it's really unique and that we can be part of that journey and provide a better service. It's also a big challenge to provide that service and to give the business the agility it needs. So my job is really to make sure we have the right staff, the right partners, the right technical platforms, so we can live up to expectations. >> And Matt that tear down and rebuild for 23 races. Is that because each track has its own unique signature that you have to tune to or are there other factors involved there? >> Yeah, exactly. Every track has a different shape. Some have lots of straight, some have lots of curves and lots are in between. The track's surface is very different and the impact that has on tires, the temperature and the climate is very different. Some are hilly, some are big curves that affect the dynamics of the car. So all that in order to win, you need to micromanage everything and optimize it for any given race track. >> And, you know, COVID has, of course, been brutal for sports. What's the status of your season? >> So this season we knew that COVID was here and we're doing 23 races knowing we have COVID to manage. And as a premium sporting team we've formed bubbles, we've put health and safety and social distancing into our environment. And we're able to operate by doing things in a safe manner. We have some special exhibitions in the UK. So for example, when people return from overseas that they do not have to quarantine for two weeks but they get tested multiple times a week and we know they're safe. So we're racing, we're dealing with all the hassle that COVID gives us. And we are really hoping for a return to normality sooner instead of later where we can get fans back at the track and really go racing and have the spectacle where everyone enjoys it. >> Yeah, that's awesome. So important for the fans but also all the employees around that ecosystem. 
Talk about some of the key drivers in your business and some of the key apps that give you competitive advantage to help you win races. >> Yeah, so in our business everything is all about speed. So the car obviously needs to be fast but also all of our business operations need to be fast. We need to be able to design our car and it's all done in the virtual world but the virtual simulations and designs need to correlate to what happens in the real world. So all of that requires a lot of expertise to develop the simulations, the algorithms, and have all the underlying infrastructure that runs it quickly and reliably. In manufacturing, we have cost caps and financial controls by regulation. We need to be super efficient and control material and resources. So ERP and MES systems are running, helping us do that. And at the race track itself in speed, we have hundreds of decisions to make on a Friday and Saturday as we're fine tuning the final configuration of the car. And here again, we rely on simulations and analytics to help do that. And then during the race, we have split seconds, literally seconds to alter our race strategy if an event happens. So if there's an accident and the safety car comes out or the weather changes, we revise our tactics. And we're running Monte Carlo for example. And using experienced engineers with simulations to make a data-driven decision and hopefully a better one and faster than our competitors. All of that needs IT to work at a very high level. >> You know it's interesting, I mean, as a lay person, historically when I think about technology and car racing, of course, I think about the mechanical aspects of a self-propelled vehicle, the electronics and the like, but not necessarily the data. But the data's always been there, hasn't it? I mean, maybe in the form of like tribal knowledge, if it's somebody who knows the track and where the hills are and experience and gut feel. But today you're digitizing it and you're processing it in close to real-time. It's amazing. >> Yeah, exactly right. Yeah, the car is instrumented with sensors, we post-process, we're doing video, image analysis and we're looking at our car, our competitor's car. So there's a huge amount of very complicated models that we're using to optimize our performance and to continuously improve our car. Yeah, the data and the applications that leverage it are really key. And that's a critical success factor for us. >> So let's talk about your data center at the track, if you will, I mean, if I can call it that. Paint a picture for us. >> Sure. What does that look like? >> So we have to send a lot of equipment to the track, at the edge. And even though we have really a great lateral network link back to the factory and there's cloud resources, a lot of the tracks are very old. You don't have hardened infrastructure, you don't have docks that protect cabling, for example, and you can lose connectivity to remote locations. So the applications we need to operate the car and to make really critical decisions, all that needs to be at the edge where the car operates. So historically we had three racks of equipment, legacy infrastructure and it was really hard to manage, to make changes, it was too inflexible. There were multiple panes of glass, and it was too slow. It didn't run our applications quickly. It was also too heavy and took up too much space when you're cramped into a garage with lots of environmental constraints. 
So we'd introduced hyper-convergence into the factory and seen a lot of great benefits. And when we came time to refresh our infrastructure at the track, we stepped back and said there's a lot smarter way of operating. We can get rid of all this slow and inflexible expensive legacy and introduce hyper-convergence. And we saw really excellent benefits for doing that. We saw a three X speed up for a lot of our applications. So here where we're post-processing data, and we have to make decisions about race strategy, time is of the essence and a three X reduction in processing time really matters. We also were able to go from three racks of equipment down to two racks of equipment and the storage efficiency of the HPE SimpliVity platform with 20 to one ratios allowed us to eliminate a rack. And that actually saved a $100,000 a year in freight costs by shipping less equipment. Things like backup, mistakes happen. Sometimes a user makes a mistake. So for example a race engineer could load the wrong data map into one of our simulations. And we could restore that DDI through SimpliVity backup in 90 seconds. And this makes sure, enables engineers to focus on the car, to make better decisions without having downtime. And we send two IT guys to every race. They're managing 60 users, a really diverse environment, juggling a lot of balls and having a simple management platform like HP SimpliVity gives us, allows them to be very effective and to work quickly. So all of those benefits were a huge step forward relative to the legacy infrastructure that we used to run at the edge. >> Yes, so you had the nice Petri dish in the factory, so it sounds like your goals obviously, number one KPI is speed to help shave seconds off the time, but also cost. >> That's right. Just the simplicity of setting up the infrastructure is key. >> Yeah, that's exactly right. >> It's speed, speed, speed. So we want applications that absolutely fly, you know gets actionable results quicker, get answers from our simulations quicker. The other area that speed's really critical is our applications are also evolving prototypes and we're always, the models are getting bigger, the simulations are getting bigger, and they need more and more resource. And being able to spin up resource and provision things without being a bottleneck is a big challenge. And SimpliVity gives us the means of doing that. >> So did you consider any other options or was it because you had the factory knowledge, HCI was, you know, very clearly the option? What did you look at? >> Yeah, so we have over five years of experience in the factory and we eliminated all of our legacy infrastructure five years ago. And the benefits I've described at the track we saw that in the factory. At the track, we have a three-year operational life cycle for our equipment. 2017 was the last year we had legacy. As we were building for 2018, it was obvious that hyper-converged was the right technology to introduce. And we'd had years of experience in the factory already. And the benefits that we see with hyper-converged actually mattered even more at the edge because our operations are so much more pressurized. Time is even more of the essence. And so speeding everything up at the really pointy end of our business was really critical. It was an obvious choice. >> So why SimpliVity? Why do you choose HPE SimpliVity? >> Yeah, so when we first heard about hyper-converged, way back in the factory. 
We had a legacy infrastructure, overly complicated, too slow, too inflexible, too expensive. And we stepped back and said there has to be a smarter way of operating. We went out and challenged our technology partners. We learned about hyper-convergence. We didn't know if the hype was real or not. So we underwent some POCs and benchmarking, and the POCs were really impressive. And with all these, you know, speed and agility benefits we saw, HPE for our use cases was the clear winner in the benchmarks. So based on that we made an initial investment in the factory. We moved about 150 VMs and 150 VDIs into it. And then as we've seen all the benefits we've successfully invested, and we now have an estate in the factory of about 800 VMs and about 400 VDIs. So it's been a great platform and it's allowed us to really push boundaries and give the business the service it expects. >> Well that's a fun story. So just coming back to the metrics for a minute. So you're running Monte Carlo simulations in real-time and sort of near real-time. >> Yeah. And so essentially that's, if I understand it, that's what-ifs and it's the probability of the outcome. And then somebody's got to make, >> Exactly. then a human's got to say, okay, do this, right. And so was that, >> Yeah. with the time in which you were able to go from data to insight to recommendation or edict, was that compressed? You kind of indicated that, but. >> Yeah, that was accelerated. And so in that use case, what we're trying to do is predict the future and you're saying well, before any event happens, you're doing what-ifs. Then if it were to happen, what would you probabilistically do? So, you know, that simulation we've been running for a while but it gets better and better as we get more knowledge. And so we were able to accelerate that with SimpliVity. But there's other use cases too. So we offload telemetry from the car and we post-process it. And that reprocessing time really is very time consuming. And, you know, we went from nine, eight minutes for some of the simulations down to just two minutes. So we saw big, big reductions in time. And ultimately that meant an engineer could understand what the car was doing in a practice session, recommend a tweak to the configuration or setup of it, and just get more actionable insight quicker. And it ultimately helps get a better car quicker. >> Such a great example. How are you guys feeling about the season, Matt? What's the team's sentiment? >> Yeah, I think we're optimistic. We think from our simulations that we have a great car. We have a new driver lineup. We have Max Verstappen who carries on with the team and Sergio Perez joins the team. So we're really excited about this year and we want to go and win races. And I think with COVID people are just itching also to get back to a little degree of normality, and, you know, going racing again, even though there's no fans, gets us into a degree of normality. >> That's great, Matt, good luck this season and going forward and thanks so much for coming back in theCUBE. Really appreciate it. >> It's my pleasure. Great talking to you again. >> Okay, now we're going to bring back Omer for a quick summary. So keep it right there. (mellow music)
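One way to read the eight-minutes-to-two-minutes figure Cadieux repeats here is in terms of how many analysis iterations fit into a fixed practice window. The sketch below is a simple back-of-the-envelope calculation; the session length and the per-cycle engineer review time are assumptions made for illustration, not numbers from the interview.

```python
# With the assumed figures below, cutting post-processing from ~8 minutes to
# ~2 roughly doubles the analyze-and-tweak cycles available in one session.
PRACTICE_MIN = 90   # assumed practice window length (not from the interview)
REVIEW_MIN = 5      # hypothetical engineer review time per cycle

def cycles(process_min: int) -> int:
    """How many process-then-review cycles fit in the practice window."""
    return PRACTICE_MIN // (process_min + REVIEW_MIN)

for process_min in (8, 2):
    print(f"{process_min}-minute post-processing -> {cycles(process_min)} cycles per session")
```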
SUMMARY :
Matt, it's good to see you again. and in near real-time and that we can be part of that journey And Matt that tear down and the impact that has on tires, What's the status of your season? and have the spectacle and some of the key apps So the car obviously needs to be fast the electronics and the like, and to continuously improve our car. data center at the track, What does that look like? So the applications we Petri dish in the factory, Just the simplicity of And being able to spin up And the benefits that we and the PLCs were really impressive. So just coming back to probability of the outcome. And so was that, from data to insight to recommendation And so that we were able to What's the team's, the sentiment? and Sergio Perez joins the team. and going forward and thanks so much Great talking to you again. So keep it right there.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Max Verstappen | PERSON | 0.99+ |
Matt Cadieux | PERSON | 0.99+ |
Sergio Perez | PERSON | 0.99+ |
Matt | PERSON | 0.99+ |
two weeks | QUANTITY | 0.99+ |
Milton Keynes | LOCATION | 0.99+ |
Red Bull Racing | ORGANIZATION | 0.99+ |
Dave | PERSON | 0.99+ |
Omar | PERSON | 0.99+ |
2018 | DATE | 0.99+ |
60 users | QUANTITY | 0.99+ |
UK | LOCATION | 0.99+ |
20 | QUANTITY | 0.99+ |
90 seconds | QUANTITY | 0.99+ |
23 races | QUANTITY | 0.99+ |
150 VDIs | QUANTITY | 0.99+ |
three-year | QUANTITY | 0.99+ |
two racks | QUANTITY | 0.99+ |
each track | QUANTITY | 0.99+ |
2017 | DATE | 0.99+ |
two minutes | QUANTITY | 0.99+ |
eight minutes | QUANTITY | 0.99+ |
nine | QUANTITY | 0.99+ |
three racks | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
five years ago | DATE | 0.98+ |
hundreds | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
about 800 VMs | QUANTITY | 0.98+ |
HP | ORGANIZATION | 0.98+ |
about 150 VMs | QUANTITY | 0.98+ |
about 400 VDIs | QUANTITY | 0.98+ |
one ratios | QUANTITY | 0.98+ |
first | QUANTITY | 0.96+ |
over five years | QUANTITY | 0.95+ |
this year | DATE | 0.95+ |
SimpliVity | TITLE | 0.94+ |
$100,000 a year | QUANTITY | 0.93+ |
23 immovable | QUANTITY | 0.93+ |
HCI | ORGANIZATION | 0.93+ |
two IT | QUANTITY | 0.91+ |
Saturday | DATE | 0.91+ |
Monte Carlo | TITLE | 0.91+ |
one | QUANTITY | 0.88+ |
Every track | QUANTITY | 0.84+ |
a minute | QUANTITY | 0.77+ |
COVID | OTHER | 0.77+ |
three | QUANTITY | 0.76+ |
Monte Carlo | COMMERCIAL_ITEM | 0.75+ |
every race | QUANTITY | 0.75+ |
times a week | QUANTITY | 0.75+ |
seconds | QUANTITY | 0.64+ |
Friday | DATE | 0.6+ |
of curves | QUANTITY | 0.58+ |
no | QUANTITY | 0.56+ |
number one | QUANTITY | 0.56+ |
straight | QUANTITY | 0.52+ |
SimpliVity | OTHER | 0.52+ |
COVID | TITLE | 0.5+ |
HPE | TITLE | 0.34+ |
Jamie Thomas, IBM | IBM Think 2020
Narrator: From theCUBE studios in Palo Alto and Boston, it's theCUBE, covering IBM Think, brought to you by IBM. >> We're back. You're watching theCUBE and our coverage of IBM Think 2020, the digital IBM thinking. We're here with Jamie Thomas, who's the general manager of strategy and development for IBM Systems. Jamie, great to see you. >> It's great to see you as always. >> You have been knee deep in qubits, the last couple years. And we're going to talk quantum. We've talked quantum a lot in the past, but it's a really interesting field. We spoke to you last year at IBM Think about this topic. And a year in this industry is a long time, but so give us the update what's new in quantum land? >> Well, Dave first of all, I'd like to say that in this environment we find ourselves in, I think we can all appreciate why innovation of this nature is perhaps more important going forward, right? If we look at some of the opportunities to solve some of the unsolvable problems, or solve problems much more quickly, in the case of pharmaceutical research. But for us in IBM, it's been a really busy year. First of all, we worked to advance the technology, which is first and foremost in terms of this journey to quantum. We just brought online our 53 qubit computer, which also has a quantum volume of 32, which we can talk about. And we've continued to advance the software stack that's attached to the technology because you have to have both the software and the hardware thing, right rate and pace. We've advanced our new network, which you and I have spoken about, which are those individuals across the commercial enterprises, academic and startups, who are working with us to co-create around quantum to help us understand the use cases that really can be solved in the future with quantum. And we've also continued to advance our community, which is serving as well in this new digital world that we're finding ourselves in, in terms of reaching out to developers. Now, we have over 300,000 unique downloads of the programming model that represents the developers that we're touching out there every day with quantum. These developers have, in the last year, have run over 140 billion quantum circuits. So, our machines in the cloud are quite active, and the cloud model, of course, is serving us well. The data's, in addition, to all the other things that I mentioned. >> So Jamie, what metrics are you trying to optimize on? You mentioned 53 qubits I saw that actually came online, I think, last fall. So you're nearly six months in now, which is awesome. But what are you measuring? Are you measuring stability or coherence or error rates? Number of qubits? What are the things that you're trying to optimize on to measure progress? >> Well, that's a good question. So we have this metric that we've defined over the last year or two called quantum volume. And quantum volume 32, which is the capacity of our current machine really is a representation of many of the things that you mentioned. It represents the power of the quantum machine, if you will. It includes a definition of our ability to provide error correction, to maintain states, to really accomplish workloads with the computer. So there's a number of factors that go into quantum volume, which we think are important. Now, qubits and the number of qubits is just one such metric. It really depends on the coherence and the effect of error correction, to really get the value out of the machine, and that's a very important metric. 
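For readers who want the arithmetic behind the headline number Thomas cites: IBM defines quantum volume as 2^n, where n is the width and depth of the largest "square" random model circuit the machine can run successfully, so a quantum volume of 32 corresponds to reliably running circuits five qubits wide and five layers deep. A one-line check, included only as a reader's aid:

```python
import math

quantum_volume = 32
largest_square_circuit = int(math.log2(quantum_volume))
print(largest_square_circuit)  # 5: width = depth = 5
```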
>> Yeah, we love to boil things down to a single metric. It's more complicated than that >> Yeah, yeah. >> specifically with quantum. So, talk a little bit more about what clients are doing and I'm particularly interested in the ecosystem that you're forming around quantum. >> Well, as I said, the ecosystem is both the network, which are those that are really intently working with us to co-create because we found, through our long history in IBM, that co-creation is really important. And also these researchers and developers realize that some of our developers today are really researchers, but as you as you go forward you get many different types of developers that are part of this mix. But in terms of our ecosystem, we're really fundamentally focused on key problems around chemistry, material science, financial services. And over the last year, there's over 200 papers that have been written out there from our network that really embody their work with us on this journey. So we're looking at things like quadratic speed up of things like Monte Carlo simulation, which is used in the financial services arena today to quantify risk. There's papers out there around topics like trade settlements, which in the world today trade settlements is a very complex domain with very interconnected complex rules and trillions of dollars in the purview of trade settlement. So, it's just an example. Options pricing, so you see examples around options pricing from corporations like JPMC in the area of financial services. And likewise in chemistry, there's a lot of research out there focused on batteries. As you can imagine, getting everything to electric powered batteries is an important topic. But today, the way we manufacture batteries can in fact create air pollution, in terms of the process, as well as we want batteries to have more retention in life to be more effective in energy conservation. So, how do we create batteries and still protect our environment, as we all would like to do? And so we've had a lot of research around things like the next generation of electric batteries, which is a key topic. But if you can think, you know Dave, there's so many topics here around chemistry, also pharmaceuticals that could be advanced with a quantum computer. Obviously, if you look at the COVID-19 news, our supercomputer that we installed at Oak Ridge National Laboratory for instance, is being used to analyze 8000 different compounds for specifically around COVID-19 and the possibilities of using those compounds to solve COVID-19, or influence it in a positive manner. You can think of the quantum computer when it comes online as an accelerator to a supercomputer like that, helping speed up this kind of research even faster than what we're able to do with something like the Summit supercomputer. Oak Ridge is one of our prominent clients with the quantum technology, and they certainly see it that way, right, as an accelerator to the capacity they already have. So a great example that I think is very germane in the time that we find ourselves in. >> How 'about startups in this ecosystem? Are you able to-- I mean there must be startups popping up all over the place for this opportunity. Are you working with any startups or incubating any startups? Can you talk about that? >> Oh yep. Absolutely. There's about a third of our network are in VC startups and there's a long list of them out there. They're focused on many different aspects of quantum computing. 
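The quadratic speed-up Jamie describes for Monte Carlo work refers to quantum amplitude estimation, whose error shrinks roughly as 1/N in the number of circuit evaluations versus 1/sqrt(N) samples classically. As a hedged point of comparison, here is the plain classical Monte Carlo baseline for pricing a European call option; every parameter is invented for illustration and has nothing to do with JPMC's actual work.

```python
# Classical Monte Carlo pricing of a European call option -- the kind of risk
# and options-pricing workload whose error shrinks as 1/sqrt(N) with N samples.
# Quantum amplitude estimation targets a quadratic speedup over this baseline.
# All parameters below are illustrative, not from the interview.
import math
import random

def monte_carlo_call_price(s0, strike, rate, vol, maturity, n_paths):
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = random.gauss(0.0, 1.0)
        # Geometric Brownian motion terminal price
        st = s0 * math.exp((rate - 0.5 * vol ** 2) * maturity
                           + vol * math.sqrt(maturity) * z)
        payoff_sum += max(st - strike, 0.0)
    # Discounted average payoff
    return math.exp(-rate * maturity) * payoff_sum / n_paths

print(monte_carlo_call_price(s0=100, strike=105, rate=0.02,
                             vol=0.2, maturity=1.0, n_paths=100_000))
```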
Many of 'em are focused on what I would call loosely, the programming model, looking at improving algorithms across different industries, making it easier for those that are, perhaps more skilled in domains, whether that is chemistry or financial services or mathematics, to use the power of the quantum computer. Many of those startups are leveraging our Qiskit, our quantum information science open programming model that we put out there so it's open. Many of the startups are using that programming model and then adding their own secret sauce, if you will, to understand how they can help bring on users in different ways. So it depends on their domain. You see some startups that are focused on the hardware as well, of course, looking at different hardware technologies that can be used to solve quantum. I would say I feel like more of them are focused on the software programming model. >> Well Jamie, it was interesting hear you talk about what some of the clients are doing. I mean obviously in pharmaceuticals, and battery manufacturers do a lot of advanced R and D, but you mentioned financial services, you know JPMC. It's almost like they're now doing advanced R and D trying to figure out how they can apply quantum to their business down the road. >> Absolutely, and we have a number of financial institutions that we've announced as part of the network. JPMC is just one of our premiere references who have written papers about it. But I would tell you that in the world of Monte Carlo simulation, options pricing, risk management, a small change can make a big difference in dollars. So we're talking about operations that in many cases they could achieve, but not achieve in the right amount of time. The ability to use quantum as an accelerator for these kind of operations is very important. And I can tell you, even in the last few weeks, we've had a number of briefings with financial companies for five hours on this topic. Looking at what could they do and learning from the work that's already done out there. I think this kind of advanced research is going to be very important. We also had new members that we announced at the beginning of the year at the CES show. Delta Airlines joined. First Transportation Company, Amgen joined, a pharmaceutical, an example of pharmaceuticals, as well as a number of other research organizations. Georgia Tech, University of New Mexico, Anthem Insurance, just an example of the industries that are looking to take advantage of this kind of technology as it matures. >> Well, and it strikes me too, that as you start to bring machine intelligence into the equation, it's a game changer. I mean, I've been saying that it's not Moore's Law driving the industry anymore, it's this combination of data, AI, and cloud for scale, but now-- Of course there are alternative processors going on, we're seeing that, but now as you bring in quantum that actually adds to that innovation cocktail, doesn't it? >> Yes, and as you recall when you and I spoke last year about this, there are certain domains today where you really cannot get as much effective gain out of classical computing. And clearly, chemistry is one of those domains because today, with classical computers, we're really unable to model even something as simple as a caffeine molecule, which we're all so very familiar with. I have my caffeine here with me today. 
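A rough sketch of the layering Jamie describes, in which startups add their own "secret sauce" on top of the open Qiskit programming model: the foundation is typically a parameterized circuit that a higher-level chemistry or finance package tunes. The API below assumes a pre-1.0 Qiskit release (bind_parameters has since been renamed), and the two-qubit ansatz is purely illustrative.

```python
# Sketch of the "programming model" layer described here: a parameterized
# circuit that a domain-specific package (chemistry, finance, etc.) can tune
# by binding different parameter values. Pre-1.0 Qiskit API names assumed.
from qiskit import QuantumCircuit
from qiskit.circuit import Parameter

theta = Parameter("theta")

ansatz = QuantumCircuit(2)
ansatz.ry(theta, 0)      # rotation angle left open as a tunable parameter
ansatz.cx(0, 1)
ansatz.measure_all()

# A higher-level library would sweep or optimize theta against a cost
# function; here we just bind a few values to show the circuit being reused.
for value in (0.0, 0.5, 1.0):
    bound = ansatz.bind_parameters({theta: value})
    print(bound.draw())
```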
(laughs) But you know, clearly, to the degree we can actually apply molecular modeling and the advantages that quantum brings to those fields, we'll be able to understand so much more about materials that affect all of us around the world, about energy, how to explore energy, and create energy without creating the carbon footprint and the bad outcomes associated with energy creation, and how to obviously deal with pharmaceutical creation much more effectively. There's a real promise in a lot of these different areas. >> I wonder if you could talk a little bit about some of the landscape and I'm really interested in what IBM brings to the table that's sort of different. You're seeing a lot of companies enter this space, some big and many small, what's the unique aspect that IBM brings to the table? You've mentioned co-creating before. Are you co-creating, coopertating with some of the other big guys? Maybe you could address that. >> Well, obviously this is a very hot topic, both within the technology industry and across government entities. I think that some of the key values we bring to the table is we are the only vendor right now that has a fleet of systems available in the cloud, and we've been out there for several years, enabling clients to take advantage of our capacity. We have both free access and premium access, which is what the network is paying for because they get access to the highest fidelity machines. Clearly, we understand intently, classical computing and the ability to leverage classical with quantum for advantage across many of these different industries, which I think is unique. We understand the cloud experience that we're bringing to play here with quantum since day one, and most importantly, I think we have strong relationships. We have, in many cases, we're still running the world. I see it every day coming through my clients' port vantage point. We understand financial services. We understand healthcare. We understand many of these important domains, and we're used to solving tough problems. So, we'll bring that experience with our clients and those industries to the table here and help them on this journey. >> You mentioned your experience in sort of traditional computing, basically if I understand it correctly, you're still using traditional silicon microprocessors to read and write the data that's coming out of quantum. I don't know if they're sitting physically side by side, but you've got this big cryogenic unit, cables coming in. That's the sort of standard for some time. It reminds me, can it go back to ENIAC? And now, which is really excites me because you look at the potential to miniaturize this over the next several decades, but is that right, you're sort of side by side with traditional computing approaches? >> Right, effectively what we do with quantum today does not happen without classical computers. The front end, you're coming in on classical computers. You're storing your data on classical computers, so that is the model that we're in today, and that will continue to happen. In terms of the quantum processor itself, it is a silicon based processor, but it's a superconducting technology, in our case, that runs inside that cryogenics unit at a very cold temperature. It is powered by next-generation electronics that we in IBM have innovated around and created our own electronic stack that actually sends microwave pulses into the processor that resides in the cryogenics unit. 
So when you think about the components of the system, you have to be innovating around the processor, the cryogenics unit, the custom electronic stack, and the software all at the same time. And yes, we're doing that in terms of being surrounded by this classical backplane that allows our Q network, as well as the developers around the world to actually communicate with these systems. >> The other thing that I really like about this conversation is it's not just R and D for the sake of R and D, you've actually, you're working with partners to, like you said, co-create, customers, financial services, airlines, manufacturing, et cetera. I wonder if you could maybe kind of address some of the things that you see happening in the sort of near to midterm, specifically as it relates to where people start. If I'm interested in this, what do I do? Do I need new skills? Do I need-- It's in the cloud, right? >> Yeah. >> So I can spit it up there, but where do people get started? >> Well they can certainly come to the Quantum Experience, which is our cloud experience and start to try out the system. So, we have both easy ways to get started with visual composition of circuits, as well as using the programming model that I mentioned, the Qiskit programming model. We've provided extensive YouTube videos out there already. So, developers who are interested in starting to learn about quantum can go out there and subscribe to our YouTube channel. We've got over 40 assets already recorded out there, and we continue to do those. We did one last week on quantum circuits for those that are more interested in that particular domain, but I think that's a part of this journey is making sure that we have all the assets out there digitally available for those around the world that want to interact with us. We have tremendous amount of education. We're also providing education to our business partners. One of our key network members, who I'll be speaking with later, I think today, is from Accenture. Accenture's an example of an organization that's helping their clients understand this quantum journey, and of course they're providing their own assets, if you will, but once again, taking advantage of the education that we're providing to them as a business partner. >> People talk about quantum being a decade away, but I think that's the wrong way to think about it, and I'd love your thoughts on this. It feels like, almost like the return coming out of COVID-19, it's going to come in waves, and there's parts that are going to be commercialized thoroughly and it's not binary. It's not like all of a sudden one day we're going to wake, "Hey, quantum is here!" It's really going to come in layers. Your thoughts? >> Yeah, I definitely agree with that. It's very important, that thought process because if you want to be competitive in your industry, you should think about getting started now. And that's why you see so many financial services, industrial firms, and others joining to really start experimentation around some of these domain areas to understand jointly how we evolve these algorithms to solve these problems. I think that the production level characteristics will curate the rate and pace of the industry. The industry, as we know, can drive things together faster. So together, we can make this a reality faster, and certainly none of us want to say it's going to be a decade, right. 
I mean, we're getting advantage today, in terms of the experimentation and the understanding of these problems, and we have to expedite that, I think, in the next few years. And certainly, with this arms race that we see, that's going to continue. One of the things I didn't mention is that IBM is also working with certain countries and we have significant agreements now with the countries of Germany and Japan to put quantum computers in an IBM facility in those countries. It's in collaboration with Fraunhofer Institute or miR Scientific Organization in Germany and with the University of Tokyo in Japan. So you can see that it's not only being pushed by industry, but it's also being pushed from the vantage of countries and bringing this research and technology to their countries. >> All right, Jamie, we're going to have to leave it there. Thanks so much for coming on theCUBE and give us the update. It's always great to see you. Hopefully, next time I see you, it'll be face to face. >> That's right, I hope so too. It's great to see you guys, thank you. Bye. >> All right, you're welcome. Keep it right there everybody. This is Dave Vellante for theCUBE. Be back right after this short break. (gentle music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
IBM | ORGANIZATION | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Jamie Thomas | PERSON | 0.99+ |
Jamie | PERSON | 0.99+ |
Fraunhofer Institute | ORGANIZATION | 0.99+ |
Germany | LOCATION | 0.99+ |
University of New Mexico | ORGANIZATION | 0.99+ |
Accenture | ORGANIZATION | 0.99+ |
Georgia Tech | ORGANIZATION | 0.99+ |
JPMC | ORGANIZATION | 0.99+ |
First Transportation Company | ORGANIZATION | 0.99+ |
five hours | QUANTITY | 0.99+ |
Dave | PERSON | 0.99+ |
Japan | LOCATION | 0.99+ |
Amgen | ORGANIZATION | 0.99+ |
Delta Airlines | ORGANIZATION | 0.99+ |
Boston | LOCATION | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
Anthem Insurance | ORGANIZATION | 0.99+ |
Monte Carlo | TITLE | 0.99+ |
last year | DATE | 0.99+ |
miR Scientific Organization | ORGANIZATION | 0.99+ |
University of Tokyo | ORGANIZATION | 0.99+ |
53 qubits | QUANTITY | 0.99+ |
Oak Ridge | ORGANIZATION | 0.99+ |
last fall | DATE | 0.99+ |
YouTube | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
COVID-19 | OTHER | 0.99+ |
8000 different compounds | QUANTITY | 0.99+ |
ENIAC | ORGANIZATION | 0.99+ |
over 200 papers | QUANTITY | 0.99+ |
trillions of dollars | QUANTITY | 0.99+ |
53 qubit | QUANTITY | 0.99+ |
both | QUANTITY | 0.98+ |
CES | EVENT | 0.98+ |
One | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
single metric | QUANTITY | 0.97+ |
32 | QUANTITY | 0.97+ |
first | QUANTITY | 0.96+ |
First | QUANTITY | 0.96+ |
IBM Think | ORGANIZATION | 0.95+ |
over 40 assets | QUANTITY | 0.94+ |
two | QUANTITY | 0.94+ |
IBM Systems | ORGANIZATION | 0.93+ |
over 140 billion quantum circuits | QUANTITY | 0.93+ |
a year | QUANTITY | 0.93+ |
last couple years | DATE | 0.92+ |
over 300,000 unique downloads | QUANTITY | 0.92+ |
Oak Ridge National Laboratory | ORGANIZATION | 0.89+ |
one such metric | QUANTITY | 0.87+ |
nearly six months | QUANTITY | 0.87+ |
Bill Vass, AWS | AWS re:Invent 2019
>> Announcer: Live from Las Vegas, it's theCUBE! Covering AWS re:Invent 2019. Brought to you by Amazon Web Services and Intel, along with its ecosystem partners. >> Okay, welcome back everyone. It's theCUBE's live coverage here in Las Vegas for Amazon Web Services today, re:Invent 2019. It's theCUBE's seventh year covering re:Invent. Eight years they've been running this event. It gets bigger every year. It's been a great wave to ride on. I'm John Furrier with my cohost, Dave Vellante. We've been riding this wave, Dave, for years. It's so exciting, it gets bigger and more exciting. >> Lucky seven. >> This year more than ever. So much stuff is happening. It's been really exciting. I think there's a sea change happening, in terms of another wave coming. Quantum computing, big news here amongst other great tech. Our next guest is Bill Vass, VP of Technology, Storage Automation Management, part of the quantum announcement that went out. Bill, good to see you. >> Yeah, well, good to see you. Great to see you again. Thanks for having me on board. >> So, we love quantum, we talk about it all the time. My son loves it, everyone loves it. It's futuristic. It's going to crack everything. It's going to be the fastest thing in the world. Quantum supremacy. Andy referenced it in my one-on-one with him around quantum being important for Amazon. >> Yes, it is, it is. >> You guys launched it. Take us through the timing. Why, why now? >> Okay, so the Braket service, which is named for the bra-ket quantum notation introduced by Dirac, right? So we thought that was a good name for it. It provides for you the ability to do development in quantum algorithms using gate-based programming that's available, and then do simulation on classical computers, which is what we now call our digital computers. (men chuckling) >> Yeah, it's a classic. >> These are classic computers all of a sudden, right? And then, actually do execution of your algorithms on, today, three different quantum computers, one that's annealing and two that are gate-based machines. And that gives you the ability to test them in parallel and separate from each other. In fact, last week, I was working with the team and we had two machines, an ion trap machine and an electromagnetic tunneling machine, solving the same problem and passing variables back and forth to each other. You could see the CloudWatch metrics coming out, and the data was going to an S3 bucket on the output. And we do it all in a Jupyter notebook. So it was pretty amazing to see all that running together. I think it's probably the first time two different machines with two different technologies had worked together on a cloud computer, fully integrated with everything else, so it was pretty exciting. >> So, quantum supremacy has been a word kicked around. A lot of hand waving, IBM, Google. Depending on who you talk to, there's different versions. But at the end of the day, quantum is a leap in computing. >> Bill: Yes, it can be. >> It can be. It's still early days, it would be day zero. >> Yeah, well I think if you think of, we're about where computers were with tubes, if you remember, if you go back that far, right, right? That's about where we are right now, where you've got to kind of jiggle the tubes sometimes to get them running. >> A bug gets in there. Yeah, yeah, that bug can get in there, and all of those kind of things. >> Dave: You flip 'em off with a punch card. Yeah, yeah, so for example, a number of the machines, they run for four hours and then they come down for a half hour for calibration. 
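As a rough sketch of the workflow Bill describes above, here is what authoring a circuit in a notebook and running it with the Amazon Braket SDK can look like. This is illustrative only: the device ARN and the S3 bucket and prefix are placeholders, the positional S3 destination argument reflects early versions of the SDK, and none of it comes from the interview itself.

```python
# Rough sketch of the Braket workflow described here: author a gate-based
# circuit once, try it on a local simulator, then point the same circuit at a
# managed device. The device ARN and S3 location below are placeholders.
from braket.circuits import Circuit
from braket.devices import LocalSimulator
from braket.aws import AwsDevice

bell = Circuit().h(0).cnot(0, 1)

# Quick check on a classical simulator running locally
local_counts = LocalSimulator().run(bell, shots=1000).result().measurement_counts
print(local_counts)

# Same circuit against a managed backend; results land in S3 and metrics
# surface in CloudWatch, as described in the interview.
device = AwsDevice("arn:aws:braket:::device/quantum-simulator/amazon/sv1")  # placeholder ARN
task = device.run(bell, ("example-bucket", "example-prefix"), shots=1000)
print(task.result().measurement_counts)
```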
And then they run for another four hours. So we're still sort of at that early stage, but you can do useful work on them. And more mature systems, like for example D-Wave, which is annealer, a little different than gate-based machines, is really quite mature, right? And so, I think as you go back and forth between these machines, the gate-based machines and annealers, you can really get a sense for what's capable today with Braket and that's what we want to do is get people to actually be able to try them out. Now, quantum supremacy is a fancy word for we did something you can't do on a classical computer, right? That's on a quantum computer for the first time. And quantum computers have the potential to exceed the processing power, especially on things like factoring and other things like that, or on Hamiltonian simulations for molecules, and those kids of things, because a quantum computer operates the way a molecule operates, right, in a lot of ways using quantum mechanics and things like that. And so, it's a fancy term for that. We don't really focus on that at Amazon. We focus on solving customer's problems. And the problem we're solving with Braket is to get them to learn it as it's evolving, and be ready for it, and continue to develop the environment. And then also offer a lot of choice. Amazon's always been big on choice. And if you look at our processing portfolio, we have AMD, Intel x86, great partners, great products from them. We have Nvidia, great partner, great products from them. But we also have our Graviton 1 and Graviton 2, and our new GPU-type chip. And those are great products, too, I've been doing a lot on those, as well. And the customer should have that choice, and with quantum computers, we're trying to do the same thing. We will have annealers, we will have ion trap machines, we will have electromagnetic machines, and others available on Braket. >> Can I ask a question on quantum if we can go back a bit? So you mentioned vacuum tubes, which was kind of funny. But the challenge there was with that, it was cooling and reliability, system downtime. What are the technical challenges with regard to quantum in terms of making it stable? >> Yeah, so some of it is on classical computers, as we call them, they have error-correction code built in. So you have, whether you know it or not, there's alpha particles that are flipping bits on your memory at all times, right? And if you don't have ECC, you'd get crashes constantly on your machine. And so, we've built in ECC, so we're trying to build the quantum computers with the proper error correction, right, to handle these things, 'cause nothing runs perfectly, you just think it's perfect because we're doing all the error correction under the covers, right? And so that needs to evolve on quantum computing. The ability to reproduce them in volume from an engineering perspective. Again, standard lithography has a yield rate, right? I mean, sometimes the yield is 40%, sometimes it's 20%, sometimes it's a really good fab and it's 80%, right? And so, you have a yield rate, as well. So, being able to do that. These machines also generally operate in a cryogenic world, that's a little bit more complicated, right? And they're also heavily affected by electromagnetic radiation, other things like that, so you have to sort of faraday cage them in some cases, and other things like that. So there's a lot that goes on there. 
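Bill's ECC analogy is the classical one: store bits redundantly and let a decoder outvote the occasional flip. The toy repetition code below illustrates only that classical idea; quantum error correction is considerably more involved, since qubits cannot simply be copied, so treat this as nothing more than the analogy he is drawing.

```python
# Toy classical error correction in the spirit of the ECC analogy above:
# store each bit three times and let a majority vote outvote a single flipped
# bit (the "alpha particle" case). Purely illustrative.
import random

def encode(bits):
    return [b for bit in bits for b in (bit, bit, bit)]

def corrupt(codeword, flip_probability=0.05):
    return [b ^ 1 if random.random() < flip_probability else b for b in codeword]

def decode(codeword):
    # Majority vote over each group of three copies
    return [1 if sum(codeword[i:i + 3]) >= 2 else 0
            for i in range(0, len(codeword), 3)]

message = [1, 0, 1, 1, 0, 0, 1, 0]
received = corrupt(encode(message))
print(message == decode(received))  # usually True despite the bit flips
```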
So it's managing a physical environment like cryogenics is challenging to do well, having the fabrication to reproduce it in a new way is hard. The physics is actually, I shudder to say well understood. I would say the way the physics works is well understood, how it works is not, right? No one really knows how entanglement works, they just knows what it does, and that's understood really well, right? And so, so a lot of it is now, why we're excited about it, it's an engineering problem to solve, and we're pretty good at engineering. >> Talk about the practicality. Andy Jassy was on the record with me, quoted, said, "Quantum is very important to Amazon." >> Yes it is. >> You agree with that. He also said, "It's years out." You said that. He said, "But we want to make it practical "for customers." >> We do, we do. >> John: What is the practical thing? Is it just kicking the tires? Is it some of the things you mentioned? What's the core goal? >> So, in my opinion, we're at a point in the evolution of these quantum machines, and certainly with the work we're doing with Cal Tech and others, that the number of available cubits are starting to increase at an astronomic rate, a Moore's Law kind of of rate, right? Whether it's, no matter which machine you're looking at out there, and there's about 200 different companies building quantum computers now, and so, and they're all good technology. They've all got challenges, as well, as reproducibility, and those kind of things. And so now's a good time to start learning how to do this gate-based programming knowing that it's coming, because quantum computers, they won't replace a classical computer, so don't think that. Because there is no quantum ram, you can't run 200 petabytes of data through a quantum computer today, and those kind of things. What it can do is factoring very well, or it can do probability equations very well. It'll have affects on Monte Carlo simulations. It'll have affects specifically in material sciences where you can simulate molecules for the first time that you just can't do on classical computers. And when I say you can't do on classical computers, my quantum team always corrects me. They're like, "Well, no one has proven "that there's an algorithm you can run "on a classical computer that will do that yet," right? (men chuckle) So there may be times when you say, "Okay, I did this on a quantum computer," and you can only do it on a quantum computer. But then someone's very smart mathematician says, "Oh, I figured out how to do it on a regular computer. "You don't need a quantum computer for that." And that's constantly evolving, as well, in parallel, right? And so, and that's what's that argument between IBM and Google on quantum supremacy is that. And that's an unfortunate distraction in my opinion. What Google did was quite impressive, and if you're in the quantum world, you should be very happy with what they did. They had a very low error rate with a large number of cubits, and that's a big deal. >> Well, I just want to ask you, this industry is an arms race. But, with something like quantum where you've got 200 companies actually investing in it so early days, is collaboration maybe a model here? I mean, what do think? You mentioned Cal Tech. >> It certainly is for us because, like I said, we're going to have multiple quantum computers available, just like we collaborate with Intel, and AMD, and the other partners in that space, as well. 
That's sort of the nice thing about being a cloud service provider is we can give customers choice, and we can have our own innovation, plus their innovations available to customers, right? Innovation doesn't just happen in one place, right? We got a lot of smart people at Amazon, we don't invent everything, right? (Dave chuckles) >> So I got to ask you, obviously, we can take cube quantum and call it cubits, not to be confused with theCUBE video highlights. Joking aside, classical computers, will there be a classical cloud? Because this is kind of a futuristic-- >> Or you mean a quantum cloud? >> Quantum cloud, well then you get the classic cloud, you got the quantum cloud. >> Well no, they'll be together. So I think a quantum computer will be used like we used to use a math coprocessor if you like, or FPGAs are used today, right? So, you'll go along and you'll have your problem. And I'll give you a real, practical example. So let's say you had a machine with 125 cubits, okay? You could just start doing some really nice optimization algorithms on that. So imagine there's this company that ships stuff around a lot, I wonder who that could be? And they need to optimize continuously their delivery for a truck, right? And that changes all the time. Well that algorithm, if you're doing hundreds of deliveries in a truck, it's very complicated. That traveling salesman algorithm is a NP-hard problem when you do it, right? And so, what would be the fastest best path? But you got to take into account weather and traffic, so that's changing. So you might have a classical computer do those algorithms overnight for all the delivery trucks and then send them out to the trucks. The next morning they're driving around. But it takes a lot of computing power to do that, right? Well, a quantum computer can do that kind of problemistic or deterministic equation like that, not deterministic, a best-fit algorithm like that, much faster. And so, you could have it every second providing that. So your classical computer is sending out the manifests, interacting with the person, it's got the website on it. And then, it gets to the part where here's the problem to calculate, we call it a shot when you're on a quantum computer, it runs it in a few seconds that would take an hour or more. >> It's a fast job, yeah. >> And it comes right back with the result. And then it continues with it's thing, passes it to the driver. Another update occurs, (buzzing) and it's just going on all the time. So those kind of things are very practical and coming. >> I've got to ask for the younger generations, my sons super interested as I mentioned before you came on, quantum attracts the younger, smart kids coming into the workforce, engineering talent. What's the best path for someone who has an either advanced degree, or no degree, to get involved in quantum? Is there a certain advice you'd give someone? >> So the reality is, I mean, obviously having taken quantum mechanics in school and understanding the physics behind it to an extent, as much as you can understand the physics behind it, right? I think the other areas, there are programs at universities focused on quantum computing, there's a bunch of them. So, they can go into that direction. But even just regular computer science, or regular mechanical and electrical engineering are all neat. Mechanical around the cooling, and all that other stuff. Electrical, these are electrically-based machines, just like a classical computer is. 
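Referring back to the delivery-truck example Bill walks through above: the sketch below is the classical brute-force baseline for that best-fit routing problem. Checking every ordering works for a handful of stops but grows factorially, which is exactly why the job gets handed to an annealer or another optimization service as routes become realistic. The coordinates are made up for illustration.

```python
# Classical brute force for a tiny version of the delivery-route problem.
# Trying every ordering works for a few stops but grows as n! -- the reason
# this kind of best-fit problem is a target for annealers. Data is made up.
from itertools import permutations

stops = {"depot": (0, 0), "A": (2, 4), "B": (5, 1), "C": (6, 5)}

def route_length(order):
    path = ("depot",) + order + ("depot",)
    return sum(
        ((stops[a][0] - stops[b][0]) ** 2 + (stops[a][1] - stops[b][1]) ** 2) ** 0.5
        for a, b in zip(path, path[1:])
    )

best = min(permutations([s for s in stops if s != "depot"]), key=route_length)
print(best, round(route_length(best), 2))
```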
And being able to code at low level is another area that's tremendously valuable right now. >> Got it. >> You mentioned best fit is coming, that use case. I mean, can you give us a sense of a timeframe? And people will say, "Oh, 10, 15, 20 years." But you're talking much sooner. >> Oh, I don't, I think it's sooner than that, I do. And it's hard for me to predict exactly when we'll have it. You can already do, with some of the annealing machines, like D- Wave, some of the best fit today, right? So it's a matter of people want to use a quantum computer because they need to do something fast, they don't care how much it costs, they need to do something fast. Or it's too expensive to do it on a classical computer, or you just can't do it at all on a classical computer. Today, there isn't much of that last one, you can't do it at all, but that's coming. As you get to around 52, 50, 52 cubits, it's very hard to simulate that on a classical computer. You're starting to reach the edge of what you can practically do on a classical computer. At about 125 cubits, you probably are at a point where you can't just simulate it anymore. >> But you're talking years, not decades, for this use case? >> Yeah, I think you're definitely talking years. I think, and you know, it's interesting, if you'd asked me two years ago how long it would take, I would've said decades. So that's how fast things are advancing right now, and I think that-- >> Yeah, and the computers just getting faster and faster. >> Yeah, but the ability to fabricate, the understanding, there's a number of architectures that are very well proven, it's just a matter of getting the error rates down, stability in place, the repeatable manufacturing in place, there's a lot of engineering problems. And engineering problems are good, we know how to do engineering problems, right? And we actually understand the physics, or at least we understand how the physics works. I won't claim that, what is it, "Spooky action at a distance," is what Einstein said for entanglement, right? And that's a core piece of this, right? And so, those are challenges, right? And that's part of the mystery of the quantum computer, I guess. >> So you're having fun? >> I am having fun, yeah. >> I mean, this is pretty intoxicating, technical problems, it's fun. >> It is. It is a lot of fun. Of course, the whole portfolio that I run over at AWS is just really a fun portfolio, between robotics, and autonomous systems, and IOT, and the advanced storage stuff that we do, and all the edge computing, and all the monitor and management systems, and all the real-time streaming. So like Kinesis Video, that's the back end for the Amazon ghost stores, and working with all that. It's a lot of fun, it really is, it's good. >> Well, Bill, we need an hour to get into that, so we may have to come up and see you, do a special story. >> Oh, definitely! >> We'd love to come up and dig in, and get a special feature program with you at some point. >> Yeah, happy to do that, happy to do that. >> Talk some robotics, some IOT, autonomous systems. >> Yeah, you can see all of it around here, we got it up and running around here, Dave. >> What a portfolio. >> Congratulations. >> Alright, thank you so much. >> Great news on the quantum. Quantum is here, quantum cloud is happening. Of course, theCUBE is going quantum. We've got a lot of cubits here. Lot of CUBE highlights, go to SiliconAngle.com. We got all the data here, we're sharing it with you. I'm John Furrier with Dave Vellante talking quantum. 
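The 50-odd and 125-qubit thresholds Bill mentions line up with a simple back-of-the-envelope bound: simulating n qubits with a full state vector means storing 2^n complex amplitudes. The sketch below assumes 16 bytes per amplitude; tensor-network and other methods can stretch past this, so read it as the naive limit rather than a hard wall.

```python
# Back-of-envelope check on the simulation limits mentioned here: a full
# state vector for n qubits holds 2**n complex amplitudes. Assuming 16 bytes
# per amplitude (one complex double), memory doubles with every added qubit.
def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    return (2 ** n_qubits) * bytes_per_amplitude

for n in (30, 40, 50, 52, 125):
    gib = state_vector_bytes(n) / 2 ** 30
    print(f"{n:>3} qubits -> {gib:.3e} GiB of amplitudes")
```

At 30 qubits this is a comfortable 16 GiB; by the low 50s it is tens of petabytes, and at 125 qubits the number is astronomically beyond any classical machine, which is the practical meaning of "you can't just simulate it anymore."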
Want to give a shout out to Amazon Web Services and Intel for setting up this stage for us. Thanks to our sponsors, we wouldn't be able to make this happen if it wasn't for them. Thank you very much, and thanks for watching. We'll be back with more coverage after this short break. (upbeat music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
John | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Amazon Web Services | ORGANIZATION | 0.99+ |
two machines | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Cal Tech | ORGANIZATION | 0.99+ |
AMD | ORGANIZATION | 0.99+ |
Andy | PERSON | 0.99+ |
Bill | PERSON | 0.99+ |
Andy Jassy | PERSON | 0.99+ |
Einstein | PERSON | 0.99+ |
John Furrier | PERSON | 0.99+ |
40% | QUANTITY | 0.99+ |
Dave | PERSON | 0.99+ |
Bill Vass | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
20% | QUANTITY | 0.99+ |
Nvidia | ORGANIZATION | 0.99+ |
Intel | ORGANIZATION | 0.99+ |
80% | QUANTITY | 0.99+ |
last week | DATE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
an hour | QUANTITY | 0.99+ |
four hours | QUANTITY | 0.99+ |
200 companies | QUANTITY | 0.99+ |
10 | QUANTITY | 0.99+ |
Las Vegas | LOCATION | 0.99+ |
two-bit | QUANTITY | 0.99+ |
15 | QUANTITY | 0.99+ |
Today | DATE | 0.99+ |
125 cubits | QUANTITY | 0.99+ |
200 petabytes | QUANTITY | 0.99+ |
20 years | QUANTITY | 0.99+ |
two different machines | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
50 | QUANTITY | 0.99+ |
two different technologies | QUANTITY | 0.99+ |
Eight years | QUANTITY | 0.98+ |
first time | QUANTITY | 0.98+ |
Monte Carlo | TITLE | 0.98+ |
today | DATE | 0.98+ |
two years ago | DATE | 0.98+ |
52 cubits | QUANTITY | 0.97+ |
Braket | ORGANIZATION | 0.97+ |
x86 | COMMERCIAL_ITEM | 0.97+ |
This year | DATE | 0.96+ |
next morning | DATE | 0.96+ |
about 125 cubits | QUANTITY | 0.95+ |
Graviton 1 | COMMERCIAL_ITEM | 0.95+ |
Dirac | ORGANIZATION | 0.95+ |
Graviton 2 | COMMERCIAL_ITEM | 0.94+ |
about 200 different companies | QUANTITY | 0.93+ |
three different quantum computers | QUANTITY | 0.93+ |
Moore's Law | TITLE | 0.91+ |
seventh year | QUANTITY | 0.9+ |
decades | QUANTITY | 0.87+ |
seconds | QUANTITY | 0.86+ |
every second | QUANTITY | 0.85+ |
re: | EVENT | 0.82+ |
half hour | QUANTITY | 0.81+ |
Caryn Woodruff, IBM & Ritesh Arora, HCL Technologies | IBM CDO Summit Spring 2018
>> Announcer: Live from downtown San Francisco, it's the Cube, covering IBM Chief Data Officer Strategy Summit 2018. Brought to you by IBM. >> Welcome back to San Francisco everybody. We're at the Parc 55 in Union Square and this is the Cube, the leader in live tech coverage, and we're bringing you exclusive coverage of the IBM CDO Strategy Summit. IBM has these things, they run them on both coasts, one in San Francisco, one in Boston, spring and fall. Great event, intimate event. 130, 150 chief data officers, learning, transferring knowledge, sharing ideas. Caryn Woodruff is here as the principal data scientist at IBM and she's joined by Ritesh Arora, who is the director of digital analytics at HCL Technologies. Folks, welcome to the Cube, thanks for coming on. >> Thank you >> Thanks for having us. >> You're welcome. So we're going to talk about data management, data engineering, we're going to talk about digital, as I said Ritesh, because digital is in your title. It's a hot topic today. But Caryn, let's start off with you. Principal Data Scientist, so you're the one that is in short supply. So a lot of demand, you're getting pulled in a lot of different directions. But talk about your role and how you manage all those demands on your time. >> Well, you know, a lot of our work is driven by business needs, so it's really understanding what is critical to the business, what's going to support our business's strategy and, you know, picking the projects that we work on based on those items. So you really do have to cultivate the things that you spend your time on and make sure you're spending your time on the things that matter, and as Ritesh and I were talking about earlier, you know, a lot of that means building good relationships with the people who manage the systems and the people who manage the data so that you can get access to what you need to get the critical insights that the business needs. >> So Ritesh, data management, I mean this means a lot of things to a lot of people. It's evolved over the years. Help us frame what data management is in this day and age. >> Sure, so there are two aspects of data in my opinion. One is data management, the other is data engineering, right? And over the years the data has grown significantly, whether it's unstructured data, structured data, or transactional data. We need to have some kind of governance and policies to secure data, to make data an asset for a company, so the business can rely on your data, on what you are delivering to them. Now, the other part is data engineering. Data engineering is more of an IT function, which is data acquisition, data preparation and delivering the data to the end-user, right? It can be business, it can be third-party, but it all comes under the governance, under the policies, which are designed to secure the data and define how the data should be accessed by different parts of the company or by external parties. >> And how do those two worlds come together? The business piece and the IT piece, is that where you come in? >> That is where data science definitely comes into the picture. So if you go online, you can find Venn diagrams that describe data science as a combination of computer science, math and statistics, and business acumen. And so where it comes in the middle is data science. So it's really being able to put those things together. 
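Ritesh frames data engineering as acquisition, preparation, and delivery under governance policies. The sketch below is one minimal, hedged illustration of that flow using pandas; the file names, columns, and quality rules are invented for the example rather than anything discussed on stage.

```python
# Minimal sketch of the acquire -> prepare -> deliver flow described above,
# with a couple of governance-style checks along the way. File names,
# columns, and rules are invented for illustration.
import pandas as pd

# Acquisition: pull raw records from a source-system extract
raw = pd.read_csv("customer_extract.csv")          # hypothetical source file

# Preparation: enforce basic quality rules before anyone consumes the data
prepared = (
    raw.drop_duplicates(subset="customer_id")
       .assign(email=lambda df: df["email"].str.strip().str.lower())
)
assert prepared["customer_id"].notna().all(), "governance check: no null keys"

# Delivery: publish a cleaned, documented dataset for business users
prepared.to_csv("trusted_customers.csv", index=False)
```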
But, you know, what's what's so critical is you know, Interpol, actually, shared at the beginning here and I think a few years ago here, talked about the five pillars to building a data strategy. And, you know, one of those things is use cases, like getting out, picking a need, solving it and then going from there and along the way you realize what systems are critical, what data you need, who the business users are. You know, what would it take to scale that? So these, like, Proof-point projects that, you know, eventually turn into these bigger things, and for them to turn into bigger things you've got to have that partnership. You've got to know where your trusted data is, you've got to know that, how it got there, who can touch it, how frequently it is updated. Just being able to really understand that and work with partners that manage the infrastructure so that you can leverage it and make it available to other people and transparent. >> I remember when I first interviewed Hilary Mason way back when and I was asking her about that Venn diagram and she threw in another one, which was data hacking. >> Caryn: Uh-huh, yeah. >> Well, talk about that. You've got to be curious about data. You need to, you know, take a bath in data. >> (laughs) Yes, yes. I mean yeah, you really.. Sometimes you have to be a detective and you have to really want to know more. And, I mean, understanding the data is like the majority of the battle. >> So Ritesh, we were talking off-camera about it's not how titles change, things evolve, data, digital. They're kind of interchangeable these days. I mean we always say the difference between a business and a digital business is how they have used data. And so digital being part of your role, everybody's trying to get digital transformation, right? As an SI, you guys are at the heart of it. Certainly, IBM as well. What kinds of questions are our clients asking you about digital? >> So I ultimately see data, whatever we drive from data, it is used by the business side. So we are trying to always solve a business problem, which is to optimize the issues the company is facing, or try to generate more revenues, right? Now, the digital as well as the data has been married together, right? Earlier there are, you can say we are trying to analyze the data to get more insights, what is happening in that company. And then we came up with a predictive modeling that based on the data that will statically collect, how can we predict different scenarios, right? Now digital, we, over the period of the last 10 20 years, as the data has grown, there are different sources of data has come in picture, we are talking about social media and so on, right? And nobody is looking for just reports out of the Excel, right? It is more about how you are presenting the data to the senior management, to the entire world and how easily they can understand it. That's where the digital from the data digitization, as well as the application digitization comes in picture. So the tools are developed over the period to have a better visualization, better understanding. How can we integrate annotation within the data? So these are all different aspects of digitization on the data and we try to integrate the digital concepts within our data and analytics, right? So I used to be more, I mean, I grew up as a data engineer, analytics engineer but now I'm looking more beyond just the data or the data preparation. It's more about presenting the data to the end-user and the business. 
How easy it is for them to understand it. >> Okay, I got to ask you, so you guys are data wonks. I am too, kind of, but I'm not as skilled as you are, and I say that with all due respect. I mean you love data. >> Caryn: Yes. >> As data science becomes a more critical skill within organizations, we always talk about the amount of data, data growth, the stats are mind-boggling. But as a data scientist, do you feel like you have access to the right data, and how much of a challenge is that with clients? >> So we do have access to the data, but the challenge is, the company has so many systems, right? It's not just one or two applications. There are companies where we have 50 or 60 or even hundreds of applications built over the last 20 years. And there are some applications which are basically duplicates, which replicate the data. Now, the challenge is to integrate the data from different systems because they maintain different metadata. The quality of the data is a concern. And sometimes with the international companies, the rules, for example, in the US or India or China around data acquisition are different, right? And as you become more global, you try to integrate the data beyond boundaries, which sometimes becomes a compliance issue as well, beyond the technical issues of data integration. >> Any thoughts on that? >> Yeah, I think, you know, one of the other issues too: you've heard of shadow IT, where people have, like, servers squirreled away under their desks. There's your shadow data, where people have spreadsheets and databases that, you know, they're storing on, like, a small server, or that they share within their department. And so, you know, we were talking earlier about the different systems. And you might have a name in one system that's one way, and a name in another system that's slightly different, and then a third system, where it's different and there's extra granularity to it or some extra twist. And so you really have to work with all of the people that own these processes and figure out what's the trusted source? What can we all agree on? So there's a lot of... It's funny, a lot of the data problems are people problems. So it's getting people to talk and getting people to agree on, well this is why I need it this way, and this is why I need it this way, and figuring out how you come to a common solution so you can even create those single trusted sources that then everybody can go to, and everybody knows that they're working with the right thing and the same thing that they all agree on. >> The politics of it and, I mean, politics is kind of a pejorative word, but let's say dissonance, where you have maybe a back-end system, a financial system, and the CFO, he or she is looking at the data saying oh, this is what the data says, and then... I remember recently I was talking to a chef in a restaurant who said, the CFO saw this, but I know that's not the case, I don't have the data to prove it. So I'm going to go get the data. And so, as they collect that data, they bring it together. So I guess in some ways you guys are mediators. >> [Caryn And Ritesh] Yes, yes. Absolutely. >> 'Cause the data doesn't lie, you just got to understand it. >> You have to ask the right question. Yes. And yeah. >> And sometimes when you see the data, you don't even know what questions you want to ask until you see the data. Is that a challenge for your clients? >> Caryn: Yes, all the time. 
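Caryn's "same name, three systems, three spellings" problem is the classic record-matching step on the way to a single trusted source. The sketch below scores candidate matches with the Python standard library so a person can review them; the sample records and the 0.8 threshold are invented for illustration.

```python
# Sketch of the "same name, slightly different in every system" problem:
# score candidate matches so people can agree on a single trusted record.
# Uses only the standard library; the sample records are invented.
from difflib import SequenceMatcher

system_a = ["Acme Corporation", "Globex Co.", "Initech LLC"]
system_b = ["ACME Corp", "Globex Company", "Initech, L.L.C."]

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for name_a in system_a:
    best = max(system_b, key=lambda name_b: similarity(name_a, name_b))
    score = similarity(name_a, best)
    flag = "review" if score < 0.8 else "likely match"
    print(f"{name_a!r} <-> {best!r}  score={score:.2f}  ({flag})")
```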
Yeah >> So okay, what else do we want to we want to talk about? The state of collaboration, let's say, between the data scientists, the data engineer, the quality engineer, maybe even the application developers. Somebody, John Fourier often says, my co-host and business partner, data is the new development kit. Give me the data and I'll, you know, write some code and create an application. So how about collaboration amongst those roles, is that something... I know IBM's gone on about some products there but your point Caryn, it's a lot of times it's the people. >> It is. >> And the culture. What are you seeing in terms of evolution and maturity of that challenge? >> You know I have a very good friend who likes to say that data science is a team sport and so, you know, these should not be, like, solo projects where just one person is wading up to their elbows in data. This should be something where you've got engineers and scientists and business, people coming together to really work through it as a team because everybody brings really different strengths to the table and it takes a lot of smart brains to figure out some of these really complicated things. >> I completely agree. Because we see the challenges, we always are trying to solve a business problem. It's important to marry IT as well as the business side. We have the technical expert but we don't have domain experts, subject matter experts who knows the business in IT, right? So it's very very important to collaborate closely with the business, right? And data scientist a intermediate layer between the IT as well as business I will say, right? Because a data scientist as they, over the years, as they try to analyze the information, they understand business better, right? And they need to collaborate with IT to either improve the quality, right? That kind of challenges they are facing and I need you to, the data engineer has to work very hard to make sure the data delivered to the data scientist or the business is accurate as much as possible because wrong data will lead to wrong predictions, right? And ultimately we need to make sure that we integrate the data in the right way. >> What's a different cultural dynamic that was, say ten years ago, where you'd go to a statistician, she'd fire up the SPSS.. >> Caryn: We still use that. >> I'm sure you still do but run some kind of squares give me some, you know, probabilities and you know maybe run some Monte Carlo simulation. But one person kind of doing all that it's your point, Caryn. >> Well you know, it's it's interesting. There are there are some students I mentor at a local university and you know we've been talking about the projects that they get and that you know, more often than not they get a nice clean dataset to go practice learning their modeling on, you know? And they don't have to get in there and clean it all up and normalize the fields and look for some crazy skew or no values or, you know, where you've just got so much noise that needs to be reduced into something more manageable. And so it's, you know, you made the point earlier about understanding the data. It's just, it really is important to be very curious and ask those tough questions and understand what you're dealing with. Before you really start jumping in and building a bunch of models. >> Let me add another point. That the way we have changed over the last ten years, especially from the technical point of view. Ten years back nobody talks about the real-time data analysis. 
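Picking up Caryn's point about cleaning real data before modeling, the hedged pandas sketch below walks through the steps she names: missing values, normalizing a field, and checking for crazy skew. The input file and column names are invented, and a real pipeline would of course be far more involved.

```python
# The unglamorous steps described above -- missing values, normalizing a
# field, checking for skew -- sketched with pandas. The input file and
# column names are invented for illustration.
import pandas as pd

df = pd.read_csv("raw_orders.csv")                  # hypothetical extract

# Missing values: decide explicitly instead of letting them ride along
df["discount"] = df["discount"].fillna(0.0)
df = df.dropna(subset=["order_amount"])

# Normalize a numeric field to a 0-1 range for modeling
amt = df["order_amount"]
df["order_amount_norm"] = (amt - amt.min()) / (amt.max() - amt.min())

# Quick skew check; heavily skewed fields often get log-transformed
print("order_amount skew:", amt.skew())
```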
There was no streaming application as such. Now nobody talks about the batch analysis, right? Everybody wants data on a real-time basis, or if not real-time, at least a near-real-time basis. That has become a challenge. And it's not just the predictions, which are happening in their ERP environment or on the cloud; they want real-time integration with social media for marketing and sales, and how they can immediately run the campaign, right? So, for example, if I go to Google and I search for any product, right, for example, a pressure cooker, right? And I go to Facebook, immediately I see the ad within two minutes. >> Yeah, they're retargeting. >> So that real-time analytics is happening across different applications, including the third-party data, which is coming from social media. So that has become a good source of data, but it has become a challenge for the data analyst and the data scientist: how quickly we can turn around that data analysis. >> Because it used to be you would get ads for a pressure cooker for months, even after you bought the pressure cooker, and now it's only a few days, right? >> Ritesh: It's a minute. You close this application, you log into Facebook... >> Oh, no doubt. >> Ritesh: An ad is there. >> Caryn: There it is. >> Ritesh: Because everything is linked, either your phone number or email ID, you're done. >> It's interesting. We talked about disruption a lot. I wonder if that whole model is going to get disrupted in a new way because everybody started using the same ad. >> So that's a big change over the last 10 years. >> Do you think... oh go ahead. >> Oh no, I was just going to say, you know, another thing is just there's so much that is available to everybody now, you know. There's not this small little set of tools that's restricted to people that are in these very specific jobs. But with open source and with so many software-as-a-service products that are out there, anybody can go out and get an account and just start, you know, practicing or playing or joining a Kaggle competition or, you know, start getting their hands on... There's data sets that are out there that you can just download to practice and learn on and use. So, you know, it's much more open, I think, than it used to be. >> Yeah, community editions of software, open data. The number of open data sources just keeps growing. Do you think that machine intelligence can, or how can machine intelligence help with this data quality challenge? >> I think that it's always going to require people, you know? There's always going to be a need for people to train the machines on how to interpret the data. How to classify it, how to tag it. There's actually a really good article in Popular Science this month about a woman who was training a machine on fake news and, you know, it did a really nice job of finding some of the same claims that she did. But she found a few more. So, you know, I think, on one hand we have machines that we can augment with data, and they can help us make better decisions or sift through large volumes of data, but then when we're teaching the machines to classify the data, or to help us with metadata classification, for example, or, you know, to help us clean it, I think that it's going to be a while before we get to the point where that's the inverse. >> Right, so in that example you gave, the human actually did a better job than the machine. Now, it's amazing to me how... 
What, what machines couldn't do that humans could, you know last year and all of a sudden, you know, they can. It wasn't long ago that robots couldn't climb stairs. >> And now they can. >> And now they can. >> It's really creepy. >> I think the difference now is, earlier you know, you knew that there is an issue in the data. But you don't know that how much data is corrupt or wrong, right? Now, there are tools available and they're very sophisticated tools. They can pinpoint and provide you the percentage of accuracy, right? On different categories of data that that you come across, right? Even forget about the structure data. Even when you talk about unstructured data, the data which comes from social media or the comments and the remarks that you log or are logged by the customer service representative, there are very sophisticated text analytics tools available, which can talk very accurately about the data as well as the personality of the person who is who's giving that information. >> Tough problems but it seems like we're making progress. All you got to do is look at fraud detection as an example. Folks, thanks very much.. >> Thank you. >> Thank you very much. >> ...for sharing your insight. You're very welcome. Alright, keep it right there everybody. We're live from the IBM CTO conference in San Francisco. Be right back, you're watching the Cube. (electronic music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Ritesh Ororo | PERSON | 0.99+ |
Caryn | PERSON | 0.99+ |
John Fourier | PERSON | 0.99+ |
Ritesh | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
US | LOCATION | 0.99+ |
50 | QUANTITY | 0.99+ |
Cayn Woodruff | PERSON | 0.99+ |
Boston | LOCATION | 0.99+ |
San Francisco | LOCATION | 0.99+ |
China | LOCATION | 0.99+ |
India | LOCATION | 0.99+ |
last year | DATE | 0.99+ |
Excel | TITLE | 0.99+ |
one | QUANTITY | 0.99+ |
Caryn Woodruff | PERSON | 0.99+ |
Ritesh Arora | PERSON | 0.99+ |
Hilary Mason | PERSON | 0.99+ |
60 | QUANTITY | 0.99+ |
130 | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
Monte Carlo | TITLE | 0.99+ |
HCL Technologies | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
third system | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
Interpol | ORGANIZATION | 0.98+ |
ten years ago | DATE | 0.98+ |
two applications | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
Parc 55 | LOCATION | 0.98+ |
five pillars | QUANTITY | 0.98+ |
one system | QUANTITY | 0.98+ |
ORGANIZATION | 0.97+ | |
two aspects | QUANTITY | 0.97+ |
both coasts | QUANTITY | 0.97+ |
one person | QUANTITY | 0.96+ |
Ten years back | DATE | 0.96+ |
two minutes | QUANTITY | 0.95+ |
this month | DATE | 0.95+ |
Union Square | LOCATION | 0.95+ |
two worlds | QUANTITY | 0.94+ |
Spring 2018 | DATE | 0.94+ |
Popular Science | TITLE | 0.9+ |
CTO | EVENT | 0.88+ |
days | QUANTITY | 0.88+ |
one way | QUANTITY | 0.87+ |
SPSS | TITLE | 0.86+ |
single trusted sources | QUANTITY | 0.85+ |
Venn | ORGANIZATION | 0.84+ |
few years ago | DATE | 0.84+ |
150 chief data officers | QUANTITY | 0.83+ |
last 10 20 years | DATE | 0.83+ |
Officer Strategy Summit 2018 | EVENT | 0.82+ |
hundreds of application | QUANTITY | 0.8+ |
last 10 years | DATE | 0.8+ |
Cube | COMMERCIAL_ITEM | 0.79+ |
IBM Chief | EVENT | 0.79+ |
IBM CDO strategy summit | EVENT | 0.72+ |
last ten years | DATE | 0.7+ |
IBM CDO Summit | EVENT | 0.7+ |
fall | DATE | 0.68+ |
Cube | TITLE | 0.66+ |
spring | DATE | 0.65+ |
last 20 years | DATE | 0.63+ |
minute | QUANTITY | 0.49+ |
Sharad Singhal, The Machine & Matthias Becker, University of Bonn | HPE Discover Madrid 2017
>> Announcer: Live from Madrid, Spain, it's theCUBE, covering HPE Discover Madrid 2017, brought to you by Hewlett Packard Enterprise. >> Welcome back to Madrid, everybody, this is theCUBE, the leader in live tech coverage. My name is Dave Vellante, and I'm here with Peter Burris. This is day two of HPE Hewlett Packard Enterprise Discover in Madrid, their European version of a show that we also cover in Las Vegas, a kind of six-month cadence of innovation and organizational evolution of HPE that we've been tracking now for several years. Sharad Singhal is here, he covers software architecture for The Machine at Hewlett Packard Enterprise, and Matthias Becker, who's a postdoctoral researcher at the University of Bonn. Gentlemen, thanks so much for coming on theCUBE. >> Thank you. >> No problem. >> You know, we talk a lot on theCUBE about how technology helps people make money or save money, but now we're talking about, you know, something even more important, right? We're talking about lives and the human condition and >> Peter: Hard problems to solve. >> Specifically, yeah, hard problems like Alzheimer's. So Sharad, why don't we start with you, maybe talk a little bit about what this initiative is all about, what the partnership is all about, what you guys are doing. >> So we started on a project called the Machine Project about three, three and a half years ago, and frankly at that time, the response we got from a lot of my colleagues in the IT industry was "You guys are crazy", (Dave laughs) right. We said we are looking at an enormous amount of data coming at us, we are looking at real-time requirements on larger and larger processing coming up in front of us, and there is no way that the current architectures of the computing environments we create today are going to keep up with this huge flood of data, and we have to rethink how we do computing. And the real question for those of us in research at Hewlett Packard Labs was, if we were to design a computer today, knowing what we do today, as opposed to what we knew 50 years ago, how would we design the computer? And this computer should not be something which solves problems from the past, this should be a computer which deals with problems in the future. So we are looking for something which will take us through the next 50 years, in terms of computing architectures and what we will do there. In the last three years we have gone from ideas and paper studies, paper designs, and things which were made out of plastic, to a real working system. Around Las Vegas time, we basically announced that we had the entire system working with actual applications running on it, 160 terabytes of memory all addressable from any processing core in 40 computing nodes around it. And although we call it memory-driven computing, it's really thinking in terms of data-driven computing. The reason is that the data is now at the center of this computing architecture, as opposed to the processor, and any processor can get to any part of the data directly, as if it were addressing local memory. This provides us with a degree of flexibility and freedom in compute that we never had before, and as a software person, when we started looking at this architecture, our answer was, well, we didn't know we could do this.
Now, given that I can do this, and I assume that I can do this, all of us programmers started thinking differently, writing code differently, and we suddenly had essentially a toy to play with, if you will, as programmers, where we said, you know, this algorithm I had written off decades ago because it didn't work, but now I have enough memory that if I were to think about this algorithm today, I would do it differently. And all of a sudden, a new set of algorithms, a new set of programming possibilities opened up. We worked with a number of applications, ranging from just Spark on this kind of an environment, to how you do large-scale simulations, Monte Carlo simulations. And people talk about improvements in performance on the order of, oh, I can get you a 30% improvement. We are saying, in the example applications, we saw anywhere from five, 10, 15 times better, to cases where we are looking at financial analysis, risk management problems, which we can do 10,000 times faster. >> So many orders of magnitude. >> Many, many orders. >> When you don't have to wait for the horrible storage stack. (laughs) >> That's right, right. And these kinds of results gave us the hope that, as we look forward, these new computing architectures that we are thinking through right now will take us through this data mountain, this data tsunami that we are all facing, in terms of bringing all of the data back and essentially doing real-time work on it. >> Matthias, maybe you could describe the work that you're doing at the University of Bonn, specifically as it relates to Alzheimer's, and how this technology gives you possible hope to solve some problems. >> So at the University of Bonn, we work very closely with the German Center for Neurodegenerative Diseases, and in their mission they are facing diseases like Alzheimer's, Parkinson's, Multiple Sclerosis, and so on. In particular, Alzheimer's is a really serious disease, and for many diseases like cancer, for example, the mortality rates improve, but for Alzheimer's there's no improvement in sight. So there's a large population that is affected by it, and there is really not much we currently can do, so the DZNE is focusing its research efforts, together with the German government, in this direction. One thing about Alzheimer's is that by the time you show the first symptoms, the disease has already been present for at least a decade. So if you really want to identify sources or biomarkers that will point you in this direction, once you see the first symptoms, it's already too late. So at the DZNE they have started a cohort study. In the area around Bonn, they are now collecting data from 30,000 volunteers. They are planning to follow them for 30 years, and in this process we generate a lot of data, so of course we do the usual surveys to learn a bit about them, we learn about their environments.
But we also do much more detailed analysis, so we take blood samples and we analyze the complete genome, and we also acquire imaging data from the brain, so we do an MRI at an extremely high resolution with some very advanced machines we have. And all this data is accumulated because we do not only have to do this once, but we try to do it repeatedly for every one of the participants in the study, so that we can later analyze the time series. When, in 10 years, someone develops Alzheimer's, we can go back through the data and see, maybe there's something interesting in there, maybe there was one biomarker that we are looking for, so that we can predict the disease better in advance. And with this pile of data that we are collecting, basically we need something new to analyze this data and to deal with it, and when we heard about the machine, we thought immediately, this is a system that we would need. >> Let me see if I can put this in a little bit of context. So Dave lives in Massachusetts, I used to live there, in Framingham, Massachusetts, >> Dave: I was actually born in Framingham. >> You were born in Framingham. And one of the more famous studies is the Framingham Heart Study, which tracked people over many years and discovered things about heart disease and the relationship between smoking and cancer, and other really interesting problems. But they used a paper-based study with an interview base, so for each of those kinds of people, they might have collected, you know, maybe a megabyte, maybe a megabyte and a half of data. You just described a couple of gigabytes of data per person, 30,000 people, multiple years. So we're talking about being able to find patterns in data about individuals that would number in the petabytes over a period of time. Very rich detail that's possible, but if you don't have something that can help you do it, you've just collected a bunch of data that's just sitting there. So is that basically what you're trying to do with the machine, the ability to capture all this data and then do something with it, so you can generate those important inferences? >> Exactly, so with all these large amounts of data, we do not only compare the data sets for a single person, but once we find something interesting, we also have to compare the whole population that we have captured with each other. So there's really a lot we have to parse and compare. >> This brings together the idea that it's not just the volume of data. I also have to do analytics across all of that data together, right? So every time a scientist, one of the people doing biology studies or informatics studies, asks a question and says, I have a hypothesis that this might be a reason for this particular evolution of the disease or occurrence of the disease, they then want to go through all of that data and analyze it as they are asking the question. Now if the amount of compute it takes to actually answer their question takes me three days, I have lost my train of thought. But if I can get that answer in real time, then I get into this flow where I'm asking a question, seeing the answer, making a different hypothesis, seeing a different answer, and this is what my colleagues here were looking for. >> But if I think about, again, going back to the Framingham Heart Study, you know, I might do a query on a couple of related questions, and use a small amount of data.
The technology to do that's been around, but when we start looking for patterns across brain scans with time series, we're not talking about a small problem, we're talking about an enormous amount of data that can be looked at in a lot of different ways. I've got one other question for you related to this, because I've got to presume that there's a quid pro quo for getting people into the study, that, you know, for those 30,000 people, you'll be able to help them and provide prescriptive advice about how to improve their health as you discover more about what's going on. Have I got that right? >> So, we're trying to do that, but also there are limits to this, of course. >> Of course. >> For us it's basically collecting the data, and people are really willing to donate everything they can from their health data to allow these large studies. >> To help future generations. >> So that's not necessarily quid pro quo. >> Okay, there isn't, okay. But still, the knowledge is enough for them. >> Yeah, their incentive is they're going to help people who have this disease down the road. >> I mean, if it is not me, if it helps society in general, people are willing to do a lot. >> Yeah, of course. >> Oh sure. >> Now the machine is not a product yet that's shipping, right, so how do you get access to it, or is this sort of futures, or... >> When we started talking to one another about this, we actually did not have the prototype with us. But remember that when we started down this journey for the machine three years ago, we knew back then that we would have hardware somewhere in the future, but as part of my responsibility, I had to deal with the fact that software has to be ready for this hardware. It does me no good to build hardware when there is no software to run on it. So we have actually been working on the software stack, and how to think about applications on that software stack, using emulation and simulation environments, where we have essentially an instruction-level simulator for what the machine, or what that prototype, would have done, and we were running code on top of those simulators. We also had performance simulators, where we'd say, if we write the application this way, this is how much we think we would gain in terms of performance. And all of those applications, all of that code we were writing, was actually on our large-memory machines, Superdome X to be precise. So by the time we started talking to them, we had these emulation environments available, and we had experience using these emulation environments on our Superdome X platform. So when they came to us and started working with us, we took the software that they brought to us and started working within those emulation environments to see how fast we could make those problems, even within those emulation environments. So that's how we started down this track, and most of the results we have shown in the study are measured results that we are quoting inside this forum on the Superdome X platform. So even in that emulated environment, which is emulating the machine, of course, in the emulation on Superdome X, for example, I can only hold 24 terabytes of data in memory. I say only 24 terabytes >> Only! because I'm looking at much larger systems, but an enormously large number of workloads fit very comfortably inside the 24 terabytes.
And for those particular workloads, the programming techniques we are developing work at that scale, right; they won't scale beyond the 24 terabytes, but they'll certainly work at that scale. So between us we then started looking for problems, and I'll let Matthias comment on the problems that they brought to us, and then we can talk about how we actually solved those problems. >> So we work a lot with genomics data, and usually what we do is we have a pipeline, so we connect multiple tools, and we thought, okay, this architecture sounds really interesting to us, but if we want to get started with this, we should pose them a challenge. So that they could convince us, we went through the literature and took a tool that was advertised as the new optimal solution. Prior work was taking up to six days for processing, and this tool was able to cut it to 22 minutes, and we thought, okay, this is a perfect challenge for our collaboration. So we went ahead and took this tool, we put it on the Superdome X that was already running, and it took five minutes instead of 22, and then we started modifying the code, and in the end we were able to shrink the time down to just 30 seconds, so that's two orders of magnitude faster. >> We took something which... they were able to run it in 22 minutes, and that had already been optimized by people in the field to say, "I want this answer fast." When we moved it to our Superdome X platform, the platform is extremely capable; hardware-wise it compares really well to other platforms which are out there. That time came down to five minutes, but that was just the beginning. And then, as we modified the software based on the emulation results we were seeing underneath, we brought that time down to 13 seconds, which is a hundred times faster. We started this work with them in December of last year. It takes time to set up all of this environment, so the serious coding started around March. By June we had a 9X improvement, which is already almost a factor of 10, and since June up to now, we have gotten another factor of 10 on that application. So I'm now at 100X faster than what the application was able to do before. >> Dave: Two orders of magnitude in a year? >> Sharad: In a year. >> Okay, we're out of time, but where do you see this going? What is the ultimate outcome that you're hoping for? >> For us, we're really aiming to analyze our data in real time. Oftentimes when we have biological questions that we address, we analyze our data set, and then in a discussion a new question comes up, and we have to say, "Sorry, we have to process the data, come back in a week," and our idea is to be able to generate these answers instantaneously from our data. >> And those answers will lead to what? Just better care for individuals with Alzheimer's, or potentially, as you said, making Alzheimer's a memory? >> So the idea is to identify Alzheimer's long before the first symptoms are shown, because then you can start an effective treatment and you can have the biggest impact. Once the first symptoms are present, it's not getting any better. >> Well, thank you for your great work, gentlemen, and best of luck on behalf of society. >> Thank you very much. >> Really appreciate you coming on theCUBE and sharing your story. You're very welcome. All right, keep it right there, buddy. Peter and I will be back with our next guest right after this short break. This is theCUBE, you're watching live from Madrid, HPE Discover 2017. We'll be right back.
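As a concrete illustration of what "writing code differently" can mean when an entire dataset is directly addressable in memory, here is a minimal, hypothetical Python sketch. It is not HPE's code and it ignores the fabric-attached memory and multi-node details of the prototype; it only contrasts a chunked, storage-bound pass over a dataset with a single pass over data already resident in memory, which is the kind of restructuring described above.

```python
# Hypothetical illustration only: contrasts a storage-bound, chunked workflow
# with an in-memory workflow of the kind memory-driven computing enables.
import numpy as np
import tempfile, os

rng = np.random.default_rng(0)
data = rng.normal(size=(1_000_000, 8))          # stand-in for a large cohort dataset

# 1) Conventional approach: data lives on storage and is re-read chunk by chunk
#    every time an analyst asks a new question.
path = os.path.join(tempfile.mkdtemp(), "cohort.npy")
np.save(path, data)

def column_means_chunked(path, chunk_rows=100_000):
    arr = np.load(path, mmap_mode="r")           # each pass pays the storage cost
    total = np.zeros(arr.shape[1])
    n = 0
    for start in range(0, arr.shape[0], chunk_rows):
        chunk = np.asarray(arr[start:start + chunk_rows])
        total += chunk.sum(axis=0)
        n += chunk.shape[0]
    return total / n

# 2) Memory-driven approach: the whole dataset is already resident in (shared)
#    memory, so each new question is just another pass over live arrays.
def column_means_in_memory(arr):
    return arr.mean(axis=0)

print(np.allclose(column_means_chunked(path), column_means_in_memory(data)))  # True
```

At this toy scale the two versions behave similarly; the argument in the conversation is that when the resident data grows to tens or hundreds of terabytes, the second style keeps the ask-a-question, see-the-answer loop interactive instead of stalling on the storage stack.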
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Neil | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Jonathan | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Ajay Patel | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
$3 | QUANTITY | 0.99+ |
Peter Burris | PERSON | 0.99+ |
Jonathan Ebinger | PERSON | 0.99+ |
Anthony | PERSON | 0.99+ |
Mark Andreesen | PERSON | 0.99+ |
Savannah Peterson | PERSON | 0.99+ |
Europe | LOCATION | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Yahoo | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Paul Gillin | PERSON | 0.99+ |
Matthias Becker | PERSON | 0.99+ |
Greg Sands | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Jennifer Meyer | PERSON | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
Target | ORGANIZATION | 0.99+ |
Blue Run Ventures | ORGANIZATION | 0.99+ |
Robert | PERSON | 0.99+ |
Paul Cormier | PERSON | 0.99+ |
Paul | PERSON | 0.99+ |
OVH | ORGANIZATION | 0.99+ |
Keith Townsend | PERSON | 0.99+ |
Peter | PERSON | 0.99+ |
California | LOCATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Sony | ORGANIZATION | 0.99+ |
VMware | ORGANIZATION | 0.99+ |
Andy Jassy | PERSON | 0.99+ |
Robin | PERSON | 0.99+ |
Red Cross | ORGANIZATION | 0.99+ |
Tom Anderson | PERSON | 0.99+ |
Andy Jazzy | PERSON | 0.99+ |
Korea | LOCATION | 0.99+ |
Howard | PERSON | 0.99+ |
Sharad Singal | PERSON | 0.99+ |
DZNE | ORGANIZATION | 0.99+ |
U.S. | LOCATION | 0.99+ |
five minutes | QUANTITY | 0.99+ |
$2.7 million | QUANTITY | 0.99+ |
Tom | PERSON | 0.99+ |
John Furrier | PERSON | 0.99+ |
Matthias | PERSON | 0.99+ |
Matt | PERSON | 0.99+ |
Boston | LOCATION | 0.99+ |
Jesse | PERSON | 0.99+ |
Red Hat | ORGANIZATION | 0.99+ |
Xavier Poisson, HPE and Craig McLellan, ThinkOn - HPE Discover 2017
>> Announcer: Live from Las Vegas, it's theCUBE, covering HPE Discover 2017, brought to you by Hewlett-Packard Enterprise. >> Welcome back everyone. We're here live in Las Vegas with theCUBE's coverage of HPE Discover 2017. I'm John Furrier with Silicon Angle, here with my co-host David Vellante of Silicon Angle and Wikibon. Our next guests are Xavier Poisson, VP of Indirect Digital Services at HPE, and Craig McLellan, founder of ThinkOn. Guys, welcome to theCUBE, welcome back. I know Dave interviewed you in London. I wasn't there, but welcome to theCUBE. >> Thank you. >> So Xavier, I've got to congratulate you on the prestigious cloud leadership award in 2017. >> Xavier: Oh my. >> So congratulations- >> Xavier: Thank you. >> On the prestigious Data Cloud Europe award. >> Yeah, it was announced yesterday in Monte Carlo, and I believe it is a good recognition from the industry of what we have been doing. Not only me, you know, but the collective work with our partners, with the HP people, really to bring the best of the value of cloud to our customers. >> So Monte Carlo, Vegas, okay. Tough choices. >> I'd like to go to Monte Carlo. It's not a bad place to visit, hang out. Cloud 28 is really expanding, really kind of lightning in a bottle with what you've been doing, so this speaks to the general industry trend, the way that you're riding cloud in the enterprise. Talk about why Cloud 28 is doing so well, and what's the dynamic, what's the driver? >> Well, you know, to come back to the prize, we believe that the customer deserves to know more, and they need to have their choice. And also our partners are playing a significant role to make it happen, because we cannot believe that one single company will do everything. The digital transformation of our customers means that more and more capabilities have to be put in place so that we answer the right needs at the right moment in the right geography. And this was, you know, the foundation of Cloud 28, to make it happen like that. We call it, you know, making a global ecosystem in the sense of the sharing economy, putting the resources together, so that one single partner can find with another one the way to achieve his goal, instead of thinking, "I will do it myself" and losing his customer at the end of the day. And they may not know it, but the customers recognize that, so this is the reason why I believe it's growing, and it's growing fast. >> And the open source community is really expanding as well, and if you look at the technology providers, from the global system integrators down to the front lines of channel partners, cloud is changing the game. Customers expect co-existence. Craig, you're in the middle of all this. What are some of the front-line dynamics with customers, because they're going to be getting a lot of services from a variety of different vendors and suppliers; no one size fits all anymore. >> That's so true, more than ever. I think it falls into three categories. One is that customers expect partners and their service providers to focus on integration with others and treat each other as peers; whether you call it collaboration or co-opetition, it's still an issue that the customer, more than ever, is expecting their providers to facilitate. Secondly, they're very impatient. Everything is about now, or five minutes ago, and there is very low tolerance for the traditional engagement model.
And the third item is technology's changing so fast that the customers, in many cases, have stopped trying to stay on top of it, and they're now looking for service providers to be, effectively, their proxy with the underlying developers. >> The impatience thing is a good point. I want to drill into that, because what we're seeing with the move to cloud highlights the anti-waterfall concept. Waterfall was really great for project management back in the days of ERPs and those 18-month to 24-month POCs. Now, you know, people are under a lot of pressure to drive top-line revenue and cost consolidation, so cloud can give you that. So how has that changed the nature of the customer? Obviously they're impatient, but how has that changed, structurally, how they engage with partners? >> So what I experience in our day-to-day is that customers are eager to fail fast. Failure is an acceptable outcome as long as it doesn't take them 12 months to 18 months. They're also expecting service providers to embrace a similar DevOps mentality, where they're looking for service providers to be innovating all the time. So there is some forgiveness, I think, that comes from the customer base if we're all in this together, but back to what I said earlier, they just do not tolerate "we'll meet next Thursday and talk about it." They really want to move today. >> David: Action, they want the action. >> So Craig, talk a little bit more about ThinkOn, sort of why you founded the company. What's your journey been like? I'm really interested in the transformation that has been effected as a result of Cloud 28. >> So we believe very strongly in ecosystems, in participating in ecosystems. We're a wholesale provider, so we enable the traditional VARs to go to market faster, and we look to the Cloud 28 marketplace as just another example of an ecosystem where traction inside the ecosystem is growing faster than if we were to do everything ourselves. So not only do we embrace the notion of partnerships, we also leverage the channel to help them develop faster go-to-market strategies in their chosen niches. >> So how did it work? How did you guys engage? Xavier, do you find partners like this? Do they come to you? They're already part of the ecosystem. >> So really it's both sides. Sometimes, yes, we discuss. I believe HP has a responsibility to discuss with our partners, to explain that the world is changing and there is an opportunity. So we do our job, and creating the relationship with Craig has been done by the HP team in the country. And diversity matters. We need to respect what is happening in the country, the ecosystem and the way business is done in the country, so in this case it was HP. In other cases, and I have a very good example, it was in New York, the e-commerce manager of a VAR called to say, "I want to join, how can I get in touch with Cloud 28+, because I see the opportunity to partner with some other vendors, meaning ISVs or SIs, and I want to be there." So it is both sides. We have a lot of calls from ISVs, because a software vendor is developing applications, and, as you said Craig, it's going very, very fast with cloud-native development. So you have more and more startups coming and developing new products, and they want to reach market very, very quickly. And with the exposure that we have, because we are worldwide, and we started in Europe and Eastern Africa, but we are developing Cloud 28+ now from December onwards in the United States of America, in Canada, Latin America, in Asia Pacific.
You would be amazed at what is happening in India, for instance, where cloud is just popping up and where all the good ideas are coming from. So it is both sides: either HP engaging with our partner, saying, "Okay, there is an opportunity, do you want to join?" Or sometimes, as I said, it is the partners reaching out to us, saying, "We want to be there, we want to accelerate with you." >> Now give us some metrics on the program. >> So, as of today, and remember we opened the platform in December '15, and we worked together in London, if you remember. >> John: Yeah, absolutely. >> As of today, 18 months later: 500 members. It's amazing, 500 members. We cover more than 300 data centers of our partners, like Craig's. 300. And we have published nearly 18,000 cloud services on the platform, out of 2,000 unique ones, and we now have nearly 40,000 hits per month on the website. It's really amazing. I can tell you it's a snowball effect, and it's not only the end user customers; we have a lot of traffic inside the platform between members while building new offerings. So, for instance, we have been speaking here at Discover of the Automoción Ferias that was announced here at Discover. This is coming out of Cloud 28+, typically, and we see that there. There is another offering that HPE Pointnext is now proposing as a service, which is a legal identity by Lay-kwah, a software company in the Nordics, coming out of Cloud 28+. So it's expanding dramatically. >> So this really highlights the pay-as-you-go cloud business model. >> Xavier: Yeah. >> And it gives ISVs and VARs and VABs the portfolio approach. So they're kind of organically putting this together, versus the old channel model of predefined programs and products being shipped out to partners. You can pop services in here and then your customers can roll their own solutions. >> Craig: That's right. >> David: Am I getting that right? >> Absolutely. I also think that one of the things that's a real value add is that a lot of organizations are concerned about vendor lock-in. And when you build a consortium like what HPE has done, it forces the service providers to participate in a way that avoids lock-in. Every service provider wants to build a lock-in strategy, but there are subtle ways you can do it that aren't offensive, and then there are offensive ways, and I think the Cloud 28 consortium is really doing a good job of giving customers the comfort that they can adopt services without being locked in. >> George: Let's call it sticky. >> There you go. >> What's the best way for somebody in the channel to create stickiness and loyalty with their customers? >> In my experience, they have an existing ecosystem that they've been working with for a long time, whether it's HPE or a Veeam or another software vendor, and that's an ecosystem that their sales organization understands, an ecosystem that their own support organization understands. I think you should always start with a nice simple step within an ecosystem you already know, and then take the next step and turn it into a recurring revenue stream, without trying to start from scratch. A blank slate is always exciting to the people that are paid to do it, but unfortunately the outcome is usually not on time and on budget, and there are lots of little steps you can take with existing ecosystem partners. >> Kind of familiarity, you know, ease of doing business. >> Yep. >> You know, track record, all those kinds of things. >> Craig: Customer trust.
>> So, I mean, we use the term lock-in, but what we're really trying to achieve is trust and loyalty. >> The new lock-in is scale, openness, and trust. Question on some of the technical things. I mean, channels have always been a beautiful thing, and direct sales is a great cost per order dollar, the numbers are great, but you've got to get it going, right? You've got the flywheel going with Cloud 28. How do you nurture this? I mean, obviously it's organic, there's some community involved, training, and getting out there. I mean, how is it running? I'm just trying to understand. This is a really good formula. Is there a magical formula? Is there certain training? Is it done in the community, peer to peer? >> So it is amazing, because it is driven by listening to the people and, I would say, educating everybody in the value chain, the sales people at HP, the pre-sales at HP, and the people within our partners and the end user customer, that they need to think business outcome. And once you shift from transactional selling to thinking business outcome, all the things come together, because you think about what your customer and your customer's customer want to do and how you will help your customer achieve his business goals. And you spoke about agility, time to market. These are things you can create by assembling all that is in Cloud 28+. I have a big example. We used Cloud 28+ to answer a multi-million dollar RFP. Why? Because multi-cloud is a reality, so large governments and enterprises want to deploy clouds in many areas, not always putting everything in the same data center. They want a good mix of technologies, a good mix of usage, and then you end up with RFPs which are giant. And especially when everything is coming to IoT, to the storing of data, you need to have data analytics, and for most companies it is becoming a nightmare. So we had a very good example with a big RFP in Europe. It was all about connecting all the open data that are produced by satellites in the sky and making all this data available for all the sam-vees in Europe. I can tell you, it was very complicated to do. You would not believe me. In less than three weeks, we were able to discuss with the right partners inside Cloud 28+ to be the consortium on the bid. Three weeks. It was unbelievable. >> Well, the thing about cloud too, as you get into these horizontally scalable data opportunities, you also need specialism, you need to have expertise. And that, to me, is really application-specific, not peddling product. To your point about outcomes, you're actually providing solutions, right? It's back to listening. So, okay, final thoughts, guys, HPE Discover 2017. What's the takeaway, Craig? So this year, what's the big story? Obviously we heard Meg Whitman, you know, compute is kind of being redefined and scaling. What's the big story here from your perspective? >> For me, I was excited to hear about the customer having a more open mind about where to put workload. I would say two years ago there was this mad rush to the cloud without really understanding the cloud, and now there's a more seasoned reality that workload has a multitude of locations where it can live. And I've been saying this for a long time, but as a small organization in Canada, not everyone's listening. >> David: Well, you're nibbling on the front line. >> That's right. So it's nice to hear that it's being seen around the world in the enterprise space. That's my big takeaway. >> John: Xavier, thoughts?
>> I believe that Hewlett-Packard Enterprise is interested and confident in the journey we have designed with Meg Whitman. We have to cross different phases of transformation; it is not finished. But more than ever, we put the customer at the front of the discussion. You know, if you were listening about this new offering that was pre-announced here, I was thrilled with the process. This product has been built to be connected by essence: when they were designing the product, Cloud 28+ would be a resource provider for it. This is the way we invent products now. We put the customer, the channel partners, and the ecosystems at the center of the design of the products that we are doing. So it's no longer a product I'm selling; it is a product that is ready to be sold, because it fits customer or channel partner outcomes. This is a big transformation happening today. >> And I would just say, one of my observations is, again, education on the cloud is key, and then, you know, this ability to tailor solutions, not one size fits all. You know, here's hyperconverged, or here's composability. >> Exactly. >> Having the customer mix and match whatever they need. Guys, great conversation here inside theCUBE. HPE Discover 2017, this is theCUBE, I'm John Furrier with David Vellante, we'll be back with more live coverage. Stay with us after this short break. (upbeat music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
David | PERSON | 0.99+ |
David Vellante | PERSON | 0.99+ |
Craig | PERSON | 0.99+ |
George | PERSON | 0.99+ |
Europe | LOCATION | 0.99+ |
London | LOCATION | 0.99+ |
Dave | PERSON | 0.99+ |
Meg Whitman | PERSON | 0.99+ |
John | PERSON | 0.99+ |
HP | ORGANIZATION | 0.99+ |
Xavier | PERSON | 0.99+ |
Craig McClellan | PERSON | 0.99+ |
December '15 | DATE | 0.99+ |
Canada | LOCATION | 0.99+ |
India | LOCATION | 0.99+ |
New York | LOCATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
2017 | DATE | 0.99+ |
12 months | QUANTITY | 0.99+ |
December | DATE | 0.99+ |
Asia Pacific | LOCATION | 0.99+ |
Hewlett-Packard Enterprise | ORGANIZATION | 0.99+ |
24 months | QUANTITY | 0.99+ |
Monte Carlo | LOCATION | 0.99+ |
Discover | ORGANIZATION | 0.99+ |
HPE | ORGANIZATION | 0.99+ |
Xavier Poisson | PERSON | 0.99+ |
Cloud 28+ | TITLE | 0.99+ |
18 month | QUANTITY | 0.99+ |
18 months | QUANTITY | 0.99+ |
Nordics | LOCATION | 0.99+ |
Craig McLellan | PERSON | 0.99+ |
Eastern Africa | LOCATION | 0.99+ |
Las Vegas | LOCATION | 0.99+ |
ThinkOn | ORGANIZATION | 0.99+ |
third item | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
both sides | QUANTITY | 0.99+ |
500 members | QUANTITY | 0.99+ |
less than three weeks | QUANTITY | 0.99+ |
next Thursday | DATE | 0.99+ |
Latin America | LOCATION | 0.99+ |
Veeam | ORGANIZATION | 0.99+ |
today | DATE | 0.99+ |
Silicon Angle | ORGANIZATION | 0.99+ |
Secondly | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
more than 300 data centers | QUANTITY | 0.99+ |
Vegas | LOCATION | 0.98+ |
One | QUANTITY | 0.98+ |
Three weeks | QUANTITY | 0.98+ |
two years ago | DATE | 0.98+ |
Cloud 28 | TITLE | 0.97+ |