Image Title

Search Results for Rajeem:

Caitlin Halferty & Carlo Appugliese, IBM | IBM CDO Summit 2019


 

>> live from San Francisco, California. It's the Q covering the IBM Chief Data Officer Summit brought to you by IBM. >> Welcome back to Fisherman's Fisherman's Wharf in San Francisco. Everybody, my name is David wanted. You're watching the Cube, the leader in live tech coverage, you ought to events. We extract the signal from the noise. We're here. The IBM CDO event. This is the 10th anniversary of this event. Caitlin Hallford is here. She's the director of a I Accelerator and client success at IBM. Caitlin, great to see you again. Wow. 10 years. Amazing. They and Carlo Apple Apple Glace e is here. Who is the program director for data and a I at IBM. Because you again, my friend. Thanks for coming on to Cuba. Lums. Wow, this is 10 years, and I think the Cube is covered. Probably eight of these now. Yeah, kind of. We bounce between San Francisco and Boston to great places for CEOs. Good places to have intimate events, but and you're taking it global. I understand. Congratulations. Congratulations on the promotion. Thank you. Going. Thank you so much. >> So we, as you know well are well, no. We started our chief date officer summits in San Francisco here, and it's gone 2014. So this is our 10th 1 We do two a year. We found we really have a unique cohort of clients. The join us about 100 40 in San Francisco on the spring 140 in Boston in the fall, and we're here celebrating the 10th 10 Summit. >> So, Carlo, talk about your role and then let's get into how you guys, you know, work together. How you hand the baton way we'll get to the client piece. >> So I lead the Data Center League team, which is a group within our product development, working side by side with clients really to understand their needs as well developed, use cases on our platform and tools and make sure we are able to deliver on those. And then we work closely with the CDO team, the global CEO team on best practices, what patterns they're seeing from an architecture perspective. Make sure that our platforms really incorporating that stuff. >> And if I recall the data science that lead team is its presales correct and could >> be posted that it could, it really depends on the client, so it could be prior to them buying software or after they bought the software. If they need the help, we can also come in. >> Okay, so? So it can be a for pay service. Is that correct or Yeah, we can >> before pay. Or sometimes we do it based on just our relation with >> It's kind of a mixed then. Right? Okay, so you're learning the client's learning, so they're obviously good, good customers. And so you want to treat him right >> now? How do you guys work >> together? Maybe Caitlin, you can explain. The two organizations >> were often the early testers, early adopters of some of the capabilities. And so what we'll do is we'll test will literally will prove it out of skill internally using IBM itself as an example. And then, as we build out the capability, work with Carlo and his team to really drive that in a product and drive that into market, and we share a lot of client relationships where CEOs come to us, they're want advice and counsel on best practices across the organization. And they're looking for latest applications to deploy deploy known environments and so we can capture a lot of that feedback in some of the market user testing proved that out. Using IBM is an example and then work with you to really commercialized and bring it to market in the most efficient manner. >> You were talking this morning. You had a picture up of the first CDO event. No Internet, no wife in the basement. I love it. So how is this evolved from a theme standpoint? What do you What are the patterns? Sure. So when >> we started this, it was really a response. Thio primarily financial service is sector regulatory requirements, trying to get data right to meet those regulatory compliance initiatives. Defensive posture certainly weren't driving transformation within their enterprises. And what I've seen is a couple of those core elements are still key for us or data governance and data management. And some of those security access controls are always going to be important. But we're finding his videos more and more, have expanded scope of responsibilities with the enterprise they're looked at as a leader. They're no longer sitting within a c i o function there either appear or, you know, working in partnership with, and they're driving enterprise wide, you know, initiatives for the for their enterprises and organizations, which has been great to see. >> So we all remember when you know how very and declared data science was gonna be the number one job, and it actually kind of has become. I think I saw somewhere, maybe in Glass door was anointed that the top job, which is >> kind of cool to see. So what are you seeing >> with customers, Carlo? You guys, you have these these blueprints, you're now applying them, accelerating different industries. You mentioned health care this morning. >> What are some >> of those industry accelerators And how is that actually coming to fruition? Yes. >> So some of the things we're seeing is speaking of financial clients way go into a lot of them. We do these one on one engagements, we build them from custom. We co create these engineering solutions, our platform, and we're seeing patterns, patterns around different use cases that are coming up over and over again. And the one thing about data science Aye, aye. It's difficult to develop a solution because everybody's date is different. Everybody's business is different. So what we're trying to do is build these. We can't just build a widget that's going to solve the problem, because then you have to force your data into that, and we're seeing that that doesn't really work. So building a platform for these clients. But these accelerators, which are a set of core code source code notebooks, industry models in terms a CZ wells dashboards that allow them to quickly build out these use cases around a turn or segmentation on dhe. You know some other models we can grab the box provide the models, provide the know how with the source code, as well as a way for them to train them, deploy them and operationalize them in an organization. That's kind of what we're doing. >> You prime the pump >> prime minute pump, we call them there right now, we're doing client in eights for wealth management, and we're doing that, ref SS. And they come right on the box of our cloudpack for data platform. You could quickly click and install button, and in there you'll get the sample data files. You get no books. You get industry terms, your governance capability, as well as deployed dashboards and models. >> So talk more about >> cloudpack for data. What's inside of that brought back the >> data is a collection of micro Service's Andi. It includes a lot of things that we bring to market to help customers with their journey things from like data ingestion collection to all the way Thio, eh? I model development from building your models to deploying them to actually infusing them in your business process with bias detection or integration way have a lot of capability. Part >> of it's actually tooling. It's not just sort of so how to Pdf >> dualism entire platform eso. So the platform itself has everything you need an organization to kind of go from an idea to data ingestion and governance and management all the way to model training, development, deployment into integration into your business process. >> Now Caitlin, in the early days of the CDO, saw CDO emerging in healthcare, financialservices and government. And now it's kind of gone mainstream to the point where we had Mark Clare on who's the head of data neighborhood AstraZeneca. And he said, I'm not taking the CDO title, you know, because I'm all about data enablement and CDO. You know, title has sort of evolved. What have you seen? It's got clearly gone mainstream Yep. What are you seeing? In terms of adoption of that, that role and its impact on organizations, >> So couple of transit has been interesting both domestically and internationally as well. So we're seeing a lot of growth outside of the U. S. So we did our first inaugural summit in Tokyo. In Japan, there's a number of day leaders in Japan that are really eager to jump start their transformation initiatives. Also did our first Dubai summit. Middle East and Africa will be in South Africa next month at another studio summit. And what I'm seeing is outside of North America a lot of activity and interest in creating an enabling studio light capability. Data Leader, Like, um, and some of these guys, I think we're gonna leapfrog ahead. I think they're going to just absolutely jump jump ahead and in parallel, those traditional industries, you know, there's a new federal legislation coming down by year end for most federal agencies to appoint a chief data officer. So, you know, Washington, D. C. Is is hopping right now, we're getting a number of agencies requesting advice and counsel on how to set up the office how to be successful I think there's some great opportunity in those traditional industries and also seeing it, you know, outside the U. S. And cross nontraditional, >> you say >> Jump ahead. You mean jump ahead of where maybe some of the U. S. >> Absolute best? Absolutely. And I'm >> seeing a trend where you know, a lot of CEOs they're moving. They're really closer to the line of business, right? They're moving outside of technology, but they have to be technology savvy. They have a team of engineers and data scientists. So there is really an important role in every organization that I'm seeing for every client I go to. It's a little different, but you're right, it's it's definitely up and coming. Role is very important for especially for digital transformation. >> This is so good. I was gonna say one of the ways they are teens really, partner Well, together, I think is weaken source some of these in terms of enabling that you know, acceleration and leap frog. What are those pain points or use cases in traditional data management space? You know, the metadata. So I think you talk with Steven earlier about how we're doing some automated meditate a generation and really using a i t. O instead of manually having to label and tag that we're able to generate about 85% of our labels internally and drive that into existing product. Carlos using. And our clients are saying, Hey, we're spending, you know, hundreds of millions of dollars and we've got teams of massive teams of people manual work. And so we're able to recognize it, adopts something like that, press internally and then work with you guys >> actually think of every detail developer out there that has to go figure out what this date is. If you have a tool which we're trying to cooperate the platform based on the guidance from the CDO Global CEO team, we can automatically create that metadata are likely ingested and provide into platform so that data scientists can start to get value out >> of it quickly. So we heard Martin Schroeder talked about digital trade and public policy, and he said there were three things free flow of data. Unless it doesn't make sense like personal information prevent data localization mandates, yeah, and then protect algorithms and source code, which is an I P protection thing. So I'm interested in how your customers air Reacting to that framework, I presume the protect the algorithms and source code I p. That's near and dear right? They want to make sure that you're not taking models and then giving it to their competitors. >> Absolutely. And we talk about that every time we go in there and we work on projects. What's the I p? You know, how do we manage this? And you know, what we bring to the table with the accelerators is to help them jump start them right, even though that it's kind of our a p we created, but we give it to them and then what they derive from that when they incorporate their data, which is their i p, and create new models, that is then their i. P. So those air complicated questions and every company is a little different on what they're worried about with that, so but many banks, we give them all the I P to make sure that they're comfortable and especially in financial service is but some other spaces. It's very competitive. And then I was worried about it because it's, ah, known space. A lot of the algorithm for youse are all open source. They're known algorithms, so there's not a lot of problem there. >> It's how you apply them. That's >> exactly right how you apply them in that boundary of what >> is P, What's not. It's kind of >> fuzzy, >> and we encourage our clients a lot of times to drive that for >> the >> organisation, for us, internally, GDP, our readiness, it was occurring to the business unit level functional area. So it was, you know, we weren't where we needed to be in terms of achieving compliance. And we have the CEO office took ownership of that across the business and got it where we needed to be. And so we often encourage our clients to take ownership of something like that and use it as an opportunity to differentiate. >> And I talked about the whole time of clients. Their data is impor onto them. Them training models with that data for some new making new decisions is their unique value. Prop In there, I'd be so so we encourage them to make sure they're aware that don't just tore their data in any can, um, service out there model because they could be giving away their intellectual property, and it's important. Didn't understand that. >> So that's a complicated one. Write the piece and the other two seem to be even tougher. And some regards, like the free flow of data. I could see a lot of governments not wanting the free flow of data, but and the client is in the middle. OK, d'oh. Government is gonna adjudicate. What's that conversation like? The example that he gave was, maybe was interpolate. If it's if it's information about baggage claims, you can you can use the Blockchain and crypt it and then only see the data at the other end. So that was actually, I thought, a good example. Why do you want to restrict that flow of data? But if it's personal information, keep it in country. But how is that conversation going with clients? >> Leo. Those can involve depending on the country, right and where you're at in the industry. >> But some Western countries are strict about that. >> Absolutely. And this is why we've created a platform that allows for data virtualization. We use Cooper nannies and technologies under the covers so that you can manage that in different locations. You could manage it across. Ah, hybrid of data centers or hybrid of public cloud vendors. And it allows you to still have one business application, and you can kind of do some of the separation and even separation of data. So there's there's, there's, there's an approach there, you know. But you gotta do a balance. Balance it. You gotta balance between innovation, digital transformation and how much you wanna, you know, govern so governs important. And then, you know. But for some projects, we may want to just quickly prototype. So there's a balance there, too. >> Well, that data virtualization tech is interesting because it gets the other piece, which was prevent data localization mandates. But if there is a mandate and we know that some countries aren't going to relax that mandate, you have, ah, a technical solution for that >> architecture that will support that. And that's a big investment for us right now. And where we're doing a lot of work in that space. Obviously, with red hat, you saw partnership or acquisition. So that's been >> really Yeah, I heard something about that's important. That's that's that's a big part of Chapter two. Yeah, all right. We'll give you the final world Caitlyn on the spring. I guess it's not spring it. Secondly, this summer, right? CDO event? >> No, it's been agreed. First day. So we kicked off. Today. We've got a full set of client panel's tomorrow. We've got some announcements around our meta data that I mentioned. Risk insights is a really cool offering. We'll be talking more about. We also have cognitive support. This is another one. Our clients that I really wanted to help with some of their support back in systems. So a lot of exciting announcements, new thought leadership coming out. It's been a great event and looking forward to the next next day. >> Well, I love the fact >> that you guys have have tied data science into the sea. Sweet roll. You guys have done a great job, I think, better than anybody in terms of of, of really advocating for the chief data officer. And this is a great event because it's piers talking. Appears a lot of private conversations going on. So congratulations on all the success and continued success worldwide. >> Thank you so much. Thank you, Dave. >> You welcome. Keep it right there, everybody. We'll be back with our next guest. Ready for this short break. We have a panel coming up. This is David. Dante. You're >> watching the Cube from IBM CDO right back.

Published Date : Jun 24 2019

SUMMARY :

the IBM Chief Data Officer Summit brought to you by IBM. the leader in live tech coverage, you ought to events. So we, as you know well are well, no. We started our chief date officer summits in San Francisco here, How you hand the baton way we'll get to the client piece. So I lead the Data Center League team, which is a group within our product development, be posted that it could, it really depends on the client, so it could be prior So it can be a for pay service. Or sometimes we do it based on just our relation with And so you want to treat him right Maybe Caitlin, you can explain. can capture a lot of that feedback in some of the market user testing proved that out. What do you What are the patterns? And some of those security access controls are always going to be important. So we all remember when you know how very and declared data science was gonna be the number one job, So what are you seeing You guys, you have these these blueprints, of those industry accelerators And how is that actually coming to fruition? So some of the things we're seeing is speaking of financial clients way go into a lot prime minute pump, we call them there right now, we're doing client in eights for wealth management, What's inside of that brought back the It includes a lot of things that we bring to market It's not just sort of so how to Pdf So the platform itself has everything you need I'm not taking the CDO title, you know, because I'm all about data enablement and CDO. in those traditional industries and also seeing it, you know, outside the U. You mean jump ahead of where maybe some of the U. S. seeing a trend where you know, a lot of CEOs they're moving. And our clients are saying, Hey, we're spending, you know, hundreds of millions of dollars and we've got If you have a tool which we're trying to cooperate the platform based on the guidance from the CDO Global CEO team, So we heard Martin Schroeder talked about digital trade and public And you know, what we bring to the table It's how you apply them. It's kind of So it was, you know, we weren't where we needed to be in terms of achieving compliance. And I talked about the whole time of clients. And some regards, like the free flow of data. And it allows you to still have one business application, and you can kind of do some of the separation But if there is a mandate and we know that some countries aren't going to relax that mandate, Obviously, with red hat, you saw partnership or acquisition. We'll give you the final world Caitlyn on the spring. So a lot of exciting announcements, new thought leadership coming out. that you guys have have tied data science into the sea. Thank you so much. This is David.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
DavePERSON

0.99+

Caitlin HallfordPERSON

0.99+

IBMORGANIZATION

0.99+

BostonLOCATION

0.99+

DavidPERSON

0.99+

CaitlinPERSON

0.99+

South AfricaLOCATION

0.99+

CarloPERSON

0.99+

Martin SchroederPERSON

0.99+

San FranciscoLOCATION

0.99+

10 yearsQUANTITY

0.99+

TodayDATE

0.99+

CubaLOCATION

0.99+

JapanLOCATION

0.99+

North AmericaLOCATION

0.99+

TokyoLOCATION

0.99+

StevenPERSON

0.99+

Mark ClarePERSON

0.99+

2014DATE

0.99+

San Francisco, CaliforniaLOCATION

0.99+

CaitlynPERSON

0.99+

U. S.LOCATION

0.99+

CarlosPERSON

0.99+

LeoPERSON

0.99+

Middle EastLOCATION

0.99+

AstraZenecaORGANIZATION

0.99+

tomorrowDATE

0.99+

next monthDATE

0.99+

DantePERSON

0.99+

bothQUANTITY

0.99+

Washington, D. C.LOCATION

0.99+

Data Center LeagueORGANIZATION

0.98+

twoQUANTITY

0.98+

10th anniversaryQUANTITY

0.98+

AfricaLOCATION

0.98+

First dayQUANTITY

0.98+

CDOTITLE

0.98+

this summerDATE

0.97+

two organizationsQUANTITY

0.97+

CDO GlobalORGANIZATION

0.97+

Carlo AppugliesePERSON

0.97+

U. S.LOCATION

0.97+

10thQUANTITY

0.96+

one business applicationQUANTITY

0.96+

eightQUANTITY

0.96+

Caitlin HalfertyPERSON

0.95+

about 85%QUANTITY

0.94+

first inaugural summitQUANTITY

0.94+

about 100 40QUANTITY

0.93+

SecondlyQUANTITY

0.93+

firstQUANTITY

0.92+

next next dayDATE

0.9+

hundreds of millions of dollarsQUANTITY

0.9+

IBM Chief Data Officer SummitEVENT

0.9+

Carlo ApplePERSON

0.88+

coupleQUANTITY

0.88+

two a yearQUANTITY

0.88+

CubeCOMMERCIAL_ITEM

0.88+

10th 10 SummitEVENT

0.84+

CDOEVENT

0.83+

Chapter twoOTHER

0.83+

IBM CDO Summit 2019EVENT

0.83+

oneQUANTITY

0.82+

three thingsQUANTITY

0.8+

AndiORGANIZATION

0.76+

this morningDATE

0.75+

DubaiLOCATION

0.74+

Fisherman's Fisherman's WharfLOCATION

0.74+

spring 140DATE

0.72+

one thingQUANTITY

0.71+

summitEVENT

0.7+

WesternLOCATION

0.66+

first CDOQUANTITY

0.66+

CDOORGANIZATION

0.61+

endDATE

0.61+

Carlos Guevara, Claro Colombia & Carlo Appugliese, IBM | IBM Think 2019


 

>> Live from San Francisco, it's theCUBE. Covering IBM Think 2019. Brought to you by IBM. >> Hey everyone, welcome back to the live coverage here in Moscone North in San Francisco for IBM Think. This is theCUBE's coverage. I'm here with Dave Vellante. I've got two great guests here, Carlos Guevara, chief data officer, Claro Columbia, and Carlo Appugliese- Appugliese? >> Appugliese, yeah, good. That's good. >> Engagement Manager, IBM's Data Science Elite Team, customer of IBM, conversation around data science. Welcome to theCube, thanks for joining us. >> Thanks for having us. >> Thank you. >> So we're here the streets are shut down. AI Anywhere is a big theme, Multi-Cloud, but it's all about the data everywhere. People trying to put end-to-end solutions together to solve real business problems. Data's at the heart of all this. Moving data around from cloud to cloud, using AI and technology to get insights out of that. So, take a minute to explain your situation, what you guys are trying to do. >> Okay, okay, perfect. Right now we're working a lot about the business theme, because we need to use the machine learning models or the artificial intelligence, to take best decisions for the company. We were working with Carlo and Sean Muller in order to know how can we divide the customers who leave the company. Because, for us, it's very important, to maintain our customer, to know how their behavior is from them, and their artificial intelligence is an excellent way to do it. We have a lot of challenge about that, because, you know, we have a lot of data, different systems that are running the data, but we need to put all the information together to run the models. The Elite Team that Carlo is leading right now is helping us a lot because, we know how to handle data, we know how to clean the data, we know how to do the right governance for the data and the IBM Equinix is very compromised with us in order to do that. Sofie, that is one of the engineers that is very close to us right now. She was working a lot with my team in order to run the models. Susan, she was doing a lot for our middleware, FITON, and right now we are trained to do it in over the Hadoop system, running the spark, and that is the good way that we are thinking that it's going to get the goal for us We need to maintain our customers. >> So you guys are the largest telecommunications piece, Claro in Mexico for voice and home services-- >> Yeah. >> Is that the segments you guys are targeting? >> Yeah, yeah. >> And the scope size of, how big is that? >> Claro is the largest company in Columbia for telecommunications. We have maybe 50 million customers in Columbia, more than 50% of their market share. Also, where we have many, maybe 2.5 millions of homes in Columbia, that is more than the 50% of the customers for home services. And you know that is a big challenge for us because the competitors are all the time trying to take our customers and the churn, it also adds to us, and how to avoid that and how to do the artificial intelligence to do it, machine learning is a very good way to do that. >> So, classic problem in telecommunications is churn, right, so it's a data problem, so how did it all come about? So these guys came to you and-- >> Yeah, so they came to us, and we got together, we talked about the problem, and churn was at the top, right, these guys have a ton of data. So what we did was, the team got together, we had, really the way the Data Science Elite Team works is we really help clients in three areas. It's all about the right skills, the right people, the right tools, and then the right process. So we put together a team, we put together some Agile approaches, and what we're going to do, and then we started by spinning up an environment, we took some data in, we took their, and there as a lot of data, it as a terabytes of data. We took their user data, we took their users' usage data, which is like how many texts, cellphone, and then billing data, we pulled all that together in an environment, then the data scientists, alongside with Carlos' team, really worked on the problem. And they addressed it with machine learning obviously, targeting churn, they tried a variety of models, but XGBoost ended up being one of the better approaches. And we came up with pretty good accuracy, about 90, 92% precision on the model. >> On predicting-- >> On predicting churn-- >> Yeah, churn, and also, what did you do with that data? >> That is a very good question because, the company is preparing to handle that. I have a funny history, I say to the business people, okay these customers are going to leave the company, and I forget about that, and two months later, I was asking okay, what happened, they say okay, your model is very good, all the customers goes. Oh my God, what is happening with that? They weren't working with information, that is the reason we're thinking that the good ways to think from the right to the left, because which is the purpose, the purpose is to maintain our customers, and in that case we lose 50,000 customers because we didn't do nothing. We are close in the circle, we are taking care about that, prescriptive models to have helped for us to do it. And okay, maybe that is an invoice problem, we need to correct them, to fix the problem, in order to avoid that, but the first part is to predict, to get in a score, for the churn, and to handle that with the people. Obviously, working also, at the root cause analysis, because we need the churn to fix from the root. >> Carlos, what goes through the scope of, like, just the project because, this is a concern we see in the industry, I got a lot of data, how do I attack it, what's the scope? You just come in, ingest it into a data lake, how do you get to the value of these insights quickly, because obviously they are starving for insights, take us through that quick process. >> Well, you know, every problem's a little different, we help hundreds of clients in different ways, but this particular problem, it was a big data problem, we knew we had a lot of data, they had a Hadoop environment, but some of the data wasn't there. So what we did was, is we spun up a separate environment, we pulled some of the big data in there, we also pulled some of the other data together, and we started to do our analysis on that, kind of separately in the cloud, which was a little different, but we're working now to push that down into their Hadoop data lake, because not all the data's there, but some of the data is there, and we want to use some of that computing network to-- >> So you had to almost do an audit on those, figure out what you want to pull in first, >> Absolutely. >> Tie it to the business, on the business side, what were you guys like? Waiting for the answers, or like, what was some of the, on your side of the process, how did it go down? >> Thinking about our business, we were talking a little bit about that, about the architecture to handle that, ICP for Data within that is a very good solution for that, because we need infrastructure to help us, in order to get the answers because finally, we have a question, we have questions, why the customers are leaving us. And, the answer was the data, and the data was handled in a good way, with governance, with data cleaning, with the right models to do that, and right now, our concern is business action, and business offer, because the solution for the company is that we, obviously new products are coming from the data. >> So 10 years ago, you probably didn't have a Hadoop cluster to solve this problem, the data was, maybe it was in a data warehouse, maybe it wasn't, and you probably weren't a chief data officer back then, you know, that role kind of didn't exit. So a lot has changed, in the last 10 years. My question is, do you, first of all, I'd be interested in your comment on that, but then, do you see a point in which you can now take remedial action, or maybe even automate some of that remedial action using machine intelligence and that data cloud, or however else you do it, to actually take action on behalf of the brand, before humans, or without even human involvement, did you foresee the day? >> Yeah, so, just a comment on your thought about the times you know, I've been doing technology for 20 something years, and you know, data science is something that's been around but it's kind of evolved in software development. My thought is, you know, we have these roles of data scientist, but a lot of the feature engineer and data prep does require traditional people that were DBAs and now data engineers, and a variety of skills come together, and that's what try to do in every project. Just to add to that comment. As far as predicting ahead of time, like I think you were trying to say, what data, help me understand your question. >> So you've got 93% accuracy, okay, so, I presume you take that, you give it to the business, business says okay, let's maybe, you know, reach out to them, maybe do a little incentive, or, what kind of action can the machines take action on behalf of your brand, do you foresee a day when that could happen? >> Ah. >> Ah, okay. >> Yeah, so my thought is, for Claro Columbia and Carlos, but obviously this is, to me, remain, is the predictive models we've built will obviously be deployed, and then it would interact with their digital mobile applications, so in real time it'll react for the customers. And then, obviously, you know you want to make sure that Claro and company trust that, and it's making accurate predictions, and that's where a lot more, you know we have to do some model of validation, and evaluation of that, so they can begin to trust those predictions. I think is where we're-- >> Guys. I want to get your thoughts on this because you're doing a lot of learnings here. So can you guys each take a minute and explain the key learnings from this, as you go through the process, certainly in the business side, this is a big imperative to do this. You want to have a business outcome that keeps your users there. But what did you learn, what was some of the learnings you guys got from the project? >> The most important learning from the company was cleaning the data, that sounds funny but, as we say in analysis, garbage in, garbage out. And that was very important for us, one of the things that we learned, that we need to put cleaning data or the system. Also, the governance. Many people forget about the governance, the governance of the data. And right now we're working, again with IBM, in order to put that governance soon. >> So data quality problem. >> Yeah, data quality. >> And, do you report into I guess, COO or the CIO, are you a peer of the CIO, how does that work? >> Oh, okay, that's another funny history because, because the company is, right now I'm working for planning. Yes, it's strange, we're working for planning for the company-- >> For business planning. >> Yeah, for business planning. >> I was coming for an engineer, engineering, and right now I'm working for planning, and trying to make money for the company. You know, that is an engineer thinking how to get more money for the company. I was talking about some kind of analytics that is geospatial analytics, and I went to see that engineer to know how their network's handling, how the quality of the network and right now introducing the same software, the same knowledge, to know which is the better points to do sales. It's a good combination where finally I'm working for planning, and my boss, the planning chief, is working for the CEO. And I hear about different organizations, somebody's in financial, the CDO's in financial, or the CDO for IT, it's different, it depends on the company. Right now, I'm working for planning, how to handle the things, how to make more money for the company, how to handle the churn, and it's interesting because all the knowledge that I have from engineering is perfect to do it. >> Well, I would argue that's the job of a CDO, is to figure out how to make money with data, or save money, right? >> Yeah. >> Yeah, absolutely. >> So it's number one, anyway, is start there. >> Yeah, the thing we always talk about it is, is really proving value, it starts with that use case, identify where the real value is, and then we can, you know, the technology can come and the development can work after that. So I agree 100% with that, is what we're seeing across the board. >> Carlos, thanks for coming in, largest telecommunications in Columbia, great customer reference. >> Carlo, take a minute to explain, real quick, get a plug in for your Data Science Elite Team. What do you guys do, how do you engage, what are some of the projects you work on? >> Right, yeah, so we're a team of about 100 data scientists worldwide, we work side by side with clients, and our job is to really understand the problem from end to end and help in all areas, from skills, tools, and technique. And we roll and prototype, in a three Agile sprints, we use an Agile methodology, about six to eight weeks, and we kind of develop a real, we call it a proof of value. It's not a MVP just yet, or POC, but at the end of the day we prove out that we can get a model, we can do some prediction, we get a certain accuracy, and it's going to add value to the organization. >> It's not a freebie, right? >> It actually is-- >> Sorry, I'm sorry. It's not a four page service, it's a freebie, right? >> Yeah, it's no cost. >> But you got to-- >> We don't like to use free, that's what-- >> But, you got to be saying-- >> It's a good lead. >> Good to discuss that-- >> Well, we don't charge, but >> Largely. >> But it, but it, it's something that clients can take advantage of, if they've got an interesting problem, they're potentially going to do some business with you guys. >> Absolutely. >> If you're the largest telecommunication provider in the country, you get a freebie, and then, the key is, you guys dig in. >> We dig in, it's practitioners, real practitioners, with the right skills, >> Yeah. >> Working on problems. >> Great sales model. >> By the way, Claro Columbia's team, they were amazing in Columbia, we had a really good time, six to eight weeks, you know, working on a problem, and those guys all loved it too, they were-- >> Thank you. >> Before they knew it, they were coding in Python and R, and they had already knew a lot of this stuff, but they're digging in with the team, and it came well together. >> This is the secret to modernization of digital transformation-- >> Yeah. >> Is having the sales process is getting, co-creating together-- >> Absolutely. >> You guys do a great job, and I think this is a trend we'll see more of, of course, TheCUBE is bringing you live coverage here in San Francisco, at Mascone North, that's where our set is. They're shutting down the streets for IBM Think 2019, here in San Francisco. More CUBE coverage after this short break, be right back. (energetic music)

Published Date : Feb 13 2019

SUMMARY :

Brought to you by IBM. and Carlo Appugliese- Appugliese? Appugliese, yeah, good. Welcome to theCube, thanks for joining us. but it's all about the data everywhere. that are running the data, but we need to put the artificial intelligence to do it, Yeah, so they came to us, and we got together, We are close in the circle, we are taking care about that, just the project because, this is a concern but some of the data is there, about the architecture to handle that, and that data cloud, or however else you do it, and you know, data science is something that's been around and that's where a lot more, you know we have to do and explain the key learnings from this, one of the things that we learned, because the company is, right now I'm working for planning. more money for the company, how to handle the churn, and then we can, you know, the technology can come Carlos, thanks for coming in, what are some of the projects you work on? and it's going to add value to the organization. It's not a four page service, it's a freebie, right? they're potentially going to do some business with you guys. in the country, you get a freebie, and then, and they had already knew a lot of this stuff, They're shutting down the streets

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VellantePERSON

0.99+

Carlos GuevaraPERSON

0.99+

SusanPERSON

0.99+

IBMORGANIZATION

0.99+

SofiePERSON

0.99+

ClaroORGANIZATION

0.99+

CarlosPERSON

0.99+

ColumbiaLOCATION

0.99+

Sean MullerPERSON

0.99+

CarloPERSON

0.99+

sixQUANTITY

0.99+

San FranciscoLOCATION

0.99+

50,000 customersQUANTITY

0.99+

93%QUANTITY

0.99+

AppugliesePERSON

0.99+

MexicoLOCATION

0.99+

Carlo Appugliese- AppugliesePERSON

0.99+

100%QUANTITY

0.99+

more than 50%QUANTITY

0.99+

Moscone NorthLOCATION

0.99+

FITONORGANIZATION

0.99+

oneQUANTITY

0.99+

Mascone NorthLOCATION

0.99+

first partQUANTITY

0.99+

eight weeksQUANTITY

0.99+

two months laterDATE

0.99+

Carlos'PERSON

0.99+

Claro ColombiaORGANIZATION

0.98+

about 100 data scientistsQUANTITY

0.98+

four pageQUANTITY

0.98+

two great guestsQUANTITY

0.98+

Claro ColumbiaORGANIZATION

0.98+

PythonTITLE

0.98+

10 years agoDATE

0.98+

more thanQUANTITY

0.97+

50 million customersQUANTITY

0.97+

2.5 millions of homesQUANTITY

0.97+

IBM EquinixORGANIZATION

0.96+

eachQUANTITY

0.96+

threeQUANTITY

0.96+

about 90, 92%QUANTITY

0.96+

AgileTITLE

0.95+

hundreds of clientsQUANTITY

0.95+

Carlo AppugliesePERSON

0.94+

about sixQUANTITY

0.92+

CarloORGANIZATION

0.92+

Claro ColumbiaPERSON

0.91+

last 10 yearsDATE

0.9+

three areasQUANTITY

0.89+

20 something yearsQUANTITY

0.89+

TheCUBEORGANIZATION

0.88+

XGBoostORGANIZATION

0.88+

a minuteQUANTITY

0.86+

Data ScienceORGANIZATION

0.85+

theCUBEORGANIZATION

0.82+

terabytesQUANTITY

0.79+

50% ofQUANTITY

0.79+

firstQUANTITY

0.74+

AgileORGANIZATION

0.7+

TeamORGANIZATION

0.65+

approachesQUANTITY

0.59+

ThinkCOMMERCIAL_ITEM

0.51+

theCubeORGANIZATION

0.51+

RTITLE

0.49+

2019EVENT

0.49+

CarlosORGANIZATION

0.45+

2019DATE

0.43+

Carlos Guevara, Claro Columbia & Carlo Appugliese, IBM | IBM Think 2019


 

>> Live from San Francisco. It's the cube covering IBM thing twenty nineteen brought to you by IBM. >> Welcome back to the live coverage here in Mosconi North in San Francisco for IBM. Think this. The cubes coverage. I'm Jeffrey David. Launching a too great guest here. Carlos. Gavel, gavel. A chief date. Officer Clara, Columbia and Carlos. See? Good. Engage your manager. IBM data Science elite team a customer of IBM country around data science. Welcome to the Cube. Thanks for joining us. Thanks for having us. So we'll hear the street, the street to shut down a i N E. Where's the big theme? Multi cloud. But it's all about the data everywhere. People trying to put end to end solutions together to solve real business problems. Date is at the heart of all this moving date around from cloud to cloud using. Aye, aye. And technology get insights out of that. So take a minute to explain your situation, but you got to try to do. >> Okay. Okay, Perfect. Right now, we're working out a lot about the business thing because we need to use the machine learning models or all the artificial intelligence toe. Take best decisions for the company. Way. We're working with Carlo in a charming mother in order to know how how come with a boy the customers left the company Because for us it's very important to maintain our our customer toe. Now, how they're how are the cables is from them. There are two facility intelligences is next selling way to do it that way. Have a lot of challenge about that because, you know, we have a lot of data, different systems, that they're running the data way need to put all the information together to run them to run the mother's. The team that Carlo is leaving right now is helping to us a lot because we WeII know how to handle that. We know howto clean the data when you have to do the right governess for the data on the IBM iniquity is very compromised with us in there in order to do that safely. That is one of the union that is very close to us right now. She was working a lot with my team in order to run the models. You saying she was doing a lot of four. I mean, over fight on right now we are trained to do it in over the system, running this park on DH that is they? They Good way that we are. We are thinking that is going to get the gold for us way Need to maintain our customers. >> So years the largest telecommunications piece Claro in Mexico for boys and home services. Is that segments you guys are targeting? Yeah, Yeah. Scope. Size of how big is that? >> Clarisa? Largest company in Colombia For telecommunication. We have maybe fifty million customers in Colombia. More than fifty percent of the market marketer also way have many maybe two point five millions off forms in Colombia. That is more than fifty percent of the customers for from services on. Do you know that it's a big challenge for us because the competitors are all the time. Tryinto take our customers on DH the charm or they'll have toe. How's the boy that and how to I hope to do their artificial intelligence to do it much learning. It's a very good way to do that. >> So classic problem and telecommunications is Charon, right? So it's a date. A problem? Yeah, but So how did it all come about? So these guys came to you? >> Yeah. They help The game does. We got together. We talked about the problem and in turn was at the top right. These guys have a ton of data, so what we did is the team got together. We have really the way to data sensibly team works is we really helped clients in three areas. It's all about the right skills, the right people, the right tools and then the right process. So we put together a team. We put together some agile approaches on what we're going to do on DH. Then we'd get started by spinning up in environment. We took some data and we took there. And there's a lot of data is terabytes of data. We took their user data way, took their use users usage data, which is like how many text, cellphone and then bill on day that we pulled all that together and environment. Then the data scientists alongside what Carlos is team really worked on the problem, and they addressed it with, you know, machine learning, obviously target. In turn, they tried a variety of models, But actually, boost ended up being one of the better approaches on DH. They came up with a pretty good accuracy about nineties ninety two. Percent precision on the model. Predicting unpredictable turn. Yeah. >> So what did you do with that? That >> that that is a very good question because the company is preparing to handle that. I have a funny history. I said today to the business people. Okay, these customers are going to leave the company. Andi, I forget about that on DH. Two months later, I was asking Okay, what happened? They say, Okay, your model is very good. All the customers goes, >> Oh, my God, What >> this company with that they weren't working with a with information. That is the reason that we're thinking that the good ways to fame for on the right toe the left because twist them which is therefore, pulls the purposes toe Montana where our customers And in that case, we lose fifty thousand customers because we didn't do nothing Where we are close in the circle, we are taking care about that prescriptive boys could have tto do it on. OK, maybe that is her name. Voice problem. We need to correct them to fix the problem in orderto avoid that. But the fetus first parties toe predict toe. Get any score for the charm on Tau handled that with people obviously working. Also at the root cause analysis because way need to charm, way, need to fix from their road, >> Carla. So walk us through the scope of, like, just the project, because this is a concern we see in the industry a lot of data. How do I attack it? What's the scoop? You just come in and just into a data lake. How do you get to the value? These insights quickly because, honestly, they're starving for insights would take us through that quick process. >> Well, you know, every every problems with different. We helped hundreds of clients in different ways. But this pig a problem. It was a big data problem because we knew we had a lot of data. They had a new environment, but some of the data wasn't there. So what we did was way spun up a separate environment. We pulled some of the big data in there. We also pulled some of the other data together on DH. We started to do analysis on that kind of separately in the cloud, which is a little different, but we're working now to push that down into their Duke Data Lake, because not all the data is there, but some of the data is there, and we want to use some of that >> computer that almost to audit. Almost figure out what you want, what you want to pull in first, absolutely tie into the business on the business side. What would you guys like waiting for the answers? Or was that some of the on your side of process? How did it go down? >> I'm thinking about our business way. We're talking a little bit about about that about their detective tow hundred that I see before data within. That is a very good solution for that because we need infested toe, have us in orderto get the answers because finally we have a question we have question quite by. The customers are leaving us. Andi. What is data on the data handed in the good in a good way with governor? Dance with data cleaning with the rhyme orders toe. Do that on DH Right now, our concern is Business Section a business offer Because because the solution for the companies that way always, the new problems are coming from the data >> started ten years ago, you probably didn't have a new cluster to solve this problem. Data was maybe maybe isn't a data warehouse that maybe it wasn't And you probably weren't chief data officer back then. You know that roll kind of didn't exist, so a lot has changed in the last ten years. My question is, do you first of all be adjusting your comment on that? But do you see a point in which you could now take remedial action or maybe even automate some of that remedial action using machine intelligence and that data cloud or however else you do it to actually take action on behalf of the brand before humans who are without even human involvement foresee a day? >> Yeah. So just a comment on your thought about the times I've been doing technology for twenty something years, and data science is something has been around, but it's kind of evolved in software development. My thought is, uh, you know, we have these rolls of data scientists, but a lot of the feature engineering Data prep does require traditional people that were devious. And now Dave engineers and variety of skills come together, and that's what we try to do in every project. Just add that comment. A ce faras predicted ahead of time. Like, I think you're trying to say what data? Help me understand >> you. You know, you've got a ninety three percent accuracy. Okay, So I presume you take that, You give it to the business businesses, Okay? Let's maybe, you know, reach out to them, maybe do a little incentive or you know what kind of action in the machines take action on behalf of your brand? Do you foresee a day >> so that my thought is for Clara, Columbia and Carlos? But but obviously this is to me. Remain is the predictive models we build will obviously be deployed. And then it would interact with their digital mobile applications. So in real time, it'll react for the customers. And then obviously, you know, you want to make sure that claro and company trust that and it's making accurate predictions. And that's where a lot more, you know, we have to do some model validation and evaluation of that so they can begin to trust those predictions. I think is where >> I want to get your thoughts on this because you're doing a lot of learnings here. So can you guys each taking minutes playing the key Learnings from this As you go through the process? Certainly in the business side, there's a big imperative to do this. You want to have a business outcome that keeps the users there. But what did you learn? What was some of the learnings? You guys gone from the project? >> They the most important learning front from the company that wass teen in the data that that sound funny, but waiting in an alley, garbage in garbage, out on DH that wass very, very important for other was one of the things that we learn that we need to put cleaning date over the system. Also, the government's many people forget about the governments of the governments of the data on DH. Right now, we're working again with IBM in our government's >> so data quality problem? Yeah, they fight it and you report in to your CEO or the CEO. Seo, your spear of the CIA is OK. That >> is it. That's on another funny history, because because the company the company is right now, I am working for planning. This is saying they were working for planning for the company. >> Business planning? >> Yeah, for business planning. I was coming for an engineer engineering on DH. Right now, I'm working for a planning on trying to make money for the company, and you know that it's an engineer thinking how to get more money for the company I was talking about. So on some kind of analysis ticks, that is us Partial Analytics on I want you seeing that in engineer to know how the network handling how the quality of the network on right now using the same software this acknowledge, to know which is the better point to do sales is is a good combination finally and working. Ralph of planning on my boss, the planning the planet is working for the CEO and I heard about different organizations. Somebody's in Financial City owes in financial or the video for it is different. That depends from the company. Right now, I'm working for planning how to handle things, to make more money for the company, how to tow hundred children. And it is interesting because all the knowledge that I have engineering is perfect to do it >> Well, I would argue that's the job of a CDO is to figure out how to make money with data. Are saying money. Yeah. Absolute number one. Anyway, start there. >> Yeah, The thing we always talked about is really proving value. It starts with that use case. Identify where the real value is and then waken. You know, technology could come in the in the development work after that. So I agree with hundred percent. >> Carlos. Thanks for coming in. Largest telecommunication in Colombia. Great. Great customer reference. Carlo thinking men to explain real quick in a plug in for your data science elite team. What do you guys do? How do you engage? What? Some of the projects you work on Grey >> out. So we were a team of about one hundred data scientists worldwide. We work side by side with clients. In our job is to really understand the problem from end and help in all areas from skills, tools and technique. And we won't prototype in a three agile sprints. We use an agile methodology about six to eight weeks and we tied. It developed a really We call it a proof of value. It's it's not a M v P just yet or or poc But at the end of the day we prove out that we could get a model. We can do some prediction. We get a certain accuracy and it's gonna add value to the >> guys. Just >> It's not a freebie. It actually sorry. I'm sorry. It's not for paint service. It's a freebie is no cough you've got. But I don't like to use >> free way. Don't charge, but >> But it's something that clients could take advantage of if they're interesting problem and maybe eventually going to do some business. >> If you the largest telecommunication provider in the country, to get a freebie and then three keys, You guys dig in because its practitioners, real practitioners with the right skills, working on problems that way. Claro, >> Colombia's team. They were amazing. In Colombia. We had a really good time. Six to eight weeks working on it. You know, a problem on those guys. All loved it, too. They were. They were. Before they knew it. They were coding and python. And are they ready? Knew a lot of this stuff, but they're digging in with the team and became well together. >> This is the secret to modernization of digital transformation, Having sales process is getting co creating together. Absolutely. Guys do a great job, and I think this is a trend will see more of. Of course, the cubes bring you live coverage here in San Francisco at Mosconi. Nor That's where I said it is. They're shutting down the streets for IBM. Think twenty here in San Francisco, more cube coverage after the short break right back.

Published Date : Feb 12 2019

SUMMARY :

It's the cube covering Date is at the heart of all this moving date around from cloud to cloud using. We know howto clean the data when you have to do the right governess for the data on Is that segments you guys are targeting? How's the boy that and how to I hope to do their artificial intelligence to do So these guys came to you? We have really the way to data All the customers goes, are close in the circle, we are taking care about that prescriptive boys could have How do you get to the value? but some of the data is there, and we want to use some of that on the business side. What is data on the data handed in the good in a good way with governor? and that data cloud or however else you do it to actually take but a lot of the feature engineering Data prep does require traditional Okay, So I presume you take that, Remain is the predictive models we build will obviously be deployed. Certainly in the business side, there's a big imperative to do this. They the most important learning front from the company Yeah, they fight it and you report in to the company is right now, I am working for planning. the planning the planet is working for the CEO and I heard Well, I would argue that's the job of a CDO is to figure out how to make money with data. You know, technology could come in the in the development Some of the projects you work on Grey So we were a team of about one hundred data scientists worldwide. Just But I don't like to use but But it's something that clients could take advantage of if they're interesting problem and maybe If you the largest telecommunication provider in the country, to get a freebie and then three Six to eight weeks working This is the secret to modernization of digital transformation, Having sales process is getting co

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
ColombiaLOCATION

0.99+

CarlosPERSON

0.99+

IBMORGANIZATION

0.99+

CarlaPERSON

0.99+

San FranciscoLOCATION

0.99+

Jeffrey DavidPERSON

0.99+

SixQUANTITY

0.99+

ClaroORGANIZATION

0.99+

CIAORGANIZATION

0.99+

DavePERSON

0.99+

More than fifty percentQUANTITY

0.99+

hundred percentQUANTITY

0.99+

ninety three percentQUANTITY

0.99+

more than fifty percentQUANTITY

0.99+

Two months laterDATE

0.99+

CarloPERSON

0.99+

MexicoLOCATION

0.99+

Mosconi NorthLOCATION

0.99+

ClarisaORGANIZATION

0.99+

todayDATE

0.99+

MosconiLOCATION

0.99+

fifty thousand customersQUANTITY

0.99+

MontanaLOCATION

0.98+

RalphPERSON

0.98+

twentyQUANTITY

0.98+

eight weeksQUANTITY

0.98+

about one hundred data scientistsQUANTITY

0.98+

oneQUANTITY

0.98+

hundreds of clientsQUANTITY

0.97+

twenty something yearsQUANTITY

0.97+

hundred childrenQUANTITY

0.97+

three keysQUANTITY

0.97+

ClaraPERSON

0.97+

Claro ColumbiaORGANIZATION

0.97+

ten years agoDATE

0.97+

pythonTITLE

0.96+

fifty million customersQUANTITY

0.94+

Carlos GuevaraPERSON

0.93+

about sixQUANTITY

0.92+

fourQUANTITY

0.91+

AndiPERSON

0.9+

GavelPERSON

0.89+

first partiesQUANTITY

0.89+

nineteenQUANTITY

0.89+

about nineties ninety twoQUANTITY

0.89+

eachQUANTITY

0.87+

two facilityQUANTITY

0.86+

Carlo AppugliesePERSON

0.86+

gavelPERSON

0.85+

a dayQUANTITY

0.83+

five millions offQUANTITY

0.83+

two pointQUANTITY

0.81+

three areasQUANTITY

0.81+

claroORGANIZATION

0.78+

firstQUANTITY

0.78+

CharonORGANIZATION

0.74+

Think 2019COMMERCIAL_ITEM

0.71+

agileTITLE

0.7+

TauPERSON

0.7+

last ten yearsDATE

0.69+

LakeCOMMERCIAL_ITEM

0.69+

SeoPERSON

0.68+

Financial CityLOCATION

0.67+

threeQUANTITY

0.63+

GreyORGANIZATION

0.59+

thingsQUANTITY

0.59+

Duke DataORGANIZATION

0.59+

Think twentyORGANIZATION

0.58+

lemPERSON

0.56+

terabytesQUANTITY

0.55+

Carlo Vaiti | DataWorks Summit Europe 2017


 

>> Announcer: You are CUBE Alumni. Live from Munich, Germany, it's theCUBE. Covering, DataWorks Summit Europe 2017. Brought to you by Hortonworks. >> Hello, everyone, welcome back to live coverage at DataWorks 2017, I'm John Furrier with my cohost, Dave Vellante. Two days of coverage here in Munich, Germany, covering Hortonworks and Yahoo, presenting Hadoop Summit, now called DataWorks 2017. Our next guest is Carlo Vaiti, who's the HPE chief technology strategist, EMEA Digital Solutions, Europe, Middle East, and Africa. Welcome to theCUBE. >> Thank you, John. >> So we were just chatting before we came on, of your historic background at IBM, Oracle, and now HPE, and now back into the saddle there. >> Don't forget Sun Microsystems. >> Sun Microsystems, sorry, Sun, yeah. I mean, great, great run. >> It was a long run. >> You've seen the computer revolution happen. I worked at HP for nine years, from '88 to '97. Again, Dave was a premier analyst during that run of client-server. We've seen the computer revolution happen. Now we're seeing the digital revolution where the iPhone is now 10 years old, Cloud is booming, data's at the center of the value proposition, so a completely new disruptive capability. >> Carlo: Sure, yes. >> So what are you doing as the CTO, chief technologist for HPE, how are you guys bringing this story together? 'Cause there's so much going on at HPE. You got the services spit, you got the software split, and HP's focusing on the new style of IT, as Meg Whitman calls it. >> So, yeah. My role in EMEA is actually all about having basically a visionary kind of strategy role for what's going to be HP in the future, in terms of IT. And one of the things that we are looking at is, is specifically to have, we split our strategy in three different aspects, so three transformation areas. The first one which we usually talk is what I call hybrid IT, right, which is basically making services around either On-Premise or on Cloud for our customer base. The second one is actually power the Intelligent Edge, so is actually looking after our collaboration and when we acquire Aruba components. And the third one, which is in the middle, and that's why I'm here at the DataWorks Summit, is actually the data-analytics aspects. And we have a couple of solution in there. One is the Enterprise great Hadoop, which is part of this. This is actually how we generalize all the figure and the strategy for HP. >> It's interesting, Dave and I were talking yesterday, being in Europe, it's obviously a different sideshow, it's smaller than the DataWorks or Hadoop Summit in North America in San Jose, but there's a ton of Internet of things, IoT or IIoT, 'cause here in Germany, obviously, a lot of industrial nations, but in Europe in general, a lot of smart cities initiatives, a lot of mobility, a ton of Internet of things opportunity, more than in the US. >> Absolutely. >> Can you comment on how you guys are tackling the IoT? Because it's an Intelligent Edge, certainly, but it's also data, it's in your wheelhouse. >> Yes, sure. So I'm actually working, it's a good question, because I'm actually working a couple of projects in Eastern Europe, where it's all about Industrial IoT Analytics, IIoTA. That's the new terminology we use. So what we do is actually, we analyze from a business perspective, what are the business pain points, in an oil and gas company for example. And we understand for example, what kind of things that they need and must have. And what I'm saying here is, one of the aspects for example, is the drilling opportunity. So how much oil you can extract from a specific rig in the middle of the North Sea, for example. This is one of the key question, because the customer want to understand, in the future, how much oil they can extract. The other one is for example, the upstream business. So doing on the retail side and having, say, when my customer is stopping in a gas station, I want go in the shop, immediately giving, I dunno, my daughter, a kind of campaign for the Barbie, because they like the Barbie. So IoT, Industrial IoT help us in actually making a much better customer experience, and that's the case of the upstream business, but is also helping us in actually much faster business outcomes. And that's what the customer wants, right? 'Cause, and was talking with your colleague before, I'm talking to the business guy. I'm not talking to the IT anymore in these kind of place, and that's how IoT allow us a chance to change the conversation at the industry level. >> These are first-time conversations too. You're getting at the kinds of business conversations that weren't possible five years ago. >> Carlo: Yes, sure. >> I mean and 10 years ago, they would have seemed fantasy. Now they're reality. >> The role of analytics in my opinion, is becoming extremely key, and I said this morning, for me my best center is that the detail, is the stone foundation of the digital economy. I continue to repeat this terminology, because it's actually where everything is starting from. So what I mean is, let's take a look at the analytic aspect. So if I'm able to analyze the data close to the shop floor, okay, close to the shop manufacturing floor, if I'm able to analyze my data on the rig, in the oil and gas industry, if I'm able to analyze doing preprocessing analytics, with Kafka, Druid, these kind of open-source software, where close to the Intelligent Edge, then my customers going to be happy, because I give them very fast response, and the decision-maker can get to decision in a faster time. Today, it takes a long time to take these type of decision. So that's why we want to move into the power Intelligent Edge. >> So you're saying, data's foundational, but if you get to the Intelligent Edge, it's dynamic. So you have a dynamic reactive, realtime time series, or presences of data, but you need the foundational pre-data. >> Perfect. >> Is that kind of what you're getting at? >> Yes, that's the first step. Preprocessing analytics is what we do. In the next generation of, we think is going to be Industrial IoT Analytics, we're going to actually put massive amount of compute close to the shop manufacturing floor. We call internally or actually externally, convergent planned infrastructure. And that's the key point, right? >> John: Convergent plan? >> Convergent planned infrastructure, CPI. If you look at in Google, you will find. It's a solution we bring in the market a few months ago. We announce it in December last year. >> Yeah, Antonio's smart. He also had a converged systems as well. One of the first ones. >> Yeah, so that's converge compute at the edge basically. >> Correct, converge compute-- >> Very powerful. >> Very powerful, and we run analytics on the edge. That's the key point. >> Which we love, because that means you don't have to send everything back to the Cloud because it's too expensive, it's going to take too long, it's not going to work. >> Carlo: The bandwidth on the network is much less. >> There's no way that's going to be successful, unless you go to the edge and-- >> It takes time. >> With a cost. >> Now the other thing is, of course, you've got the Aruba asset, to be able to, I always say, joke, connect the windmill. But, Carlo, can we go back to the IoTA example? >> Carlo: Correct, yeah. >> I want to help, help our audience understand, sort of, the new HP, post these spin merges. So perviously you would say, okay, we have Vertica. You still have partnership, or you still own Vertica, but after September 1st-- >> Absolutely, absolutely. It's part of the columnar side-- >> Right, yes, absolutely, but, so. But the new strategy is to be more of a platform for a variety of technology. So how for instance would you solve, or did you solve, that problem that you described? What did you actually deliver? >> So again, as I said, we're, especially in the Industrial IoT, we are an ecosystem, okay? So we're one element of the ecosystem solution. For the oil and gas specifically, we're working with other system integrator. We're working with oil and the industry gas expertise, like DXC company, right, the company that we just split a few days ago, and we're working with them. They're providing the industry expertise. We are a infrastructure provided around that, and the services around that for the infrastructure element. But for the industry expertise, we try to have a kind of little bit of knowledge, to start the conversation with the customer. But again, my role in the strategy is actually to be a ecosystem digital integrator. That's the new terminology we like to bring in the market, because we really believe that's the way HP role is going to be. And the relevance of HP is totally depending if we are going to be successful in these type of things. >> Okay, now a couple other things you talked about in your keynote. I'm just going to list them, and then we can go wherever we want. There was Data Link 3.0, Storage Disaggregation, which is kind of interesting, 'cause it's been a problem. Hadoop as a service, Realtime Everywhere, and then Analytics at the Edge, which we kind of just talked about. Let's pick one. Let's start with Data Link 3.0. What is that? John doesn't like the term data link. He likes data ocean. >> I like data ocean. >> Is Data Link 3.0 becoming an ocean? >> It's becoming an ocean. So, Data Link 3.0 for us is actually following what is going to be the future for HDFS 3.0. So we have three elements. The erasure coding feature, which is coming on HDFS. The second element is around having HDFS data tier, multi-data tier. So we're going to have faster SSD drives. We're going to have big memory nodes. We're going to have GPU nodes. And the reason why I say disaggregation is because some of the workload will be only compute, and some of the workload will be only storage, okay? So we're going to bring, and the customer require this, because it's getting more data, and they need to have for example, YARN application running on compute nodes, and the same level, they want to have storage compute block, sorry, storage components, running on the storage model, like HBase for example, like HDFS 3.0 with the multi-tier option. So that's why the data disaggregation, or disaggregation between compute and storage, is the key point. We call this asymmetric, right? Hadoop is becoming asymmetric. That's what it mean. >> And the problem you're solving there, is when I add a node to a cluster, I don't have to add compute and storage together, I can disaggregate and choose whatever I need, >> Everyone that we did. >> based on the workload. >> They are all multitenancy kind of workload, and they are independent and they scale out. Of course, it's much more complex, but we have actually proved that this is the way to go, because that's what the customer is demanding. >> So, 3.0 is actually functional. It's erasure coding, you said. There's a data tier. You've got different memory levels. >> And I forgot to mention, the containerization of the application. Having dockerized the application for example. Using mesosphere for example, right? So having the containerization of the application is what all of that means, because what we do in Hadoop, we actually build the different clusters, they need to talk to each other, and change data in a faster way. And a solution like, a product like SQL Manager, from Hortonworks, is actually helping us to get this connection between the cluster faster and faster. And that's what the customer wants. >> And then Hadoop as a service, is that an on-premise solution, is that a hybrid solution, is it a Cloud solution, all three? >> I can offer all of them. Hadoop is a service could be run on-premise, could be run on a public Cloud, could be run on Azure, or could be mix of them, partially on-premise, and partially on public. >> And what are you seeing with regard to customer adoption of Cloud, and specifically around Hadoop and big data? >> I think the way I see that option is all the customer want to start very small. The maturity is actually better from a technology standpoint. If you're asking me the same question maybe a year ago, I would say, it's difficult. Now I think they've got the point. Every large customer, they want to build this big data ocean, note the delay, ocean, whatever you want to call it. >> John: Love that. (laughs) >> All right. They want to build this data ocean, and the point I want to make is, they want to start small, but they want to think very high. Very big, right, from their perspective. And the way they approach us is, we have a kind of methodology. We establish the maturity assessment. We do a kind of capability maturity assessment, where we find that if the customer is actually a pioneer, or is actually a very traditional one, so it's very slow-going. Once we determine where is the stage of the customer is, we propose some specific proof of concept. And in three months usually, we're putting this in place. >> You also talked about realtime everywhere. We in our research, we talk about the, historically, you had batchy of interactive, and now you have what we call continuous, or realtime streaming workloads. How prevalent is that? Where do you see it going in the future? >> So I think is another train for the future, as I mentioned this morning in my presentation. So and Spark is actually doing the open-source memory engine process, is actually the core of this stuff. We see 60 to 70 time faster analytics, compared to not to use Spark. So many customer implemented Spark because of this. The requirement are that the customer needs an immediate response time, okay, for a specific decision-making that they have to do, in order to improve their business, in order to improve their life. But this require a different architecture. >> I have a question, 'cause you, you've lived in the United States, you're obviously global, and spent a lot of time in Europe as well, and a lot of times, people want to discuss the differences between, let's make it specific here, the European continent and North America, and from a sophistication standpoint, same, we can agree on that, but there are still differences. Maybe, more greater privacy concerns. The whole thing with the Cloud and the NSA in the United States, created some concerns. What do you see as the differences today between North America and Europe? >> From my perspective, I think we are much more for example take IoT, Industrial IoT. I think in Europe we are much more advanced. I think in the manufacturing and the automotive space, the connected car kind of things, autonomous driving, this is something that we know already how to manage, how to do it. I mean, Tesla in the US is a good example that what I'm saying is not true, but if I look at for example, large German manufacturing car, they always implemented these type of things already today. >> Dave: For years, yeah. >> That's the difference, right? I think the second step is about the faster analytic approach. So what I mentioned before. The Power the Intelligent Edge, in my opinion at the moment, is much more advanced in the US compared to Europe. But I think Europe is starting to run back, and going on the same route. Because we believe that putting compute capacity on the edge is what actually the customer wants. But that's the two big differences I see. >> The other two big external factors that we like to look at, are Brexit and Trump. So (laughs) how 'about Brexit? Now that it's starting to sort of actually become, begin the process, how should we think about it? Is it overblown? It is critical? What's your take? >> Well, I think it's too early to say. UK just split a few days ago, right, officially. It's going to take another 18 months before it's going to be completed. From a commercial standpoint, we don't see any difference so far. We're actually working the same way. For me it's too early to say if there's going to be any implication on that. >> And we don't know about Trump. We don't have to talk about it, but the, but I saw some data recently that's, European sentiment, business sentiment is trending stronger than the US, which is different than it's been for the last many years. What do you see in terms of just sentiment, business conditions in Europe? Do you see a pick up? >> It's getting better, it is getting better. I mean, if I look at the major countries, the P&L is going positive, 1.5%. So I think from that perspective, we are getting better. Of course we are still suffering from the Chinese, and Japanese market sometimes. Especially in some of the big large deals. The inclusion of the Japanese market, I feel it, and the Chinese market, I feel that. But I think the economy is going to be okay, so it's going to be good. >> Carlo, I want to thank you for coming on and sharing your insight, final question for you. You're new to HPE, okay. We have a lot of history, obviously I was, spent a long part of my career there, early in my career. Dave and I have covered the transformation of HP for many, many years, with theCUBE certainly. What attracted you to HP and what would you say is going on at HP from your standpoint, that people should know about? >> So I think the number one thing is that for us the word is going to be hybrid. It means that some of the services that you can implement, either on-premise or on Cloud, could be done very well by the new Pointnext organization. I'm not part of Pointnext. I'm in the EG, Enterprise Group division. But I am fan for Pointnext because I believe this is the future of our company, is on the services side, that's where it's going. >> I would just point out, Dave and I, our commentary on the spin merge has been, create these highly cohesive entities, very focused. Antonio now running EG, big fans, of where it's actually an efficient business model. >> Carlo: Absolutely. >> And Chris Hsu is running the Micro Focus, CUBE Alumni. >> Carlo: It's a very efficient model, yes. >> Well, congratulations and thanks for coming on and sharing your insights here in Europe. And certainly it is an IoT world, IIoT. I love the analytics story, foundational services. It's going to be great, open source powering it, and this is theCUBE, opening up our content, and sharing that with you. I'm John Furrier, Dave Vellante. Stay with us for more great coverage, here from Munich after the short break.

Published Date : Apr 6 2017

SUMMARY :

Brought to you by Hortonworks. Welcome to theCUBE. and now back into the saddle there. I mean, great, great run. data's at the center of the value proposition, and HP's focusing on the new style And one of the things that we are looking at is, it's smaller than the DataWorks or Hadoop Summit Can you comment on how you guys are tackling the IoT? and that's the case of the upstream business, You're getting at the kinds of business conversations I mean and 10 years ago, they would have seemed fantasy. and the decision-maker can get to decision in a faster time. So you have a dynamic reactive, And that's the key point, right? It's a solution we bring in the market a few months ago. One of the first ones. That's the key point. it's going to take too long, it's not going to work. Now the other thing is, sort of, the new HP, post these spin merges. It's part of the columnar side-- But the new strategy is to be more That's the new terminology we like to bring in the market, John doesn't like the term data link. and the same level, they want to have but we have actually proved that this is the way to go, So, 3.0 is actually functional. So having the containerization of the application Hadoop is a service could be run on-premise, all the customer want to start very small. John: Love that. and the point I want to make is, they want to start small, and now you have what we call continuous, is actually the core of this stuff. in the United States, created some concerns. I mean, Tesla in the US is a good example is much more advanced in the US compared to Europe. actually become, begin the process, before it's going to be completed. We don't have to talk about it, but the, and the Chinese market, I feel that. Dave and I have covered the transformation of HP It means that some of the services that you can implement, our commentary on the spin merge has been, I love the analytics story, foundational services.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
DavePERSON

0.99+

Dave VellantePERSON

0.99+

CarloPERSON

0.99+

OracleORGANIZATION

0.99+

EuropeLOCATION

0.99+

IBMORGANIZATION

0.99+

GermanyLOCATION

0.99+

TrumpPERSON

0.99+

Meg WhitmanPERSON

0.99+

VerticaORGANIZATION

0.99+

PointnextORGANIZATION

0.99+

Chris HsuPERSON

0.99+

JohnPERSON

0.99+

Carlo VaitiPERSON

0.99+

John FurrierPERSON

0.99+

HPORGANIZATION

0.99+

MunichLOCATION

0.99+

HPEORGANIZATION

0.99+

YahooORGANIZATION

0.99+

Sun MicrosystemsORGANIZATION

0.99+

AntonioPERSON

0.99+

USLOCATION

0.99+

EGORGANIZATION

0.99+

second elementQUANTITY

0.99+

United StatesLOCATION

0.99+

second stepQUANTITY

0.99+

HortonworksORGANIZATION

0.99+

December last yearDATE

0.99+

iPhoneCOMMERCIAL_ITEM

0.99+

San JoseLOCATION

0.99+

1.5%QUANTITY

0.99+

yesterdayDATE

0.99+

North AmericaLOCATION

0.99+

September 1stDATE

0.99+

'97DATE

0.99+

'88DATE

0.99+

AfricaLOCATION

0.99+

oneQUANTITY

0.99+

TodayDATE

0.99+

three monthsQUANTITY

0.99+

Eastern EuropeLOCATION

0.99+

SunORGANIZATION

0.99+

Two daysQUANTITY

0.99+

60QUANTITY

0.99+

DataWorks 2017EVENT

0.99+

10 years agoDATE

0.99+

DXCORGANIZATION

0.98+

EMEA Digital SolutionsORGANIZATION

0.98+

five years agoDATE

0.98+

a year agoDATE

0.98+

TeslaORGANIZATION

0.98+

Paola Peraza Calderon & Viraj Parekh, Astronomer | Cube Conversation


 

(soft electronic music) >> Hey everyone, welcome to this CUBE conversation as part of the AWS Startup Showcase, season three, episode one, featuring Astronomer. I'm your host, Lisa Martin. I'm in the CUBE's Palo Alto Studios, and today excited to be joined by a couple of guests, a couple of co-founders from Astronomer. Viraj Parekh is with us, as is Paola Peraza-Calderon. Thanks guys so much for joining us. Excited to dig into Astronomer. >> Thank you so much for having us. >> Yeah, thanks for having us. >> Yeah, and we're going to be talking about the role of data orchestration. Paola, let's go ahead and start with you. Give the audience that understanding, that context about Astronomer and what it is that you guys do. >> Mm-hmm. Yeah, absolutely. So, Astronomer is a, you know, we're a technology and software company for modern data orchestration, as you said, and we're the driving force behind Apache Airflow. The Open Source Workflow Management tool that's since been adopted by thousands and thousands of users, and we'll dig into this a little bit more. But, by data orchestration, we mean data pipeline, so generally speaking, getting data from one place to another, transforming it, running it on a schedule, and overall just building a central system that tangibly connects your entire ecosystem of data services, right. So what, that's Redshift, Snowflake, DVT, et cetera. And so tangibly, we build, we at Astronomer here build products powered by Apache Airflow for data teams and for data practitioners, so that they don't have to. So, we sell to data engineers, data scientists, data admins, and we really spend our time doing three things. So, the first is that we build Astro, our flagship cloud service that we'll talk more on. But here, we're really building experiences that make it easier for data practitioners to author, run, and scale their data pipeline footprint on the cloud. And then, we also contribute to Apache Airflow as an open source project and community. So, we cultivate the community of humans, and we also put out open source developer tools that actually make it easier for individual data practitioners to be productive in their day-to-day jobs, whether or not they actually use our product and and pay us money or not. And then of course, we also have professional services and education and all of these things around our commercial products that enable folks to use our products and use Airflow as effectively as possible. So yeah, super, super happy with everything we've done and hopefully that gives you an idea of where we're starting. >> Awesome, so when you're talking with those, Paola, those data engineers, those data scientists, how do you define data orchestration and what does it mean to them? >> Yeah, yeah, it's a good question. So, you know, if you Google data orchestration you're going to get something about an automated process for organizing silo data and making it accessible for processing and analysis. But, to your question, what does that actually mean, you know? So, if you look at it from a customer's perspective, we can share a little bit about how we at Astronomer actually do data orchestration ourselves and the problems that it solves for us. So, as many other companies out in the world do, we at Astronomer need to monitor how our own customers use our products, right? And so, we have a weekly meeting, for example, that goes through a dashboard and a dashboarding tool called Sigma where we see the number of monthly customers and how they're engaging with our product. But, to actually do that, you know, we have to use data from our application database, for example, that has behavioral data on what they're actually doing in our product. We also have data from third party API tools, like Salesforce and HubSpot, and other ways in which our customer, we actually engage with our customers and their behavior. And so, our data team internally at Astronomer uses a bunch of tools to transform and use that data, right? So, we use FiveTran, for example, to ingest. We use Snowflake as our data warehouse. We use other tools for data transformations. And even, if we at Astronomer don't do this, you can imagine a data team also using tools like, Monte Carlo for data quality, or Hightouch for Reverse ETL, or things like that. And, I think the point here is that data teams, you know, that are building data-driven organizations have a plethora of tooling to both ingest the right data and come up with the right interfaces to transform and actually, interact with that data. And so, that movement and sort of synchronization of data across your ecosystem is exactly what data orchestration is responsible for. Historically, I think, and Raj will talk more about this, historically, schedulers like KRON and Oozie or Control-M have taken a role here, but we think that Apache Airflow has sort of risen over the past few years as the defacto industry standard for writing data pipelines that do tasks, that do data jobs that interact with that ecosystem of tools in your organization. And so, beyond that sort of data pipeline unit, I think where we see it is that data acquisition is not only writing those data pipelines that move your data, but it's also all the things around it, right, so, CI/CD tool and Secrets Management, et cetera. So, a long-winded answer here, but I think that's how we talk about it here at Astronomer and how we're building our products. >> Excellent. Great context, Paola. Thank you. Viraj, let's bring you into the conversation. Every company these days has to be a data company, right? They've got to be a software company- >> Mm-hmm. >> whether it's my bank or my grocery store. So, how are companies actually doing data orchestration today, Viraj? >> Yeah, it's a great question. So, I think one thing to think about is like, on one hand, you know, data orchestration is kind of a new category that we're helping define, but on the other hand, it's something that companies have been doing forever, right? You need to get data moving to use it, you know. You've got it all in place, aggregate it, cleaning it, et cetera. So, when you look at what companies out there are doing, right. Sometimes, if you're a more kind of born in the cloud company, as we say, you'll adopt all these cloud native tooling things your cloud provider gives you. If you're a bank or another sort of institution like that, you know, you're probably juggling an even wider variety of tools. You're thinking about a cloud migration. You might have things like Kron running in one place, Uzi running somewhere else, Informatics running somewhere else, while you're also trying to move all your workloads to the cloud. So, there's quite a large spectrum of what the current state is for companies. And then, kind of like Paola was saying, Apache Airflow started in 2014, and it was actually started by Airbnb, and they put out this blog post that was like, "Hey here's how we use Apache Airflow to orchestrate our data across all their sources." And really since then, right, it's almost been a decade since then, Airflow emerged as the open source standard, and there's companies of all sorts using it. And, it's really used to tie all these tools together, especially as that number of tools increases, companies move to hybrid cloud, hybrid multi-cloud strategies, and so on and so forth. But you know, what we found is that if you go to any company, especially a larger one and you say like, "Hey, how are you doing data orchestration?" They'll probably say something like, "Well, I have five data teams, so I have eight different ways I do data orchestration." Right. This idea of data orchestration's been there but the right way to do it, kind of all the abstractions you need, the way your teams need to work together, and so on and so forth, hasn't really emerged just yet, right? It's such a quick moving space that companies have to combine what they were doing before with what their new business initiatives are today. So, you know, what we really believe here at Astronomer is Airflow is the core of how you solve data orchestration for any sort of use case, but it's not everything. You know, it needs a little more. And, that's really where our commercial product, Astro comes in, where we've built, not only the most tried and tested airflow experience out there. We do employ a majority of the Airflow Core Committers, right? So, we're kind of really deep in the project. We've also built the right things around developer tooling, observability, and reliability for customers to really rely on Astro as the heart of the way they do data orchestration, and kind of think of it as the foundational layer that helps tie together all the different tools, practices and teams large companies have to do today. >> That foundational layer is absolutely critical. You've both mentioned open source software. Paola, I want to go back to you, and just give the audience an understanding of how open source really plays into Astronomer's mission as a company, and into the technologies like Astro. >> Mm-hmm. Yeah, absolutely. I mean, we, so we at Astronomers started using Airflow and actually building our products because Airflow is open source and we were our own customers at the beginning of our company journey. And, I think the open source community is at the core of everything we do. You know, without that open source community and culture, I think, you know, we have less of a business, and so, we're super invested in continuing to cultivate and grow that. And, I think there's a couple sort of concrete ways in which we do this that personally make me really excited to do my own job. You know, for one, we do things like we organize meetups and we sponsor the Airflow Summit and there's these sort of baseline community efforts that I think are really important and that reminds you, hey, there just humans trying to do their jobs and learn and use both our technology and things that are out there and contribute to it. So, making it easier to contribute to Airflow, for example, is another one of our efforts. As Viraj mentioned, we also employ, you know, engineers internally who are on our team whose full-time job is to make the open source project better. Again, regardless of whether or not you're a customer of ours or not, we want to make sure that we continue to cultivate the Airflow project in and of itself. And, we're also building developer tooling that might not be a part of the Apache Open Source project, but is still open source. So, we have repositories in our own sort of GitHub organization, for example, with tools that individual data practitioners, again customers are not, can use to make them be more productive in their day-to-day jobs with Airflow writing Dags for the most common use cases out there. The last thing I'll say is how important I think we've found it to build sort of educational resources and documentation and best practices. Airflow can be complex. It's been around for a long time. There's a lot of really, really rich feature sets. And so, how do we enable folks to actually use those? And that comes in, you know, things like webinars, and best practices, and courses and curriculum that are free and accessible and open to the community are just some of the ways in which I think we're continuing to invest in that open source community over the next year and beyond. >> That's awesome. It sounds like open source is really core, not only to the mission, but really to the heart of the organization. Viraj, I want to go back to you and really try to understand how does Astronomer fit into the wider modern data stack and ecosystem? Like what does that look like for customers? >> Yeah, yeah. So, both in the open source and with our commercial customers, right? Folks everywhere are trying to tie together a huge variety of tools in order to start making sense of their data. And you know, I kind of think of it almost like as like a pyramid, right? At the base level, you need things like data reliability, data, sorry, data freshness, data availability, and so on and so forth, right? You just need your data to be there. (coughs) I'm sorry. You just need your data to be there, and you need to make it predictable when it's going to be there. You need to make sure it's kind of correct at the highest level, some quality checks, and so on and so forth. And oftentimes, that kind of takes the case of ELT or ETL use cases, right? Taking data from somewhere and moving it somewhere else, usually into some sort of analytics destination. And, that's really what businesses can do to just power the core parts of getting insights into how their business is going, right? How much revenue did I had? What's in my pipeline, salesforce, and so on and so forth. Once that kind of base foundation is there and people can get the data they need, how they need it, it really opens up a lot for what customers can do. You know, I think one of the trendier things out there right now is MLOps, and how do companies actually put machine learning into production? Well, when you think about it you kind of have to squint at it, right? Like, machine learning pipelines are really just any other data pipeline. They just have a certain set of needs that might not not be applicable to ELT pipelines. And, when you kind of have a common layer to tie together all the ways data can move through your organization, that's really what we're trying to make it so companies can do. And, that happens in financial services where, you know, we have some customers who take app data coming from their mobile apps, and actually run it through their fraud detection services to make sure that all the activity is not fraudulent. We have customers that will run sports betting models on our platform where they'll take data from a bunch of public APIs around different sporting events that are happening, transform all of that in a way their data scientist can build models with it, and then actually bet on sports based on that output. You know, one of my favorite use cases I like to talk about that we saw in the open source is we had there was one company whose their business was to deliver blood transfusions via drone into remote parts of the world. And, it was really cool because they took all this data from all sorts of places, right? Kind of orchestrated all the aggregation and cleaning and analysis that happened had to happen via airflow and the end product would be a drone being shot out into a real remote part of the world to actually give somebody blood who needed it there. Because it turns out for certain parts of the world, the easiest way to deliver blood to them is via drone and not via some other, some other thing. So, these kind of, all the things people do with the modern data stack is absolutely incredible, right? Like you were saying, every company's trying to be a data-driven company. What really energizes me is knowing that like, for all those best, super great tools out there that power a business, we get to be the connective tissue, or the, almost like the electricity that kind of ropes them all together and makes so people can actually do what they need to do. >> Right. Phenomenal use cases that you just described, Raj. I mean, just the variety alone of what you guys are able to do and impact is so cool. So Paola, when you're with those data engineers, those data scientists, and customer conversations, what's your pitch? Why use Astro? >> Mm-hmm. Yeah, yeah, it's a good question. And honestly, to piggyback off of Viraj, there's so many. I think what keeps me so energized is how mission critical both our product and data orchestration is, and those use cases really are incredible and we work with customers of all shapes and sizes. But, to answer your question, right, so why use Astra? Why use our commercial products? There's so many people using open source, why pay for something more than that? So, you know, the baseline for our business really is that Airflow has grown exponentially over the last five years, and like we said has become an industry standard that we're confident there's a huge opportunity for us as a company and as a team. But, we also strongly believe that being great at running Airflow, you know, doesn't make you a successful company at what you do. What makes you a successful company at what you do is building great products and solving problems and solving pin points of your own customers, right? And, that differentiating value isn't being amazing at running Airflow. That should be our job. And so, we want to abstract those customers from meaning to do things like manage Kubernetes infrastructure that you need to run Airflow, and then hiring someone full-time to go do that. Which can be hard, but again doesn't add differentiating value to your team, or to your product, or to your customers. So, folks to get away from managing that infrastructure sort of a base, a base layer. Folks who are looking for differentiating features that make their team more productive and allows them to spend less time tweaking Airflow configurations and more time working with the data that they're getting from their business. For help, getting, staying up with Airflow releases. There's a ton of, we've actually been pretty quick to come out with new Airflow features and releases, and actually just keeping up with that feature set and working strategically with a partner to help you make the most out of those feature sets is a key part of it. And, really it's, especially if you're an organization who currently is committed to using Airflow, you likely have a lot of Airflow environments across your organization. And, being able to see those Airflow environments in a single place and being able to enable your data practitioners to create Airflow environments with a click of a button, and then use, for example, our command line to develop your Airflow Dags locally and push them up to our product, and use all of the sort of testing and monitoring and observability that we have on top of our product is such a key. It sounds so simple, especially if you use Airflow, but really those things are, you know, baseline value props that we have for the customers that continue to be excited to work with us. And of course, I think we can go beyond that and there's, we have ambitions to add whole, a whole bunch of features and expand into different types of personas. >> Right? >> But really our main value prop is for companies who are committed to Airflow and want to abstract themselves and make use of some of the differentiating features that we now have at Astronomer. >> Got it. Awesome. >> Thank you. One thing, one thing I'll add to that, Paola, and I think you did a good job of saying is because every company's trying to be a data company, companies are at different parts of their journey along that, right? And we want to meet customers where they are, and take them through it to where they want to go. So, on one end you have folks who are like, "Hey, we're just building a data team here. We have a new initiative. We heard about Airflow. How do you help us out?" On the farther end, you know, we have some customers that have been using Airflow for five plus years and they're like, "Hey, this is awesome. We have 10 more teams we want to bring on. How can you help with this? How can we do more stuff in the open source with you? How can we tell our story together?" And, it's all about kind of taking this vast community of data users everywhere, seeing where they're at, and saying like, "Hey, Astro and Airflow can take you to the next place that you want to go." >> Which is incredibly- >> Mm-hmm. >> and you bring up a great point, Viraj, that every company is somewhere in a different place on that journey. And it's, and it's complex. But it sounds to me like a lot of what you're doing is really stripping away a lot of the complexity, really enabling folks to use their data as quickly as possible, so that it's relevant and they can serve up, you know, the right products and services to whoever wants what. Really incredibly important. We're almost out of time, but I'd love to get both of your perspectives on what's next for Astronomer. You give us a a great overview of what the company's doing, the value in it for customers. Paola, from your lens as one of the co-founders, what's next? >> Yeah, I mean, I think we'll continue to, I think cultivate in that open source community. I think we'll continue to build products that are open sourced as part of our ecosystem. I also think that we'll continue to build products that actually make Airflow, and getting started with Airflow, more accessible. So, sort of lowering that barrier to entry to our products, whether that's price wise or infrastructure requirement wise. I think making it easier for folks to get started and get their hands on our product is super important for us this year. And really it's about, I think, you know, for us, it's really about focused execution this year and all of the sort of core principles that we've been talking about. And continuing to invest in all of the things around our product that again, enable teams to use Airflow more effectively and efficiently. >> And that efficiency piece is, everybody needs that. Last question, Viraj, for you. What do you see in terms of the next year for Astronomer and for your role? >> Yeah, you know, I think Paola did a really good job of laying it out. So it's, it's really hard to disagree with her on anything, right? I think executing is definitely the most important thing. My own personal bias on that is I think more than ever it's important to really galvanize the community around airflow. So, we're going to be focusing on that a lot. We want to make it easier for our users to get get our product into their hands, be that open source users or commercial users. And last, but certainly not least, is we're also really excited about Data Lineage and this other open source project in our umbrella called Open Lineage to make it so that there's a standard way for users to get lineage out of different systems that they use. When we think about what's in store for data lineage and needing to audit the way automated decisions are being made. You know, I think that's just such an important thing that companies are really just starting with, and I don't think there's a solution that's emerged that kind of ties it all together. So, we think that as we kind of grow the role of Airflow, right, we can also make it so that we're helping solve, we're helping customers solve their lineage problems all in Astro, which is our kind of the best of both worlds for us. >> Awesome. I can definitely feel and hear the enthusiasm and the passion that you both bring to Astronomer, to your customers, to your team. I love it. We could keep talking more and more, so you're going to have to come back. (laughing) Viraj, Paola, thank you so much for joining me today on this showcase conversation. We really appreciate your insights and all the context that you provided about Astronomer. >> Thank you so much for having us. >> My pleasure. For my guests, I'm Lisa Martin. You're watching this Cube conversation. (soft electronic music)

Published Date : Feb 21 2023

SUMMARY :

to this CUBE conversation Thank you so much and what it is that you guys do. and hopefully that gives you an idea and the problems that it solves for us. to be a data company, right? So, how are companies actually kind of all the abstractions you need, and just give the And that comes in, you of the organization. and analysis that happened that you just described, Raj. that you need to run Airflow, that we now have at Astronomer. Awesome. and I think you did a good job of saying and you bring up a great point, Viraj, and all of the sort of core principles and for your role? and needing to audit the and all the context that you (soft electronic music)

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Viraj ParekhPERSON

0.99+

Lisa MartinPERSON

0.99+

PaolaPERSON

0.99+

VirajPERSON

0.99+

2014DATE

0.99+

AstronomerORGANIZATION

0.99+

Paola Peraza-CalderonPERSON

0.99+

Paola Peraza CalderonPERSON

0.99+

AirflowORGANIZATION

0.99+

AirbnbORGANIZATION

0.99+

five plus yearsQUANTITY

0.99+

AstroORGANIZATION

0.99+

RajPERSON

0.99+

UziORGANIZATION

0.99+

GoogleORGANIZATION

0.99+

firstQUANTITY

0.99+

bothQUANTITY

0.99+

todayDATE

0.99+

KronORGANIZATION

0.99+

10 more teamsQUANTITY

0.98+

AstronomersORGANIZATION

0.98+

AstraORGANIZATION

0.98+

oneQUANTITY

0.98+

AirflowTITLE

0.98+

InformaticsORGANIZATION

0.98+

Monte CarloTITLE

0.98+

this yearDATE

0.98+

HubSpotORGANIZATION

0.98+

one companyQUANTITY

0.97+

AstronomerTITLE

0.97+

next yearDATE

0.97+

ApacheORGANIZATION

0.97+

Airflow SummitEVENT

0.97+

AWSORGANIZATION

0.95+

both worldsQUANTITY

0.93+

KRONORGANIZATION

0.93+

CUBEORGANIZATION

0.92+

MORGANIZATION

0.92+

RedshiftTITLE

0.91+

SnowflakeTITLE

0.91+

five data teamsQUANTITY

0.91+

GitHubORGANIZATION

0.91+

OozieORGANIZATION

0.9+

Data LineageORGANIZATION

0.9+

Breaking Analysis: We Have the Data…What Private Tech Companies Don’t Tell you About Their Business


 

>> From The Cube Studios in Palo Alto and Boston, bringing you data driven insights from The Cube at ETR. This is "Breaking Analysis" with Dave Vellante. >> The reverse momentum in tech stocks caused by rising interest rates, less attractive discounted cash flow models, and more tepid forward guidance, can be easily measured by public market valuations. And while there's lots of discussion about the impact on private companies and cash runway and 409A valuations, measuring the performance of non-public companies isn't as easy. IPOs have dried up and public statements by private companies, of course, they accentuate the good and they kind of hide the bad. Real data, unless you're an insider, is hard to find. Hello and welcome to this week's "Wikibon Cube Insights" powered by ETR. In this "Breaking Analysis", we unlock some of the secrets that non-public, emerging tech companies may or may not be sharing. And we do this by introducing you to a capability from ETR that we've not exposed you to over the past couple of years, it's called the Emerging Technologies Survey, and it is packed with sentiment data and performance data based on surveys of more than a thousand CIOs and IT buyers covering more than 400 companies. And we've invited back our colleague, Erik Bradley of ETR to help explain the survey and the data that we're going to cover today. Erik, this survey is something that I've not personally spent much time on, but I'm blown away at the data. It's really unique and detailed. First of all, welcome. Good to see you again. >> Great to see you too, Dave, and I'm really happy to be talking about the ETS or the Emerging Technology Survey. Even our own clients of constituents probably don't spend as much time in here as they should. >> Yeah, because there's so much in the mainstream, but let's pull up a slide to bring out the survey composition. Tell us about the study. How often do you run it? What's the background and the methodology? >> Yeah, you were just spot on the way you were talking about the private tech companies out there. So what we did is we decided to take all the vendors that we track that are not yet public and move 'em over to the ETS. And there isn't a lot of information out there. If you're not in Silicon (indistinct), you're not going to get this stuff. So PitchBook and Tech Crunch are two out there that gives some data on these guys. But what we really wanted to do was go out to our community. We have 6,000, ITDMs in our community. We wanted to ask them, "Are you aware of these companies? And if so, are you allocating any resources to them? Are you planning to evaluate them," and really just kind of figure out what we can do. So this particular survey, as you can see, 1000 plus responses, over 450 vendors that we track. And essentially what we're trying to do here is talk about your evaluation and awareness of these companies and also your utilization. And also if you're not utilizing 'em, then we can also figure out your sales conversion or churn. So this is interesting, not only for the ITDMs themselves to figure out what their peers are evaluating and what they should put in POCs against the big guys when contracts come up. But it's also really interesting for the tech vendors themselves to see how they're performing. >> And you can see 2/3 of the respondents are director level of above. You got 28% is C-suite. There is of course a North America bias, 70, 75% is North America. But these smaller companies, you know, that's when they start doing business. So, okay. We're going to do a couple of things here today. First, we're going to give you the big picture across the sectors that ETR covers within the ETS survey. And then we're going to look at the high and low sentiment for the larger private companies. And then we're going to do the same for the smaller private companies, the ones that don't have as much mindshare. And then I'm going to put those two groups together and we're going to look at two dimensions, actually three dimensions, which companies are being evaluated the most. Second, companies are getting the most usage and adoption of their offerings. And then third, which companies are seeing the highest churn rates, which of course is a silent killer of companies. And then finally, we're going to look at the sentiment and mindshare for two key areas that we like to cover often here on "Breaking Analysis", security and data. And data comprises database, including data warehousing, and then big data analytics is the second part of data. And then machine learning and AI is the third section within data that we're going to look at. Now, one other thing before we get into it, ETR very often will include open source offerings in the mix, even though they're not companies like TensorFlow or Kubernetes, for example. And we'll call that out during this discussion. The reason this is done is for context, because everyone is using open source. It is the heart of innovation and many business models are super glued to an open source offering, like take MariaDB, for example. There's the foundation and then there's with the open source code and then there, of course, the company that sells services around the offering. Okay, so let's first look at the highest and lowest sentiment among these private firms, the ones that have the highest mindshare. So they're naturally going to be somewhat larger. And we do this on two dimensions, sentiment on the vertical axis and mindshare on the horizontal axis and note the open source tool, see Kubernetes, Postgres, Kafka, TensorFlow, Jenkins, Grafana, et cetera. So Erik, please explain what we're looking at here, how it's derived and what the data tells us. >> Certainly, so there is a lot here, so we're going to break it down first of all by explaining just what mindshare and net sentiment is. You explain the axis. We have so many evaluation metrics, but we need to aggregate them into one so that way we can rank against each other. Net sentiment is really the aggregation of all the positive and subtracting out the negative. So the net sentiment is a very quick way of looking at where these companies stand versus their peers in their sectors and sub sectors. Mindshare is basically the awareness of them, which is good for very early stage companies. And you'll see some names on here that are obviously been around for a very long time. And they're clearly be the bigger on the axis on the outside. Kubernetes, for instance, as you mentioned, is open source. This de facto standard for all container orchestration, and it should be that far up into the right, because that's what everyone's using. In fact, the open source leaders are so prevalent in the emerging technology survey that we break them out later in our analysis, 'cause it's really not fair to include them and compare them to the actual companies that are providing the support and the security around that open source technology. But no survey, no analysis, no research would be complete without including these open source tech. So what we're looking at here, if I can just get away from the open source names, we see other things like Databricks and OneTrust . They're repeating as top net sentiment performers here. And then also the design vendors. People don't spend a lot of time on 'em, but Miro and Figma. This is their third survey in a row where they're just dominating that sentiment overall. And Adobe should probably take note of that because they're really coming after them. But Databricks, we all know probably would've been a public company by now if the market hadn't turned, but you can see just how dominant they are in a survey of nothing but private companies. And we'll see that again when we talk about the database later. >> And I'll just add, so you see automation anywhere on there, the big UiPath competitor company that was not able to get to the public markets. They've been trying. Snyk, Peter McKay's company, they've raised a bunch of money, big security player. They're doing some really interesting things in developer security, helping developers secure the data flow, H2O.ai, Dataiku AI company. We saw them at the Snowflake Summit. Redis Labs, Netskope and security. So a lot of names that we know that ultimately we think are probably going to be hitting the public market. Okay, here's the same view for private companies with less mindshare, Erik. Take us through this one. >> On the previous slide too real quickly, I wanted to pull that security scorecard and we'll get back into it. But this is a newcomer, that I couldn't believe how strong their data was, but we'll bring that up in a second. Now, when we go to the ones of lower mindshare, it's interesting to talk about open source, right? Kubernetes was all the way on the top right. Everyone uses containers. Here we see Istio up there. Not everyone is using service mesh as much. And that's why Istio is in the smaller breakout. But still when you talk about net sentiment, it's about the leader, it's the highest one there is. So really interesting to point out. Then we see other names like Collibra in the data side really performing well. And again, as always security, very well represented here. We have Aqua, Wiz, Armis, which is a standout in this survey this time around. They do IoT security. I hadn't even heard of them until I started digging into the data here. And I couldn't believe how well they were doing. And then of course you have AnyScale, which is doing a second best in this and the best name in the survey Hugging Face, which is a machine learning AI tool. Also doing really well on a net sentiment, but they're not as far along on that access of mindshare just yet. So these are again, emerging companies that might not be as well represented in the enterprise as they will be in a couple of years. >> Hugging Face sounds like something you do with your two year old. Like you said, you see high performers, AnyScale do machine learning and you mentioned them. They came out of Berkeley. Collibra Governance, InfluxData is on there. InfluxDB's a time series database. And yeah, of course, Alex, if you bring that back up, you get a big group of red dots, right? That's the bad zone, I guess, which Sisense does vis, Yellowbrick Data is a NPP database. How should we interpret the red dots, Erik? I mean, is it necessarily a bad thing? Could it be misinterpreted? What's your take on that? >> Sure, well, let me just explain the definition of it first from a data science perspective, right? We're a data company first. So the gray dots that you're seeing that aren't named, that's the mean that's the average. So in order for you to be on this chart, you have to be at least one standard deviation above or below that average. So that gray is where we're saying, "Hey, this is where the lump of average comes in. This is where everyone normally stands." So you either have to be an outperformer or an underperformer to even show up in this analysis. So by definition, yes, the red dots are bad. You're at least one standard deviation below the average of your peers. It's not where you want to be. And if you're on the lower left, not only are you not performing well from a utilization or an actual usage rate, but people don't even know who you are. So that's a problem, obviously. And the VCs and the PEs out there that are backing these companies, they're the ones who mostly are interested in this data. >> Yeah. Oh, that's great explanation. Thank you for that. No, nice benchmarking there and yeah, you don't want to be in the red. All right, let's get into the next segment here. Here going to look at evaluation rates, adoption and the all important churn. First new evaluations. Let's bring up that slide. And Erik, take us through this. >> So essentially I just want to explain what evaluation means is that people will cite that they either plan to evaluate the company or they're currently evaluating. So that means we're aware of 'em and we are choosing to do a POC of them. And then we'll see later how that turns into utilization, which is what a company wants to see, awareness, evaluation, and then actually utilizing them. That's sort of the life cycle for these emerging companies. So what we're seeing here, again, with very high evaluation rates. H2O, we mentioned. SecurityScorecard jumped up again. Chargebee, Snyk, Salt Security, Armis. A lot of security names are up here, Aqua, Netskope, which God has been around forever. I still can't believe it's in an Emerging Technology Survey But so many of these names fall in data and security again, which is why we decided to pick those out Dave. And on the lower side, Vena, Acton, those unfortunately took the dubious award of the lowest evaluations in our survey, but I prefer to focus on the positive. So SecurityScorecard, again, real standout in this one, they're in a security assessment space, basically. They'll come in and assess for you how your security hygiene is. And it's an area of a real interest right now amongst our ITDM community. >> Yeah, I mean, I think those, and then Arctic Wolf is up there too. They're doing managed services. You had mentioned Netskope. Yeah, okay. All right, let's look at now adoption. These are the companies whose offerings are being used the most and are above that standard deviation in the green. Take us through this, Erik. >> Sure, yet again, what we're looking at is, okay, we went from awareness, we went to evaluation. Now it's about utilization, which means a survey respondent's going to state "Yes, we evaluated and we plan to utilize it" or "It's already in our enterprise and we're actually allocating further resources to it." Not surprising, again, a lot of open source, the reason why, it's free. So it's really easy to grow your utilization on something that's free. But as you and I both know, as Red Hat proved, there's a lot of money to be made once the open source is adopted, right? You need the governance, you need the security, you need the support wrapped around it. So here we're seeing Kubernetes, Postgres, Apache Kafka, Jenkins, Grafana. These are all open source based names. But if we're looking at names that are non open source, we're going to see Databricks, Automation Anywhere, Rubrik all have the highest mindshare. So these are the names, not surprisingly, all names that probably should have been public by now. Everyone's expecting an IPO imminently. These are the names that have the highest mindshare. If we talk about the highest utilization rates, again, Miro and Figma pop up, and I know they're not household names, but they are just dominant in this survey. These are applications that are meant for design software and, again, they're going after an Autodesk or a CAD or Adobe type of thing. It is just dominant how high the utilization rates are here, which again is something Adobe should be paying attention to. And then you'll see a little bit lower, but also interesting, we see Collibra again, we see Hugging Face again. And these are names that are obviously in the data governance, ML, AI side. So we're seeing a ton of data, a ton of security and Rubrik was interesting in this one, too, high utilization and high mindshare. We know how pervasive they are in the enterprise already. >> Erik, Alex, keep that up for a second, if you would. So yeah, you mentioned Rubrik. Cohesity's not on there. They're sort of the big one. We're going to talk about them in a moment. Puppet is interesting to me because you remember the early days of that sort of space, you had Puppet and Chef and then you had Ansible. Red Hat bought Ansible and then Ansible really took off. So it's interesting to see Puppet on there as well. Okay. So now let's look at the churn because this one is where you don't want to be. It's, of course, all red 'cause churn is bad. Take us through this, Erik. >> Yeah, definitely don't want to be here and I don't love to dwell on the negative. So we won't spend as much time. But to your point, there's one thing I want to point out that think it's important. So you see Rubrik in the same spot, but Rubrik has so many citations in our survey that it actually would make sense that they're both being high utilization and churn just because they're so well represented. They have such a high overall representation in our survey. And the reason I call that out is Cohesity. Cohesity has an extremely high churn rate here about 17% and unlike Rubrik, they were not on the utilization side. So Rubrik is seeing both, Cohesity is not. It's not being utilized, but it's seeing a high churn. So that's the way you can look at this data and say, "Hm." Same thing with Puppet. You noticed that it was on the other slide. It's also on this one. So basically what it means is a lot of people are giving Puppet a shot, but it's starting to churn, which means it's not as sticky as we would like. One that was surprising on here for me was Tanium. It's kind of jumbled in there. It's hard to see in the middle, but Tanium, I was very surprised to see as high of a churn because what I do hear from our end user community is that people that use it, like it. It really kind of spreads into not only vulnerability management, but also that endpoint detection and response side. So I was surprised by that one, mostly to see Tanium in here. Mural, again, was another one of those application design softwares that's seeing a very high churn as well. >> So you're saying if you're in both... Alex, bring that back up if you would. So if you're in both like MariaDB is for example, I think, yeah, they're in both. They're both green in the previous one and red here, that's not as bad. You mentioned Rubrik is going to be in both. Cohesity is a bit of a concern. Cohesity just brought on Sanjay Poonen. So this could be a go to market issue, right? I mean, 'cause Cohesity has got a great product and they got really happy customers. So they're just maybe having to figure out, okay, what's the right ideal customer profile and Sanjay Poonen, I guarantee, is going to have that company cranking. I mean they had been doing very well on the surveys and had fallen off of a bit. The other interesting things wondering the previous survey I saw Cvent, which is an event platform. My only reason I pay attention to that is 'cause we actually have an event platform. We don't sell it separately. We bundle it as part of our offerings. And you see Hopin on here. Hopin raised a billion dollars during the pandemic. And we were like, "Wow, that's going to blow up." And so you see Hopin on the churn and you didn't see 'em in the previous chart, but that's sort of interesting. Like you said, let's not kind of dwell on the negative, but you really don't. You know, churn is a real big concern. Okay, now we're going to drill down into two sectors, security and data. Where data comprises three areas, database and data warehousing, machine learning and AI and big data analytics. So first let's take a look at the security sector. Now this is interesting because not only is it a sector drill down, but also gives an indicator of how much money the firm has raised, which is the size of that bubble. And to tell us if a company is punching above its weight and efficiently using its venture capital. Erik, take us through this slide. Explain the dots, the size of the dots. Set this up please. >> Yeah. So again, the axis is still the same, net sentiment and mindshare, but what we've done this time is we've taken publicly available information on how much capital company is raised and that'll be the size of the circle you see around the name. And then whether it's green or red is basically saying relative to the amount of money they've raised, how are they doing in our data? So when you see a Netskope, which has been around forever, raised a lot of money, that's why you're going to see them more leading towards red, 'cause it's just been around forever and kind of would expect it. Versus a name like SecurityScorecard, which is only raised a little bit of money and it's actually performing just as well, if not better than a name, like a Netskope. OneTrust doing absolutely incredible right now. BeyondTrust. We've seen the issues with Okta, right. So those are two names that play in that space that obviously are probably getting some looks about what's going on right now. Wiz, we've all heard about right? So raised a ton of money. It's doing well on net sentiment, but the mindshare isn't as well as you'd want, which is why you're going to see a little bit of that red versus a name like Aqua, which is doing container and application security. And hasn't raised as much money, but is really neck and neck with a name like Wiz. So that is why on a relative basis, you'll see that more green. As we all know, information security is never going away. But as we'll get to later in the program, Dave, I'm not sure in this current market environment, if people are as willing to do POCs and switch away from their security provider, right. There's a little bit of tepidness out there, a little trepidation. So right now we're seeing overall a slight pause, a slight cooling in overall evaluations on the security side versus historical levels a year ago. >> Now let's stay on here for a second. So a couple things I want to point out. So it's interesting. Now Snyk has raised over, I think $800 million but you can see them, they're high on the vertical and the horizontal, but now compare that to Lacework. It's hard to see, but they're kind of buried in the middle there. That's the biggest dot in this whole thing. I think I'm interpreting this correctly. They've raised over a billion dollars. It's a Mike Speiser company. He was the founding investor in Snowflake. So people watch that very closely, but that's an example of where they're not punching above their weight. They recently had a layoff and they got to fine tune things, but I'm still confident they they're going to do well. 'Cause they're approaching security as a data problem, which is probably people having trouble getting their arms around that. And then again, I see Arctic Wolf. They're not red, they're not green, but they've raised fair amount of money, but it's showing up to the right and decent level there. And a couple of the other ones that you mentioned, Netskope. Yeah, they've raised a lot of money, but they're actually performing where you want. What you don't want is where Lacework is, right. They've got some work to do to really take advantage of the money that they raised last November and prior to that. >> Yeah, if you're seeing that more neutral color, like you're calling out with an Arctic Wolf, like that means relative to their peers, this is where they should be. It's when you're seeing that red on a Lacework where we all know, wow, you raised a ton of money and your mindshare isn't where it should be. Your net sentiment is not where it should be comparatively. And then you see these great standouts, like Salt Security and SecurityScorecard and Abnormal. You know they haven't raised that much money yet, but their net sentiment's higher and their mindshare's doing well. So those basically in a nutshell, if you're a PE or a VC and you see a small green circle, then you're doing well, then it means you made a good investment. >> Some of these guys, I don't know, but you see these small green circles. Those are the ones you want to start digging into and maybe help them catch a wave. Okay, let's get into the data discussion. And again, three areas, database slash data warehousing, big data analytics and ML AI. First, we're going to look at the database sector. So Alex, thank you for bringing that up. Alright, take us through this, Erik. Actually, let me just say Postgres SQL. I got to ask you about this. It shows some funding, but that actually could be a mix of EDB, the company that commercializes Postgres and Postgres the open source database, which is a transaction system and kind of an open source Oracle. You see MariaDB is a database, but open source database. But the companies they've raised over $200 million and they filed an S-4. So Erik looks like this might be a little bit of mashup of companies and open source products. Help us understand this. >> Yeah, it's tough when you start dealing with the open source side and I'll be honest with you, there is a little bit of a mashup here. There are certain names here that are a hundred percent for profit companies. And then there are others that are obviously open source based like Redis is open source, but Redis Labs is the one trying to monetize the support around it. So you're a hundred percent accurate on this slide. I think one of the things here that's important to note though, is just how important open source is to data. If you're going to be going to any of these areas, it's going to be open source based to begin with. And Neo4j is one I want to call out here. It's not one everyone's familiar with, but it's basically geographical charting database, which is a name that we're seeing on a net sentiment side actually really, really high. When you think about it's the third overall net sentiment for a niche database play. It's not as big on the mindshare 'cause it's use cases aren't as often, but third biggest play on net sentiment. I found really interesting on this slide. >> And again, so MariaDB, as I said, they filed an S-4 I think $50 million in revenue, that might even be ARR. So they're not huge, but they're getting there. And by the way, MariaDB, if you don't know, was the company that was formed the day that Oracle bought Sun in which they got MySQL and MariaDB has done a really good job of replacing a lot of MySQL instances. Oracle has responded with MySQL HeatWave, which was kind of the Oracle version of MySQL. So there's some interesting battles going on there. If you think about the LAMP stack, the M in the LAMP stack was MySQL. And so now it's all MariaDB replacing that MySQL for a large part. And then you see again, the red, you know, you got to have some concerns about there. Aerospike's been around for a long time. SingleStore changed their name a couple years ago, last year. Yellowbrick Data, Fire Bolt was kind of going after Snowflake for a while, but yeah, you want to get out of that red zone. So they got some work to do. >> And Dave, real quick for the people that aren't aware, I just want to let them know that we can cut this data with the public company data as well. So we can cross over this with that because some of these names are competing with the larger public company names as well. So we can go ahead and cross reference like a MariaDB with a Mongo, for instance, or of something of that nature. So it's not in this slide, but at another point we can certainly explain on a relative basis how these private names are doing compared to the other ones as well. >> All right, let's take a quick look at analytics. Alex, bring that up if you would. Go ahead, Erik. >> Yeah, I mean, essentially here, I can't see it on my screen, my apologies. I just kind of went to blank on that. So gimme one second to catch up. >> So I could set it up while you're doing that. You got Grafana up and to the right. I mean, this is huge right. >> Got it thank you. I lost my screen there for a second. Yep. Again, open source name Grafana, absolutely up and to the right. But as we know, Grafana Labs is actually picking up a lot of speed based on Grafana, of course. And I think we might actually hear some noise from them coming this year. The names that are actually a little bit more disappointing than I want to call out are names like ThoughtSpot. It's been around forever. Their mindshare of course is second best here but based on the amount of time they've been around and the amount of money they've raised, it's not actually outperforming the way it should be. We're seeing Moogsoft obviously make some waves. That's very high net sentiment for that company. It's, you know, what, third, fourth position overall in this entire area, Another name like Fivetran, Matillion is doing well. Fivetran, even though it's got a high net sentiment, again, it's raised so much money that we would've expected a little bit more at this point. I know you know this space extremely well, but basically what we're looking at here and to the bottom left, you're going to see some names with a lot of red, large circles that really just aren't performing that well. InfluxData, however, second highest net sentiment. And it's really pretty early on in this stage and the feedback we're getting on this name is the use cases are great, the efficacy's great. And I think it's one to watch out for. >> InfluxData, time series database. The other interesting things I just noticed here, you got Tamer on here, which is that little small green. Those are the ones we were saying before, look for those guys. They might be some of the interesting companies out there and then observe Jeremy Burton's company. They do observability on top of Snowflake, not green, but kind of in that gray. So that's kind of cool. Monte Carlo is another one, they're sort of slightly green. They are doing some really interesting things in data and data mesh. So yeah, okay. So I can spend all day on this stuff, Erik, phenomenal data. I got to get back and really dig in. Let's end with machine learning and AI. Now this chart it's similar in its dimensions, of course, except for the money raised. We're not showing that size of the bubble, but AI is so hot. We wanted to cover that here, Erik, explain this please. Why TensorFlow is highlighted and walk us through this chart. >> Yeah, it's funny yet again, right? Another open source name, TensorFlow being up there. And I just want to explain, we do break out machine learning, AI is its own sector. A lot of this of course really is intertwined with the data side, but it is on its own area. And one of the things I think that's most important here to break out is Databricks. We started to cover Databricks in machine learning, AI. That company has grown into much, much more than that. So I do want to state to you Dave, and also the audience out there that moving forward, we're going to be moving Databricks out of only the MA/AI into other sectors. So we can kind of value them against their peers a little bit better. But in this instance, you could just see how dominant they are in this area. And one thing that's not here, but I do want to point out is that we have the ability to break this down by industry vertical, organization size. And when I break this down into Fortune 500 and Fortune 1000, both Databricks and Tensorflow are even better than you see here. So it's quite interesting to see that the names that are succeeding are also succeeding with the largest organizations in the world. And as we know, large organizations means large budgets. So this is one area that I just thought was really interesting to point out that as we break it down, the data by vertical, these two names still are the outstanding players. >> I just also want to call it H2O.ai. They're getting a lot of buzz in the marketplace and I'm seeing them a lot more. Anaconda, another one. Dataiku consistently popping up. DataRobot is also interesting because all the kerfuffle that's going on there. The Cube guy, Cube alum, Chris Lynch stepped down as executive chairman. All this stuff came out about how the executives were taking money off the table and didn't allow the employees to participate in that money raising deal. So that's pissed a lot of people off. And so they're now going through some kind of uncomfortable things, which is unfortunate because DataRobot, I noticed, we haven't covered them that much in "Breaking Analysis", but I've noticed them oftentimes, Erik, in the surveys doing really well. So you would think that company has a lot of potential. But yeah, it's an important space that we're going to continue to watch. Let me ask you Erik, can you contextualize this from a time series standpoint? I mean, how is this changed over time? >> Yeah, again, not show here, but in the data. I'm sorry, go ahead. >> No, I'm sorry. What I meant, I should have interjected. In other words, you would think in a downturn that these emerging companies would be less interesting to buyers 'cause they're more risky. What have you seen? >> Yeah, and it was interesting before we went live, you and I were having this conversation about "Is the downturn stopping people from evaluating these private companies or not," right. In a larger sense, that's really what we're doing here. How are these private companies doing when it comes down to the actual practitioners? The people with the budget, the people with the decision making. And so what I did is, we have historical data as you know, I went back to the Emerging Technology Survey we did in November of 21, right at the crest right before the market started to really fall and everything kind of started to fall apart there. And what I noticed is on the security side, very much so, we're seeing less evaluations than we were in November 21. So I broke it down. On cloud security, net sentiment went from 21% to 16% from November '21. That's a pretty big drop. And again, that sentiment is our one aggregate metric for overall positivity, meaning utilization and actual evaluation of the name. Again in database, we saw it drop a little bit from 19% to 13%. However, in analytics we actually saw it stay steady. So it's pretty interesting that yes, cloud security and security in general is always going to be important. But right now we're seeing less overall net sentiment in that space. But within analytics, we're seeing steady with growing mindshare. And also to your point earlier in machine learning, AI, we're seeing steady net sentiment and mindshare has grown a whopping 25% to 30%. So despite the downturn, we're seeing more awareness of these companies in analytics and machine learning and a steady, actual utilization of them. I can't say the same in security and database. They're actually shrinking a little bit since the end of last year. >> You know it's interesting, we were on a round table, Erik does these round tables with CISOs and CIOs, and I remember one time you had asked the question, "How do you think about some of these emerging tech companies?" And one of the executives said, "I always include somebody in the bottom left of the Gartner Magic Quadrant in my RFPs. I think he said, "That's how I found," I don't know, it was Zscaler or something like that years before anybody ever knew of them "Because they're going to help me get to the next level." So it's interesting to see Erik in these sectors, how they're holding up in many cases. >> Yeah. It's a very important part for the actual IT practitioners themselves. There's always contracts coming up and you always have to worry about your next round of negotiations. And that's one of the roles these guys play. You have to do a POC when contracts come up, but it's also their job to stay on top of the new technology. You can't fall behind. Like everyone's a software company. Now everyone's a tech company, no matter what you're doing. So these guys have to stay in on top of it. And that's what this ETS can do. You can go in here and look and say, "All right, I'm going to evaluate their technology," and it could be twofold. It might be that you're ready to upgrade your technology and they're actually pushing the envelope or it simply might be I'm using them as a negotiation ploy. So when I go back to the big guy who I have full intentions of writing that contract to, at least I have some negotiation leverage. >> Erik, we got to leave it there. I could spend all day. I'm going to definitely dig into this on my own time. Thank you for introducing this, really appreciate your time today. >> I always enjoy it, Dave and I hope everyone out there has a great holiday weekend. Enjoy the rest of the summer. And, you know, I love to talk data. So anytime you want, just point the camera on me and I'll start talking data. >> You got it. I also want to thank the team at ETR, not only Erik, but Darren Bramen who's a data scientist, really helped prepare this data, the entire team over at ETR. I cannot tell you how much additional data there is. We are just scratching the surface in this "Breaking Analysis". So great job guys. I want to thank Alex Myerson. Who's on production and he manages the podcast. Ken Shifman as well, who's just coming back from VMware Explore. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE. Does some great editing for us. Thank you. All of you guys. Remember these episodes, they're all available as podcast, wherever you listen. All you got to do is just search "Breaking Analysis" podcast. I publish each week on wikibon.com and siliconangle.com. Or you can email me to get in touch david.vellante@siliconangle.com. You can DM me at dvellante or comment on my LinkedIn posts and please do check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for Erik Bradley and The Cube Insights powered by ETR. Thanks for watching. Be well. And we'll see you next time on "Breaking Analysis". (upbeat music)

Published Date : Sep 7 2022

SUMMARY :

bringing you data driven it's called the Emerging Great to see you too, Dave, so much in the mainstream, not only for the ITDMs themselves It is the heart of innovation So the net sentiment is a very So a lot of names that we And then of course you have AnyScale, That's the bad zone, I guess, So the gray dots that you're rates, adoption and the all And on the lower side, Vena, Acton, in the green. are in the enterprise already. So now let's look at the churn So that's the way you can look of dwell on the negative, So again, the axis is still the same, And a couple of the other And then you see these great standouts, Those are the ones you want to but Redis Labs is the one And by the way, MariaDB, So it's not in this slide, Alex, bring that up if you would. So gimme one second to catch up. So I could set it up but based on the amount of time Those are the ones we were saying before, And one of the things I think didn't allow the employees to here, but in the data. What have you seen? the market started to really And one of the executives said, And that's one of the Thank you for introducing this, just point the camera on me We are just scratching the surface

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
ErikPERSON

0.99+

Alex MyersonPERSON

0.99+

Ken ShifmanPERSON

0.99+

Sanjay PoonenPERSON

0.99+

Dave VellantePERSON

0.99+

DavePERSON

0.99+

Erik BradleyPERSON

0.99+

November 21DATE

0.99+

Darren BramenPERSON

0.99+

AlexPERSON

0.99+

Cheryl KnightPERSON

0.99+

PostgresORGANIZATION

0.99+

DatabricksORGANIZATION

0.99+

NetskopeORGANIZATION

0.99+

AdobeORGANIZATION

0.99+

Rob HofPERSON

0.99+

FivetranORGANIZATION

0.99+

$50 millionQUANTITY

0.99+

21%QUANTITY

0.99+

Chris LynchPERSON

0.99+

19%QUANTITY

0.99+

Jeremy BurtonPERSON

0.99+

$800 millionQUANTITY

0.99+

6,000QUANTITY

0.99+

OracleORGANIZATION

0.99+

Redis LabsORGANIZATION

0.99+

November '21DATE

0.99+

ETRORGANIZATION

0.99+

FirstQUANTITY

0.99+

25%QUANTITY

0.99+

last yearDATE

0.99+

OneTrustORGANIZATION

0.99+

two dimensionsQUANTITY

0.99+

two groupsQUANTITY

0.99+

November of 21DATE

0.99+

bothQUANTITY

0.99+

BostonLOCATION

0.99+

more than 400 companiesQUANTITY

0.99+

Kristen MartinPERSON

0.99+

MySQLTITLE

0.99+

MoogsoftORGANIZATION

0.99+

The CubeORGANIZATION

0.99+

thirdQUANTITY

0.99+

GrafanaORGANIZATION

0.99+

H2OORGANIZATION

0.99+

Mike SpeiserPERSON

0.99+

david.vellante@siliconangle.comOTHER

0.99+

secondQUANTITY

0.99+

twoQUANTITY

0.99+

firstQUANTITY

0.99+

28%QUANTITY

0.99+

16%QUANTITY

0.99+

SecondQUANTITY

0.99+

Natasha | DigitalBits VIP Gala Dinner Monaco


 

(upbeat music) >> Hello, everyone. Welcome back to theCUBE's extended coverage. I'm John Furrier, host of theCUBE. We are here in Monaco at the Yacht Club, part of the VIP Gala with Prince Albert, DigitalBits, theCUBE. theCUBE and Prince Albert celebrating Monaco leaning into crypto. I'm here with Natasha Mahfar, who's our guest. She just came on theCUBE. Great story. Great to see you. Thanks for coming on. >> Thank you so much for having me. >> Tell the folks what you do real quick. >> Sure. So I actually started my career in Silicon Valley, like you have. And I had the idea of creating a startup in mental health that was voice based only. So it was peer to peer support groups via voice. So I created this startup, pretended to be a student at Stanford and built out a whole team, and unfortunately, at that time, no one was in the space of mental health and voice. Now, as you know, it's a $30 billion industry that's one of the biggest in Silicon Valley. So my career really started from there. And due to that startup, I got involved in the World XR Forum. Now, the World XR Forum is kind of like a mini Davos, but a little bit more exclusive, where we host entrepreneurs, people in blockchain, crypto, and we have a five day event covering all sorts of topics. So- >> When you host them, you mean like host them and they hang out and sleep over? It's a hotel? Is it an event? A workshop? >> There's workshops. We arrange hotels. We pretty much arrange everything that there is. >> It's a group get together. >> It's a group get together. Pretty much like Davos. >> And so Natasha, I wanted to talk to you about what we're passionate about which is theCUBE is bringing people up to have a voice and give them a voice. Give people a platform. You don't have to be famous. If you have something to say and share, we found that right now in this environment with media, we go out to an event, we stream as many stories, but we also have the virtual version of our studio. And I could tell you, I've found that internationally now as we bring people together, there are so many great stories. >> Absolutely. >> Out there that need to be told. And the bottleneck isn't the media, it's the fact that it's open now. >> Yes. >> So why aren't the stories coming out? So our mission is to get the stories. >> Wow. >> Scale stories. The more stories that are scaled, the more people can feel it. More people are impacted by it, and it changes the world. It gets people serendipity with data 'cause we're, you know, you shared some data about what you're working on. >> Yeah, of course. It's all about data these days. And the fact that you're doing it so openly is great because there is a need for that today, so. >> What do you see right now in the market for media? I mean, we got emerging markets, a lot of misinformation. Trust is a big problem. >> Right. >> Bullying, harassing. Smear campaigns. What's news, what's not news. I mean, how do you get your news? I mean, how do people figure out what's going on? >> No, absolutely. And this is such a pure format and a way of doing it. How did you come up with the idea, and how did you start? >> Well, I started... I realized after the Web 2.0, when social media started taking over and ruining the democratization . Blogging, podcasting, which I started in 2004, one of the first podcasts in Silicon Valley. >> Wow. >> I saw the network of that. I saw the value that people had when normal people, they call it user generated content, shared information. And I discovered something amazing that a nobody like me can have a really top podcast. >> Well, you're definitely not a nobody, but... >> Well, I was back then. And nobody knew me back then. But what it is is that even... If you put your voice out there, people will connect to it. And if you have the ability to bring other people in, you start to see a social dynamic. And what social media ruined, Facebook, Twitter, not so much Twitter 'cause Twitter's more smeary, but it's still got to open the API, LinkedIn, they're all terrible. They're all gardens. They don't really bring people together, so I think that stalled for about almost eight years or nine years. Now, with crypto and decentralization, you start to see the same thing come back. Democratization, level the playing field, remove the middle man and person, intermediate the middle bottlenecks. So with media, we found that live streaming and going to events was what the community wants. And then interviewing people, and getting their ideas out there. Not promotional, not getting paid to say stuff. Yeah, they get the plug in for the company that they're working on, that's good for everybody. But more share something that you're passionate about, data. And it works. And people like it. And we've been doing it for 12 years, and it creates a great brand of openness, community, and network effect. So we scaled up the brand to be- >> And it seems like you're international now. I mean, we're sitting in Monte Carlo, so I don't think it gets better than that. >> Well, in 2016, we started going international. 2017, we started doing stuff in Europe. 2018, we did the crypto, Middle East. And we also did London, a lot of different events. We had B2B Enterprise and Crypto Blooming. 2019, we were like, "Let's go global with staff and whatnot." >> Wow. >> And the pandemic hits. >> I know. >> And that really kind of allowed us to pivot and turn us into a virtual hybrid. And that's why we're into the metaverse, as we see the value of a physical face to face event where intimacy's there, but why aren't my friends connected first party? >> Right. How much would you say the company has grown from the time that you kind of pivoted? >> Well, we've grown in a different direction with new capabilities because the old way is over. >> Right. >> Every event right now, this event here, is in person. People are talking. They get connections. But every person that's connecting has a social graph behind them that's online too, and immediately available. And with Instagram, direct messaging, Telegram, Signal, all there. >> It's brilliant. Honestly, it was brilliant idea and a brilliant pivot. >> Thank you for interviewing me. >> Yeah, of course. (Natasha and John laugh) >> Any other questions? >> That should do it. >> Okay. Are you going to have fun tonight? >> Absolutely. >> What is your take of the Monaco scene here? What's it like? >> You know, I think it's a really interesting scene. I think there's a lot of potential because this is such an international place so it draws a very eclectic crowd, and I think there's a lot that could be done here. And you have a lot of people from Europe that are starting to get into this whole crypto, leaving kind of the traditional banks and finance behind. So I think the potential is very strong. >> Very progressive. Well, Natasha, thank you for sharing. >> Thank you so much. >> Here on theCUBE. We're the extended edition CUBE here in Monaco with Prince Albert, theCUBE, and Prince Albert, DigitalBits Al Burgio, a great market here for them. And just an amazing time. And thanks for watching. Natasha, thanks for coming on. Thanks for watching theCUBE. We'll be back with more after this break. (upbeat music)

Published Date : Aug 22 2022

SUMMARY :

part of the VIP Gala with Prince Albert, And I had the idea of creating everything that there is. It's a group get together. And so Natasha, I wanted to talk to you And the bottleneck isn't the media, So our mission is to get the stories. the more people can feel it. And the fact that you're now in the market for media? I mean, how do you get your news? And this is such a pure I realized after the Web 2.0, I saw the network of that. Well, you're definitely And if you have the ability And it seems like And we also did London, a And that really kind from the time that you kind of pivoted? because the old way is over. And with Instagram, direct it was brilliant idea Yeah, of course. to have fun tonight? And you have a lot of people from Europe Well, Natasha, thank you for sharing. We're the extended edition

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Natasha MahfarPERSON

0.99+

NatashaPERSON

0.99+

JohnPERSON

0.99+

John FurrierPERSON

0.99+

2004DATE

0.99+

EuropeLOCATION

0.99+

Silicon ValleyLOCATION

0.99+

2018DATE

0.99+

12 yearsQUANTITY

0.99+

2019DATE

0.99+

2016DATE

0.99+

2017DATE

0.99+

$30 billionQUANTITY

0.99+

MonacoLOCATION

0.99+

DigitalBitsORGANIZATION

0.99+

theCUBEORGANIZATION

0.99+

five dayQUANTITY

0.99+

Monte CarloLOCATION

0.99+

LondonLOCATION

0.99+

Middle EastLOCATION

0.98+

todayDATE

0.98+

FacebookORGANIZATION

0.98+

TwitterORGANIZATION

0.97+

tonightDATE

0.97+

LinkedInORGANIZATION

0.96+

oneQUANTITY

0.96+

nine yearsQUANTITY

0.96+

World XR ForumEVENT

0.95+

first podcastsQUANTITY

0.95+

StanfordORGANIZATION

0.93+

first partyQUANTITY

0.9+

B2B EnterpriseORGANIZATION

0.89+

Prince AlbertORGANIZATION

0.88+

AlbertORGANIZATION

0.86+

PrincePERSON

0.84+

Prince AlbertPERSON

0.82+

DavosPERSON

0.8+

eight yearsQUANTITY

0.8+

InstagramORGANIZATION

0.76+

DinnerEVENT

0.74+

Yacht ClubLOCATION

0.72+

TelegramTITLE

0.71+

pandemicEVENT

0.68+

CUBEORGANIZATION

0.67+

Al BurgioORGANIZATION

0.61+

SignalORGANIZATION

0.5+

Crypto BloomingEVENT

0.41+

theCUBETITLE

0.4+

Breaking Analysis: Snowflake Summit 2022...All About Apps & Monetization


 

>> From theCUBE studios in Palo Alto in Boston, bringing you data driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. >> Snowflake Summit 2022 underscored that the ecosystem excitement which was once forming around Hadoop is being reborn, escalated and coalescing around Snowflake's data cloud. What was once seen as a simpler cloud data warehouse and good marketing with the data cloud is evolving rapidly with new workloads of vertical industry focus, data applications, monetization, and more. The question is, will the promise of data be fulfilled this time around, or is it same wine, new bottle? Hello, and welcome to this week's Wikibon CUBE Insights powered by ETR. In this "Breaking Analysis," we'll talk about the event, the announcements that Snowflake made that are of greatest interest, the major themes of the show, what was hype and what was real, the competition, and some concerns that remain in many parts of the ecosystem and pockets of customers. First let's look at the overall event. It was held at Caesars Forum. Not my favorite venue, but I'll tell you it was packed. Fire Marshall Full, as we sometimes say. Nearly 10,000 people attended the event. Here's Snowflake's CMO Denise Persson on theCUBE describing how this event has evolved. >> Yeah, two, three years ago, we were about 1800 people at a Hilton in San Francisco. We had about 40 partners attending. This week we're close to 10,000 attendees here. Almost 10,000 people online as well, and over over 200 partners here on the show floor. >> Now, those numbers from 2019 remind me of the early days of Hadoop World, which was put on by Cloudera but then Cloudera handed off the event to O'Reilly as this article that we've inserted, if you bring back that slide would say. The headline it almost got it right. Hadoop World was a failure, but it didn't have to be. Snowflake has filled the void created by O'Reilly when it first killed Hadoop World, and killed the name and then killed Strata. Now, ironically, the momentum and excitement from Hadoop's early days, it probably could have stayed with Cloudera but the beginning of the end was when they gave the conference over to O'Reilly. We can't imagine Frank Slootman handing the keys to the kingdom to a third party. Serious business was done at this event. I'm talking substantive deals. Salespeople from a host sponsor and the ecosystems that support these events, they love physical. They really don't like virtual because physical belly to belly means relationship building, pipeline, and deals. And that was blatantly obvious at this show. And in fairness, all theCUBE events that we've done year but this one was more vibrant because of its attendance and the action in the ecosystem. Ecosystem is a hallmark of a cloud company, and that's what Snowflake is. We asked Frank Slootman on theCUBE, was this ecosystem evolution by design or did Snowflake just kind of stumble into it? Here's what he said. >> Well, when you are a data clouding, you have data, people want to do things with that data. They don't want just run data operations, populate dashboards, run reports. Pretty soon they want to build applications and after they build applications, they want build businesses on it. So it goes on and on and on. So it drives your development to enable more and more functionality on that data cloud. Didn't start out that way, you know, we were very, very much focused on data operations. Then it becomes application development and then it becomes, hey, we're developing whole businesses on this platform. So similar to what happened to Facebook in many ways. >> So it sounds like it was maybe a little bit of both. The Facebook analogy is interesting because Facebook is a walled garden, as is Snowflake, but when you come into that garden, you have assurances that things are going to work in a very specific way because a set of standards and protocols is being enforced by a steward, i.e. Snowflake. This means things run better inside of Snowflake than if you try to do all the integration yourself. Now, maybe over time, an open source version of that will come out but if you wait for that, you're going to be left behind. That said, Snowflake has made moves to make its platform more accommodating to open source tooling in many of its announcements this week. Now, I'm not going to do a deep dive on the announcements. Matt Sulkins from Monte Carlo wrote a decent summary of the keynotes and a number of analysts like Sanjeev Mohan, Tony Bear and others are posting some deeper analysis on these innovations, and so we'll point to those. I'll say a few things though. Unistore extends the type of data that can live in the Snowflake data cloud. It's enabled by a new feature called hybrid tables, a new table type in Snowflake. One of the big knocks against Snowflake was it couldn't handle and transaction data. Several database companies are creating this notion of a hybrid where both analytic and transactional workloads can live in the same data store. Oracle's doing this for example, with MySQL HeatWave and there are many others. We saw Mongo earlier this month add an analytics capability to its transaction system. Mongo also added sequel, which was kind of interesting. Here's what Constellation Research analyst Doug Henschen said about Snowflake's moves into transaction data. Play the clip. >> Well with Unistore, they're reaching out and trying to bring transactional data in. Hey, don't limit this to analytical information and there's other ways to do that like CDC and streaming but they're very closely tying that again to that marketplace, with the idea of bring your data over here and you can monetize it. Don't just leave it in that transactional database. So another reach to a broader play across a big community that they're building. >> And you're also seeing Snowflake expand its workload types in its unique way and through Snowpark and its stream lit acquisition, enabling Python so that native apps can be built in the data cloud and benefit from all that structure and the features that Snowflake is built in. Hence that Facebook analogy, or maybe the App Store, the Apple App Store as I propose as well. Python support also widens the aperture for machine intelligence workloads. We asked Snowflake senior VP of product, Christian Kleinerman which announcements he thought were the most impactful. And despite the who's your favorite child nature of the question, he did answer. Here's what he said. >> I think the native applications is the one that looks like, eh, I don't know about it on the surface but he has the biggest potential to change everything. That's create an entire ecosystem of solutions for within a company or across companies that I don't know that we know what's possible. >> Snowflake also announced support for Apache Iceberg, which is a new open table format standard that's emerging. So you're seeing Snowflake respond to these concerns about its lack of openness, and they're building optionality into their cloud. They also showed some cost op optimization tools both from Snowflake itself and from the ecosystem, notably Capital One which launched a software business on top of Snowflake focused on optimizing cost and eventually the rollout data management capabilities, and all kinds of features that Snowflake announced that the show around governance, cross cloud, what we call super cloud, a new security workload, and they reemphasize their ability to read non-native on-prem data into Snowflake through partnerships with Dell and Pure and a lot more. Let's hear from some of the analysts that came on theCUBE this week at Snowflake Summit to see what they said about the announcements and their takeaways from the event. This is Dave Menninger, Sanjeev Mohan, and Tony Bear, roll the clip. >> Our research shows that the majority of organizations, the majority of people do not have access to analytics. And so a couple of the things they've announced I think address those or help to address those issues very directly. So Snowpark and support for Python and other languages is a way for organizations to embed analytics into different business processes. And so I think that'll be really beneficial to try and get analytics into more people's hands. And I also think that the native applications as part of the marketplace is another way to get applications into people's hands rather than just analytical tools. Because most people in the organization are not analysts. They're doing some line of business function. They're HR managers, they're marketing people, they're sales people, they're finance people, right? They're not sitting there mucking around in the data, they're doing a job and they need analytics in that job. >> Primarily, I think it is to contract this whole notion that once you move data into Snowflake, it's a proprietary format. So I think that's how it started but it's usually beneficial to the customers, to the users because now if you have large amount of data in paket files you can leave it on S3, but then you using the Apache Iceberg table format in Snowflake, you get all the benefits of Snowflake's optimizer. So for example, you get the micro partitioning, you get the metadata. And in a single query, you can join, you can do select from a Snowflake table union and select from an iceberg table and you can do store procedure, user defined function. So I think what they've done is extremely interesting. Iceberg by itself still does not have multi-table transactional capabilities. So if I'm running a workload, I might be touching 10 different tables. So if I use Apache Iceberg in a raw format, they don't have it, but Snowflake does. So the way I see it is Snowflake is adding more and more capabilities right into the database. So for example, they've gone ahead and added security and privacy. So you can now create policies and do even cell level masking, dynamic masking, but most organizations have more than Snowflake. So what we are starting to see all around here is that there's a whole series of data catalog companies, a bunch of companies that are doing dynamic data masking, security and governance, data observability which is not a space Snowflake has gone into. So there's a whole ecosystem of companies that is mushrooming. Although, you know, so they're using the native capabilities of Snowflake but they are at a level higher. So if you have a data lake and a cloud data warehouse and you have other like relational databases, you can run these cross platform capabilities in that layer. So that way, you know, Snowflake's done a great job of enabling that ecosystem. >> I think it's like the last mile, essentially. In other words, it's like, okay, you have folks that are basically that are very comfortable with Tableau but you do have developers who don't want to have to shell out to a separate tool. And so this is where Snowflake is essentially working to address that constituency. To Sanjeev's point, and I think part of it, this kind of plays into it is what makes this different from the Hadoop era is the fact that all these capabilities, you know, a lot of vendors are taking it very seriously to put this native. Now, obviously Snowflake acquired Streamlit. So we can expect that the Streamlit capabilities are going to be native. >> I want to share a little bit about the higher level thinking at Snowflake, here's a chart from Frank Slootman's keynote. It's his version of the modern data stack, if you will. Now, Snowflake of course, was built on the public cloud. If there were no AWS, there would be no Snowflake. Now, they're all about bringing data and live data and expanding the types of data, including structured, we just heard about that, unstructured, geospatial, and the list is going to continue on and on. Eventually I think it's going to bleed into the edge if we can figure out what to do with that edge data. Executing on new workloads is a big deal. They started with data sharing and they recently added security and they've essentially created a PaaS layer. We call it a SuperPaaS layer, if you will, to attract application developers. Snowflake has a developer-focused event coming up in November and they've extended the marketplace with 1300 native apps listings. And at the top, that's the holy grail, monetization. We always talk about building data products and we saw a lot of that at this event, very, very impressive and unique. Now here's the thing. There's a lot of talk in the press, in the Wall Street and the broader community about consumption-based pricing and concerns over Snowflake's visibility and its forecast and how analytics may be discretionary. But if you're a company building apps in Snowflake and monetizing like Capital One intends to do, and you're now selling in the marketplace, that is not discretionary, unless of course your costs are greater than your revenue for that service, in which case is going to fail anyway. But the point is we're entering a new error where data apps and data products are beginning to be built and Snowflake is attempting to make the data cloud the defacto place as to where you're going to build them. In our view they're well ahead in that journey. Okay, let's talk about some of the bigger themes that we heard at the event. Bringing apps to the data instead of moving the data to the apps, this was a constant refrain and one that certainly makes sense from a physics point of view. But having a single source of data that is discoverable, sharable and governed with increasingly robust ecosystem options, it doesn't have to be moved. Sometimes it may have to be moved if you're going across regions, but that's unique and a differentiator for Snowflake in our view. I mean, I'm yet to see a data ecosystem that is as rich and growing as fast as the Snowflake ecosystem. Monetization, we talked about that, industry clouds, financial services, healthcare, retail, and media, all front and center at the event. My understanding is that Frank Slootman was a major force behind this shift, this development and go to market focus on verticals. It's really an attempt, and he talked about this in his keynote to align with the customer mission ultimately align with their objectives which not surprisingly, are increasingly monetizing with data as a differentiating ingredient. We heard a ton about data mesh, there were numerous presentations about the topic. And I'll say this, if you map the seven pillars Snowflake talks about, Benoit Dageville talked about this in his keynote, but if you map those into Zhamak Dehghani's data mesh framework and the four principles, they align better than most of the data mesh washing that I've seen. The seven pillars, all data, all workloads, global architecture, self-managed, programmable, marketplace and governance. Those are the seven pillars that he talked about in his keynote. All data, well, maybe with hybrid tables that becomes more of a reality. Global architecture means the data is globally distributed. It's not necessarily physically in one place. Self-managed is key. Self-service infrastructure is one of Zhamak's four principles. And then inherent governance. Zhamak talks about computational, what I'll call automated governance, built in. And with all the talk about monetization, that aligns with the second principle which is data as product. So while it's not a pure hit and to its credit, by the way, Snowflake doesn't use data mesh in its messaging anymore. But by the way, its customers do, several customers talked about it. Geico, JPMC, and a number of other customers and partners are using the term and using it pretty closely to the concepts put forth by Zhamak Dehghani. But back to the point, they essentially, Snowflake that is, is building a proprietary system that substantially addresses some, if not many of the goals of data mesh. Okay, back to the list, supercloud, that's our term. We saw lots of examples of clouds on top of clouds that are architected to spin multiple clouds, not just run on individual clouds as separate services. And this includes Snowflake's data cloud itself but a number of ecosystem partners that are headed in a very similar direction. Snowflake still talks about data sharing but now it uses the term collaboration in its high level messaging, which is I think smart. Data sharing is kind of a geeky term. And also this is an attempt by Snowflake to differentiate from everyone else that's saying, hey, we do data sharing too. And finally Snowflake doesn't say data marketplace anymore. It's now marketplace, accounting for its application market. Okay, let's take a quick look at the competitive landscape via this ETR X-Y graph. Vertical access remembers net score or spending momentum and the x-axis is penetration, pervasiveness in the data center. That's what ETR calls overlap. Snowflake continues to lead on the vertical axis. They guide it conservatively last quarter, remember, so I wouldn't be surprised if that lofty height, even though it's well down from its earlier levels but I wouldn't be surprised if it ticks down again a bit in the July survey, which will be in the field shortly. Databricks is a key competitor obviously at a strong spending momentum, as you can see. We didn't draw it here but we usually draw that 40% line or red line at 40%, anything above that is considered elevated. So you can see Databricks is quite elevated. But it doesn't have the market presence of Snowflake. It didn't get to IPO during the bubble and it doesn't have nearly as deep and capable go-to market machinery. Now, they're getting better and they're getting some attention in the market, nonetheless. But as a private company, you just naturally, more people are aware of Snowflake. Some analysts, Tony Bear in particular, believe Mongo and Snowflake are on a bit of a collision course long term. I actually can see his point. You know, I mean, they're both platforms, they're both about data. It's long ways off, but you can see them sort of in a similar path. They talk about kind of similar aspirations and visions even though they're quite in different markets today but they're definitely participating in similar tam. The cloud players are probably the biggest or definitely the biggest partners and probably the biggest competitors to Snowflake. And then there's always Oracle. Doesn't have the spending velocity of the others but it's got strong market presence. It owns a cloud and it knows a thing about data and it definitely is a go-to market machine. Okay, we're going to end on some of the things that we heard in the ecosystem. 'Cause look, we've heard before how particular technology, enterprise data warehouse, data hubs, MDM, data lakes, Hadoop, et cetera. We're going to solve all of our data problems and of course they didn't. And in fact, sometimes they create more problems that allow vendors to push more incremental technology to solve the problems that they created. Like tools and platforms to clean up the no schema on right nature of data lakes or data swamps. But here are some of the things that I heard firsthand from some customers and partners. First thing is, they said to me that they're having a hard time keeping up sometimes with the pace of Snowflake. It reminds me of AWS in 2014, 2015 timeframe. You remember that fire hose of announcements which causes increased complexity for customers and partners. I talked to several customers that said, well, yeah this is all well and good but I still need skilled people to understand all these tools that I'm integrated in the ecosystem, the catalogs, the machine learning observability. A number of customers said, I just can't use one governance tool, I need multiple governance tools and a lot of other technologies as well, and they're concerned that that's going to drive up their cost and their complexity. I heard other concerns from the ecosystem that it used to be sort of clear as to where they could add value you know, when Snowflake was just a better data warehouse. But to point number one, they're either concerned that they'll be left behind or they're concerned that they'll be subsumed. Look, I mean, just like we tell AWS customers and partners, you got to move fast, you got to keep innovating. If you don't, you're going to be left. Either if your customer you're going to be left behind your competitor, or if you're a partner, somebody else is going to get there or AWS is going to solve the problem for you. Okay, and there were a number of skeptical practitioners, really thoughtful and experienced data pros that suggested that they've seen this movie before. That's hence the same wine, new bottle. Well, this time around I certainly hope not given all the energy and investment that is going into this ecosystem. And the fact is Snowflake is unquestionably making it easier to put data to work. They built on AWS so you didn't have to worry about provisioning, compute and storage and networking and scaling. Snowflake is optimizing its platform to take advantage of things like Graviton so you don't have to, and they're doing some of their own optimization tools. The ecosystem is building optimization tools so that's all good. And firm belief is the less expensive it is, the more data will get brought into the data cloud. And they're building a data platform on which their ecosystem can build and run data applications, aka data products without having to worry about all the hard work that needs to get done to make data discoverable, shareable, and governed. And unlike the last 10 years, you don't have to be a keeper and integrate all the animals in the Hadoop zoo. Okay, that's it for today, thanks for watching. Thanks to my colleague, Stephanie Chan who helps research "Breaking Analysis" topics. Sometimes Alex Myerson is on production and manages the podcasts. Kristin Martin and Cheryl Knight help get the word out on social and in our newsletters, and Rob Hof is our editor in chief over at Silicon, and Hailey does some wonderful editing, thanks to all. Remember, all these episodes are available as podcasts wherever you listen. All you got to do is search Breaking Analysis Podcasts. I publish each week on wikibon.com and siliconangle.com and you can email me at David.Vellante@siliconangle.com or DM me @DVellante. If you got something interesting, I'll respond. If you don't, I'm sorry I won't. Or comment on my LinkedIn post. Please check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time. (upbeat music)

Published Date : Jun 18 2022

SUMMARY :

bringing you data driven that the ecosystem excitement here on the show floor. and the action in the ecosystem. Didn't start out that way, you know, One of the big knocks against Snowflake the idea of bring your data of the question, he did answer. is the one that looks like, and from the ecosystem, And so a couple of the So that way, you know, from the Hadoop era is the fact the defacto place as to where

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Frank SlootmanPERSON

0.99+

Frank SlootmanPERSON

0.99+

Doug HenschenPERSON

0.99+

Stephanie ChanPERSON

0.99+

Christian KleinermanPERSON

0.99+

AWSORGANIZATION

0.99+

Dave VellantePERSON

0.99+

Rob HofPERSON

0.99+

Benoit DagevillePERSON

0.99+

2014DATE

0.99+

Matt SulkinsPERSON

0.99+

JPMCORGANIZATION

0.99+

2019DATE

0.99+

Cheryl KnightPERSON

0.99+

Palo AltoLOCATION

0.99+

Denise PerssonPERSON

0.99+

Alex MyersonPERSON

0.99+

Tony BearPERSON

0.99+

Dave MenningerPERSON

0.99+

DellORGANIZATION

0.99+

JulyDATE

0.99+

GeicoORGANIZATION

0.99+

NovemberDATE

0.99+

SnowflakeTITLE

0.99+

40%QUANTITY

0.99+

OracleORGANIZATION

0.99+

App StoreTITLE

0.99+

Capital OneORGANIZATION

0.99+

second principleQUANTITY

0.99+

Sanjeev MohanPERSON

0.99+

SnowflakeORGANIZATION

0.99+

1300 native appsQUANTITY

0.99+

Tony BearPERSON

0.99+

David.Vellante@siliconangle.comOTHER

0.99+

Kristin MartinPERSON

0.99+

MongoORGANIZATION

0.99+

DatabricksORGANIZATION

0.99+

Snowflake Summit 2022EVENT

0.99+

FirstQUANTITY

0.99+

twoDATE

0.99+

PythonTITLE

0.99+

10 different tablesQUANTITY

0.99+

FacebookORGANIZATION

0.99+

ETRORGANIZATION

0.99+

bothQUANTITY

0.99+

SnowflakeEVENT

0.98+

one placeQUANTITY

0.98+

each weekQUANTITY

0.98+

O'ReillyORGANIZATION

0.98+

This weekDATE

0.98+

Hadoop WorldEVENT

0.98+

this weekDATE

0.98+

PureORGANIZATION

0.98+

about 40 partnersQUANTITY

0.98+

theCUBEORGANIZATION

0.98+

last quarterDATE

0.98+

OneQUANTITY

0.98+

S3TITLE

0.97+

HadoopLOCATION

0.97+

singleQUANTITY

0.97+

Caesars ForumLOCATION

0.97+

IcebergTITLE

0.97+

single sourceQUANTITY

0.97+

SiliconORGANIZATION

0.97+

Nearly 10,000 peopleQUANTITY

0.97+

Apache IcebergORGANIZATION

0.97+

Jon Loyens, data.world | Snowflake Summit 2022


 

>>Good morning, everyone. Welcome back to the Cube's coverage of snowflake summit 22 live from Caesar's forum in Las Vegas. Lisa Martin, here with Dave Valante. This is day three of our coverage. We've had an amazing, amazing time. Great conversations talking with snowflake executives, partners, customers. We're gonna be digging into data mesh with data.world. Please welcome John loins, the chief product officer. Great to have you on the program, John, >>Thank you so much for, for having me here. I mean, the summit, like you said, has been incredible, so many great people, so such a good time, really, really nice to be back in person with folks. >>It is fabulous to be back in person. The fact that we're on day four for, for them. And this is the, the solution showcase is as packed as it is at 10 11 in the morning. Yeah. Is saying something >>Yeah. Usually >>Chopping at the bit to hear what they're doing and innovate. >>Absolutely. Usually those last days of conferences, everybody starts getting a little tired, but we're not seeing that at all here, especially >>In Vegas. This is impressive. Talk to the audience a little bit about data.world, what you guys do and talk about the snowflake relationship. >>Absolutely data.world is the only true cloud native enterprise data catalog. We've been an incredible snowflake partner and Snowflake's been an incredible partner to us really since 2018. When we became the first data catalog in the snowflake partner connect experience, you know, snowflake and the data cloud make it so possible. And it's changed so much in terms of being able to, you know, very easily transition data into the cloud to break down those silos and to have a platform that enables folks to be incredibly agile with data from an engineering and infrastructure standpoint, data out world is able to provide a layer of discovery and governance that matches that agility and the ability for a lot of different stakeholders to really participate in the process of data management and data governance. >>So data mesh basically Jamma, Dani lays out the first of all, the, the fault domains of existing data and big data initiatives. And she boils it down to the fact that it's just this monolithic architecture with hyper specialized teams that you have to go through and it just slows everything down and it doesn't scale. They don't have domain context. So she came up with four principles if I may, yep. Domain ownership. So push it out to the businesses. They have the context they should own the data. The second is data as product. We're certainly hearing a lot about that today this week. The third is that. So that makes it sounds good. Push out the, the data great, but it creates two problems. Self-serve infrastructure. Okay. But her premises infrastructure should be an operational detail. And then the fourth is computational governance. So you talked about data CA where do you fit in those four principles? >>You know, honestly, we are able to help teams realize the data mesh architecture. And we know that data mesh is really, it's, it's both a process in a culture change, but then when you want to enact a process in a culture change like this, you also need to select the appropriate tools to match the culture that you're trying to build the process in the architecture that you're trying to build. And the data world data catalog can really help along all four of those axes. When you start thinking first about, let's say like, let's take the first one, you know, data as a product, right? We even like very meta of us from metadata management platform at the end of the day. But very meta of us. When you talk about data as a product, we track adoption and usage of all your data assets within your organization and provide program teams and, you know, offices of the CDO with incredible evented analytics, very detailed that gives them the right audit trail that enables them to direct very scarce data engineering, data architecture resources, to make sure that their data assets are getting adopted and used properly. >>On the, on the domain driven side, we are entirely knowledge graph and open standards based enabling those different domains. We have, you know, incredible joint snowflake customers like Prologis. And we chatted a lot about this in our session here yesterday, where, because of our knowledge graph underpinnings, because of the flexibility of our metadata model, it enables those domains to actually model their assets uniquely from, from group to group, without having to, to relaunch or run different environments. Like you can do that all within one day catalog platform without having to have separate environments for each of those domains, federated governance. Again, the amount of like data exhaust that we create that really enables ambient governance and participatory governance as well. We call it agile data governance, really the adoption of agile and open principles applied to governance to make it more inclusive and transparent. And we provide that in a way that Confederate across those means and make it consistent. >>Okay. So you facilitate across that whole spectrum of, of principles. And so what in the, in the early examples of data mesh that I've studied and actually collaborated with, like with JPMC, who I don't think is who's not using your data catalog, but hello, fresh who may or may not be, but I mean, there, there are numbers and I wanna get to that. But what they've done is they've enabled the domains to spin up their own, whatever data lakes, data, warehouses, data hubs, at least in, in concept, most of 'em are data lakes on AWS, but still in concept, they wanna be inclusive and they've created a master data catalog. And then each domain has its sub catalogue, which feeds into the master and that's how they get consistency and governance and everything else is, is that the right way to think about it? And or do you have a different spin on that? >>Yeah, I, I, you know, I have a slightly different spin on it. I think organizationally it's the right way to think about it. And in absence of a catalog that can truly have multiple federated metadata models, multiple graphs in one platform, I, that is really kind of the, the, the only way to do it, right with data.world. You don't have to do that. You can have one platform, one environment, one instance of data.world that spans all of your domains, enable them to operate independently and then federate across. So >>You just answered my question as to why I should use data.world versus Amazon glue. >>Oh, absolutely. >>And that's a, that's awesome that you've done now. How have you done that? What, what's your secret >>Sauce? The, the secret sauce era is really an all credit to our CTO. One of my closest friends who was a true student of knowledge graph practices and principles, and really felt that the right way to manage metadata and knowledge about the data analytics ecosystem that companies were building was through federated linked data, right? So we use standards and we've built a, a, an open and extensible metadata model that we call costs that really takes the best parts of existing open standards in the semantics space. Things like schema.org, DCA, Dublin core brings them together and models out the most typical enterprise data assets providing you with an ontology that's ready to go. But because of the graph nature of what we do is instantly accessible without having to rebuild environments, without having to do a lot of management against it. It's, it's really quite something. And it's something all of our customers are, are very impressed with and, and, and, and, you know, are getting a lot of leverage out of, >>And, and we have a lot of time today, so we're not gonna shortchange this topic. So one last question, then I'll shut up and let you jump in. This is an open standard. It's not open source. >>No, it's an open built on open standards, built on open standards. We also fundamentally believe in extensibility and openness. We do not want to vertically like lock you into our platform. So everything that we have is API driven API available. Your metadata belongs to you. If you need to export your graph, you know, instantly available in open machine readable formats. That's really, we come from the open data community. That was a lot of the founding of data.world. We, we worked a lot in with the open data community and we, we fundamentally believe in that. And that's enabled a lot of our customers as well to truly take data.world and not have it be a data catalog application, but really an entire metadata management platform and extend it even further into their enterprise to, to really catalog all of their assets, but also to build incredible integrations to things like corporate search, you know, having data assets show up in corporate Wiki search, along with all the, the descriptive metadata that people need has been incredibly powerful and an incredible extension of our platform that I'm so happy to see our customers in. >>So leasing. So it's not exclusive to, to snowflake. It's not exclusive to AWS. You can bring it anywhere. Azure GCP, >>Anytime. Yeah. You know where we are, where we love snowflake, look, we're at the snowflake summit. And we've always had a great relationship with snowflake though, and really leaned in there because we really believe Snowflake's principles, particularly around cloud and being cloud native and the operating advantages that it affords companies that that's really aligned with what we do. And so snowflake was really the first of the cloud data catalogs that we ultimately or say the cloud data warehouses that we integrated with and to see them transition to building really out the data cloud has been awesome. >>Talk about how data world and snowflake enable companies like per lodges to be data companies. These days, every company has to be a data company, but they, they have to be able to do so quickly to be competitive and to, to really win. How do you help them if we like up level the conversation to really impacting the overall business? >>That's a great question, especially right now, everybody knows. And pro is a great example. They're a logistics and supply chain company at the end of the day. And we know how important logistics and supply chain is nowadays and for them and for a lot of our customers. I think one of the advantages of having a data catalog is the ability to build trust, transparency and inclusivity into their data analytics practice by adopting agile principles, by adopting a data mesh, you're able to extend your data analytics practice to a much broader set of stakeholders and to involve them in the process while the work is getting done. One of the greatest things about agile software development, when it became a thing in the early two thousands was how inclusive it was. And that inclusivity led to a much faster ROI on software projects. And we see the same thing happening in data analytics, people, you know, we have amazing data scientists and data analysts coming up with these insights that could be business changing that could make their company significantly more resilient, especially in the face of economic uncertainty. >>But if you have to sit there and argue with your business stakeholders about the validity of the data, about the, the techniques that were used to do the analysis, and it takes you three months to get people to trust what you've done, that opportunity's passed. So how do we shorten those cycles? How do we bring them closer? And that's, that's really a huge benefit that like Prologis has, has, has realized just tightening that cycle time, building trust, building inclusion, and making sure ultimately humans learn by doing, and if you can be inclusive, it, even, it even increases things like that. We all want to, to, to, to help cuz Lord knows the world needs it. Things like data literacy. Yeah. Right. >>So data.world can inform me as to where on the spectrum of data quality, my data set lives. So I can say, okay, this is usable, shareable, you know, exactly of gold standard versus fix this. Right. Okay. Yep. >>Yep. >>That's yeah. Okay. And you could do that with one data catalog, not a bunch of >>Yeah. And trust trust is really a multifaceted and multi multi-angle idea, right? It's not just necessarily data quality or data observability. And we have incredible partnerships in that space, like our partnership with, with Monte Carlo, where we can ingest all their like amazing observability information and display that in a really like a really consumable way in our data catalog. But it also includes things like the lineage who touch it, who is involved in the process of a, can I get a, a, a question answered quickly about this data? What's it been used for previously? And do I understand that it's so multifaceted that you have to be able to really model and present that in a way that's unique to any given organization, even unique within domains within a single organization. >>If you're not, that means to suggest you're a data quality. No, no supplier. Absolutely. But your partner with them and then that you become the, the master catalog. >>That's brilliant. I love it. Exactly. And you're >>You, you just raised your series C 15 million. >>We did. Yeah. So, you know, really lucky to have incredible investors like Goldman Sachs, who, who led our series C it really, I think, communicates the trust that they have in our vision and what we're doing and the impact that we can have on organization's ability to be agile and resilient around data analytics, >>Enabling customers to have that single source of truth is so critical. You talked about trust. That is absolutely. It's no joke. >>Absolutely. >>That is critical. And there's a tremendous amount of business impact, positive business impact that can come from that. What are some of the things that are next for data.world that we're gonna see? >>Oh, you know, I love this. We have such an incredibly innovative team. That's so dedicated to this space and the mission of what we're doing. We're out there trying to fundamentally change how people get data analytics work done together. One of the big reasons I founded the company is I, I really truly believe that data analytics needs to be a team sport. It needs to go from, you know, single player mode to team mode and everything that we've worked on in the last six years has leaned into that. Our architecture being cloud native, we do, we've done over a thousand releases a year that nobody has to manage. You don't have to worry about upgrading your environment. It's a lot of the same story that's made snowflake. So great. We are really excited to have announced in March on our own summit. And we're rolling this suite of features out over the course of the year, a new package of features that we call data.world Eureka, which is a suite of automations and, you know, knowledge driven functionality that really helps you leverage a knowledge graph to make decisions faster and to operationalize your data in, in the data ops way with significantly less effort, >>Big, big impact there. John, thank you so much for joining David, me unpacking what data world is doing. The data mesh, the opportunities that you're giving to customers and every industry. We appreciate your time and congratulations on the news and the funding. >>Ah, thank you. It's been a, a true pleasure. Thank you for having me on and, and I hope, I hope you guys enjoy the rest of, of the day and, and your other guests that you have. Thank you. >>We will. All right. For our guest and Dave ante, I'm Lisa Martin. You're watching the cubes third day of coverage of snowflake summit, 22 live from Vegas, Dave and I will be right back with our next guest. So stick around.

Published Date : Jun 16 2022

SUMMARY :

Great to have you on the program, John, I mean, the summit, like you said, has been incredible, It is fabulous to be back in person. Usually those last days of conferences, everybody starts getting a little tired, but we're not seeing that at all here, what you guys do and talk about the snowflake relationship. And it's changed so much in terms of being able to, you know, very easily transition And she boils it down to the fact that it's just this monolithic architecture with hyper specialized teams about, let's say like, let's take the first one, you know, data as a product, We have, you know, incredible joint snowflake customers like Prologis. governance and everything else is, is that the right way to think about it? And in absence of a catalog that can truly have multiple federated How have you done that? of knowledge graph practices and principles, and really felt that the right way to manage then I'll shut up and let you jump in. an incredible extension of our platform that I'm so happy to see our customers in. It's not exclusive to AWS. first of the cloud data catalogs that we ultimately or say the cloud data warehouses but they, they have to be able to do so quickly to be competitive and to, thing happening in data analytics, people, you know, we have amazing data scientists and data the data, about the, the techniques that were used to do the analysis, and it takes you three So I can say, okay, this is usable, shareable, you know, That's yeah. that you have to be able to really model and present that in a way that's unique to any then that you become the, the master catalog. And you're that we can have on organization's ability to be agile and resilient Enabling customers to have that single source of truth is so critical. What are some of the things that are next for data.world that we're gonna see? It needs to go from, you know, single player mode to team mode and everything The data mesh, the opportunities that you're giving to customers and every industry. and I hope, I hope you guys enjoy the rest of, of the day and, and your other guests that you have. So stick around.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
DavidPERSON

0.99+

Lisa MartinPERSON

0.99+

Dave ValantePERSON

0.99+

DavePERSON

0.99+

JohnPERSON

0.99+

Jon LoyensPERSON

0.99+

Monte CarloORGANIZATION

0.99+

John loinsPERSON

0.99+

AmazonORGANIZATION

0.99+

MarchDATE

0.99+

Las VegasLOCATION

0.99+

VegasLOCATION

0.99+

Goldman SachsORGANIZATION

0.99+

yesterdayDATE

0.99+

three monthsQUANTITY

0.99+

AWSORGANIZATION

0.99+

one platformQUANTITY

0.99+

one dayQUANTITY

0.99+

thirdQUANTITY

0.99+

two problemsQUANTITY

0.99+

fourthQUANTITY

0.99+

OneQUANTITY

0.99+

2018DATE

0.99+

15 millionQUANTITY

0.98+

DaniPERSON

0.98+

secondQUANTITY

0.98+

firstQUANTITY

0.98+

third dayQUANTITY

0.98+

first oneQUANTITY

0.98+

SnowflakeORGANIZATION

0.98+

DCAORGANIZATION

0.98+

one last questionQUANTITY

0.98+

data.world.ORGANIZATION

0.97+

PrologisORGANIZATION

0.97+

JPMCORGANIZATION

0.97+

each domainQUANTITY

0.97+

today this weekDATE

0.97+

JammaPERSON

0.97+

bothQUANTITY

0.97+

first data catalogQUANTITY

0.95+

Snowflake Summit 2022EVENT

0.95+

eachQUANTITY

0.94+

todayDATE

0.94+

singleQUANTITY

0.94+

data.worldORGANIZATION

0.93+

day threeQUANTITY

0.93+

oneQUANTITY

0.93+

one instanceQUANTITY

0.92+

over a thousand releases a yearQUANTITY

0.92+

day fourQUANTITY

0.91+

SnowflakeTITLE

0.91+

fourQUANTITY

0.91+

10 11 in the morningDATE

0.9+

22QUANTITY

0.9+

one environmentQUANTITY

0.9+

single organizationQUANTITY

0.88+

four principlesQUANTITY

0.86+

agileTITLE

0.85+

last six yearsDATE

0.84+

one data catalogQUANTITY

0.84+

EurekaORGANIZATION

0.83+

Azure GCPTITLE

0.82+

CaesarPERSON

0.82+

series COTHER

0.8+

CubeORGANIZATION

0.8+

data.worldOTHER

0.78+

LordPERSON

0.75+

thousandsQUANTITY

0.74+

single sourceQUANTITY

0.74+

DublinORGANIZATION

0.73+

snowflake summit 22EVENT

0.7+

WikiTITLE

0.68+

schema.orgORGANIZATION

0.67+

early twoDATE

0.63+

CDOTITLE

0.48+

Michael Sotnick, Pure Storage | VeeamON 2022


 

>>We're back with the Cube's coverage of Von 2022 from the aria in Las Vegas, we're talking pure data protection and nobody better to talk you about that than pure storage. You can't miss these guys when they're around because the orange crush is here. Dave ante for Dave Nicholson, Michael Sonic is here. He is the vice president of global alliances at pure storage. Michael. Good to see you again. Thanks for the little golf. Appreciate joy Lee. Yeah, appreciate that. Hopefully you get out there my back and, uh, be seen. So how you doing, man? I'm doing face to face. >>It's wonderful to be face to face with the cube. It's, uh, always a pleasure to have the opportunity to spend some time with you. Good to meet you. Good to meet. You have the opportunity to get, spend some time together. You guys, um, it's just great being at a show my first one back. And so I'm, uh, you know, just feeling the energy from the room and, uh, just great to come in here and see the cube all lit >>Up. Yeah. Accelerate 2019 in Austin was an awesome event. And, and one of the last ones that we did before, you know, the pandemic for all of us, um, we did some obviously support for, for virtual. You guys are having another show finally face to face in June. So look forward to that. Got it. >>20 days, 21 days, >>We'll see you there. Right. So tell us about what's going on with Veeam. Give us the update. Yeah, >>Look, we're thrilled to be here as a sponsor for Veeam and for Veeam on, uh, this is a longstanding partnership, you know, us, right. So found in 2009 start shipping product in 2012, um, really disrupted the block storage space with an all flash solution. Yeah. And you know, it's, it's a success story in terms of company going from single product to multi-product to portfolio, to solution. And along that way, the data protection use case and workload has really come into the, you know, kind of center focus for us first with flash blade in the market, which is our unified fast file and object solution. And more recently with flash array C, which is our capacity optimized flash array for block storage, which is a great relationship with Veeam and an area where we've done some, um, significant, you know, joint engineering and the FAC, which is what we refer to it as. >>And the Veeam selling motion is extremely strong. Um, and you know, it's, it's solving a real problem and that is, you know, customers are increasingly being faced with these, um, tighter and tighter SLAs to ensure the availability of their data is there. And then there's also, you know, the, the security element. And I think a term that VES using here was cyber resiliency, which I like, right. I mean, you know, the, um, safe mode integration, which is our solution for IM mutability and, and for, um, anti ransomware or, or one step to take, to, to safeguard yourself in a, against a ransomware event. Um, you know, that those are great complimentary parts of us. And indeed, >>You know, Michael, I, I, I want to ask you about your shared vision with, with Veeam. I remember I was talking to cause on the cube. Um, it might have been a 2019. I can't remember. Might have been the year before that. No, I >>Think shorts or pants. >>Sorry. >>Was he in shorts or pants? He was in pants. >>Ah, okay. So I was pushing him on, well, why don't you do it this way? Why don't you do that? Why don't you do tiering all this stuff? And it just always came back to simplicity. He said, we optimize for simplicity over all this complexity and you know, we'll get the, the function through the ecosystem partnership. So is that the shared vision with Veeam? I mean, that, it's kind of, it's just work. It just works is their mantra. But, but talk about that shared vision as particularly as it relates to data protection and cyber resiliency. >>Yeah. Thanks so much for, for recalling that Dave, cuz it, it, we hear it constantly. It's it's in the, now that we're coming back to the office, it's in the hallways, it's out in front of conference rooms, you know, the, the elegance and, and the, and the, um, value and simplicity is everywhere inside of pure. Um, I would say it's part of our shared vision. I think it's customer centricity is at the core of what Veeam and pure, you know, has really fused us together. We're both global, you know, their history is European based and grew up out of there and then have succeeded in north America. Ours is absolutely north America based first on the west coast, then across the country. And then finally into Europe, more recently globally and, and, and a lot of growth internationally, including APJ. So it's customer centricity, it's global, it's the way we go to the customer. So a partner centric go to market motion is live and well in both organizations, uh, solution providers, MSPs, GSIs, you know, range of different ways to get to that customer. Um, and without a doubt, the customer experience is, is part of the, is part of the piece. And, and that's where our simplicity is, um, front and center. And, and I know vem is the same >>Dig, dig it into it. Go ahead. >>Yeah. Yeah, no. So, so out in the real world, are the conversations still about flash for backup and recovery convincing people that that makes sense or, or have we moved on to where now it's the pure flash value proposition BEC because people accept that flash makes sense. Where, where are we in the real world? >>Yeah, I, I think it's, it's different in different industries, different use cases, different workloads, different environments, and it's, um, part of a bigger story. But I think what, what is happening now is, um, we were before the inevitability of flash as the data center, primary storage solution, but now, like, I don't think anyone would debate that. Right. And, and I think now in data protection flash as a component to a robust, secure data protection, both as a target for backup and as a source of recovery is an inevitability to be part of that conversation. Flash is >>There, you guys like to be first at a lot of things, you know, gives you bragging rights, but it also gives you market momentum. Again, I'll pull it. My, my bag of pure memories. I remember meeting Scott Deason, who was the first CEO of, of pure, it's only been two to my knowledge, right. Scott was first Charlie G Carlo now, um, in, in, it was early days, it was 2009, like in an Oracle open world or something. And he was telling me about this startup that he's doing. And, and one of the things that struck me is the vision around the API economy, which was new at the time. It was like, well, okay, what's that? And that's really, what's, what's happened here. It's part of simplicity. It's part about ecosystem. We were talking about products versus platforms. You can't really have a platform unless you have an ecosystem. So where are we at today? How does that relate to your partnership with, with Veeam? Yeah. >>It, it's such a great recall on your part, you know, cause I think, um, we are a storage company. We do provide a raise in the wild, you know, over 10,000 customer, tens of thousands of arrays now. And you know, but at the core, it's the software that matters and, and that's really what drives the user experience. And we're proud to be, you know, the development partner on the universal storage API, the us API for Veeam, that is a essential ingredient to success for the joint pure and vem customer experience. It gives them that single pane of glass, that administrative view, where they're able to get the information they need on what's happening within their environment and be able to take corrective action. And, you know, we're very proud of all the tools that we provide our storage customers, but in a da in a data protection use case and workload, they want to put, you know, they want to go right to Veeam and, and have that be the source of truth. And that's where that API is so important. >>What, what's the story to customers, Michael, in terms of particularly cyber resilience, you've got obviously got a TCO play, simple equals lower cost. Um, you got really much tighter service level agreements and requirements now, um, the security, the storage and data protection and security space are kind of coming together. So what's the narrative for customers. Give me the pitch. Yeah, >>Look, I think, I think every customer today has an obligation to include security as a must have within their solution anywhere in the data center. And for us, it's, you know, simply put the combination of Veeam for data protection, with pure for FlashRay C or flash blade with safe mode, you know, which provides that imutability provides that customer with a safeguarded copy against bad actors externally to their organization, or was jointly developed with a customer to prevent the risk of bad actors inside of the organization. Um, city of new Orleans is one of the customer references that's up on, you know, the pure storage website, just a, a great, um, you know, story in terms of the city's ability to defend against ransomware attack, continue, you know, with continuity of essential services, police, ambulances, fire departments, um, all on the combination of pure and deem. And so, you know, a good, you know, example to pull that thread all the way through in terms of what the value proposition is. And then what's the experience for the customer when they are find themselves on the other side of that event. >>What's the nature of the partnership, um, with, with Veeam, obviously there's a go to market, um, are there, you know, solutions that you guys are doing together, engineering work that you're doing together? Can you explain that? Yeah, >>You bet. I mean, you know, these are two of, uh, I think high profile adjacencies in the data center, you've got your primary storage and then secondary tertiary, and you've got your data protection use case and workload. Um, with Veeam, we've got dedicated engineering to the Veeam partnership on the pure side, as a development partner for the us API, um, you know, is a, is a key piece we're integrated into what the support experience is like for the customer. And really trying, starting to challenge ourselves now with some of the leadership changes that beam's taken on and the opportunity to sit down and, and spend some time, you know, with the non and, and John, and really say, Hey, like we're at the core here, we've got an opportunity. Let's, let's open up some strategic doors and see what could be next. >>Well, Veeam Ising, there's no question it's kind of early Veeam was the wild west that's right. Course big parties are still, you know, the reputational, but, but as you think about these joint engineering and joint go to market and you talk to, to joint customers, where do you see sort of the future? I mean, I, I, you know, the ransomware stuff, obviously the pandemic was impossible to predict. I, I shouldn't say that a lot of people did predict it, but now that we see it, but now that you have some visibility on these permanent changes that are affecting CSO, buying strategies, data protection, storage, buying strategies, how do you see the future of this relationship? >>Yeah, look, I think, I think the, um, at the core we do what we do and we're focused on continuing to innovate and do it with excellence in everything that we do. Um, we measure ourselves rigorously against a net promoter score. It's a certified net promoter score. We're at 85.2 top 1% of all B2B. So >>Head of V even >>At the core, >>Barely at the >>Core it's, it's about that customer experience and customer satisfaction. Um, and, and so that's a, maybe a, a different way of saying we trust that our partners do what they do with excellence. And in the case of Veeam, you know, partnering around the data protection, use case and workload, looking at how that's evolving into holistic data management and hybrid cloud environment. Um, we see rich opportunity for us to continue the partnerships, strengthen it, learn and listen from our customers and our partners. And, uh, and maybe challenge ourselves to, to do some things a little differently uniquely along the way I talked >>To them. Oh, good. >>Yeah, no, yeah. You, you mentioned, uh, you mentioned something at the outset that lends a lot of cred, credibility to the pure story anywhere you seek to play. Um, you mentioned that he, uh, you know, founded in 2009 product shipping in 2012. Um, I remember that Dave's not old enough to remember that period of time <laugh>, but, uh, if you remember, um, violent memory was the king, they, they were, they were the ones to be. Yep. And you guys were quietly toiling in a bit of obscurity and people were asking, well, come on, come on, come on, come on, give us something, but you didn't until you were ready. So I've seen that methodical approach in every, in every step of the way as you've transformed from being a product into solution focus and partnership focus. Um, so what does that look like? Moving forward? You, you mentioned kind of getting ahead of the game in terms of all backups and recovery, uh, volumes being on flash. What does that addressable market look like to you guys in the future? How, how are you looking at that? Yeah. Is this just the beginning of a new thing that's gonna develop over time? >>Yeah, I think, I think it's a, is a great question. It's an insightful question. It's also a great way for me to plug accelerate in, in 20, you know, 20 days or so. Um, it's a great backdrop for pure to make some announcements in terms of what's next and, and, you know, and when we're ready to make them, you know, it's a good example. Um, but, but in direct answer to your question, you know, without a doubt, you know, the adjacencies between data protection, primary storage, secondary storage, the blurring that's happening within that, you know, based on the ransomware threats, based on, you know, other environments around cloud and, and how customers have learned from cloud experiences early on and applying those learnings, not just to demanding simplicity in their solutions, but demanding the ability for, you know, kind of the storage is code and, and to have that cloud operating model across everything that they do. And so, you know, I think those are at the core, some of the things that we think about in terms of what's next and, and, uh, and to do it with partners like beam at the forefront, as well as the voice of our customer at >>The forefront. And that's why I wanted to ask you that's great setup. Thank you, David. Um, so the port works acquisition was really interesting. We're at, um, in Valencia Spain, the cube is, uh, our, our colleagues are over there. Unfortunately, John furier couldn't make the trip, the vid hit him. Uh, but one of the conversations, the topic of conversations over there is, you know, shift left with the solar winds, hack the sensitivity around the software supply chain. We certainly talked about it last week at red hat summit. I haven't heard a lot about DevOps here, but it's sort of intrinsic that, that whole shift left component, that idea of not bolting on data protection at the tail end, actually shifting left means doing it in the development cycle, not throwing it over the fence, you know, to, to the operations people. What's that conversation like subsequent to the port works acquisition, which was very interesting. A small lever can go a long way. Can you give us the update there? >>Yeah. And first and foremost, I hope John's okay. Right? >>He is. He's doing well. Good, Mr. John, >>We do. And so, you know, I think the, um, the, the future of applications is really on center stage when you put port works into the conversation mm-hmm <affirmative>. And so as companies move like no, one's gonna develop applications today without a container strategy right related to that. And that's gonna allow for the applications to move and data gravity to really play a bigger role and pure feels confident in our ability to play a big role in that. And as those applications mature up the containerized curve, they're definitely gonna have data protection, data management, other fundamental things built into it in that shift left context that we're gonna be prepared to take advantage of based on the assets. We have >>The two hardcore engineering cultures, uh, that, that have momentum, uh, pure and, and Veeam. Michael. It's great to see you again. Thanks so much for coming to the cube. >>Uh, it's always a pleasure to be with you gentlemen, and, uh, great to meet you for the first time. Good to meet you, Michael. Look forward to seeing you the next time and, and thanks again. >>All right. You bet. All right, keep it right there, everybody. Thanks for watching. This is the Cube's coverage of vem on 2022. We're at the area in Las Vegas, and we'll be right back right after the short break.

Published Date : May 17 2022

SUMMARY :

Good to see you again. uh, you know, just feeling the energy from the room and, uh, just great to come in here and see the cube all lit And, and one of the last ones that we did before, you know, the pandemic for all of us, We'll see you there. And you know, it's, it's a success story in terms of And then there's also, you know, You know, Michael, I, I, I want to ask you about your shared vision with, with Veeam. He was in pants. this complexity and you know, we'll get the, the function through the ecosystem partnership. is at the core of what Veeam and pure, you know, has really fused us together. Dig, dig it into it. for backup and recovery convincing people that that makes sense or, of flash as the data center, primary storage solution, but now, There, you guys like to be first at a lot of things, you know, gives you bragging rights, but it also gives you market momentum. We do provide a raise in the wild, you know, over 10,000 customer, you got really much tighter service level agreements and requirements now, And for us, it's, you know, simply put the combination of Veeam for data protection, taken on and the opportunity to sit down and, and spend some time, you know, with the non and, I mean, I, I, you know, the ransomware stuff, Yeah, look, I think, I think the, um, at the core we do what we do And in the case of Veeam, you know, partnering around the data protection, market look like to you guys in the future? the blurring that's happening within that, you know, based on the ransomware threats, the topic of conversations over there is, you know, shift left with the solar winds, hack the sensitivity around He is. And so, you know, I think the, um, the, It's great to see you again. Uh, it's always a pleasure to be with you gentlemen, and, uh, great to meet you for the first time. This is the Cube's coverage of vem

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
DavidPERSON

0.99+

MichaelPERSON

0.99+

2009DATE

0.99+

2012DATE

0.99+

Michael SotnickPERSON

0.99+

JohnPERSON

0.99+

Dave NicholsonPERSON

0.99+

DavePERSON

0.99+

VeeamORGANIZATION

0.99+

Las VegasLOCATION

0.99+

20 daysQUANTITY

0.99+

LeePERSON

0.99+

twoQUANTITY

0.99+

AustinLOCATION

0.99+

ScottPERSON

0.99+

Michael SonicPERSON

0.99+

John furierPERSON

0.99+

JuneDATE

0.99+

21 daysQUANTITY

0.99+

north AmericaLOCATION

0.99+

EuropeLOCATION

0.99+

20QUANTITY

0.99+

Valencia SpainLOCATION

0.99+

2019DATE

0.99+

firstQUANTITY

0.99+

Scott DeasonPERSON

0.99+

last weekDATE

0.99+

new OrleansLOCATION

0.99+

bothQUANTITY

0.99+

1%QUANTITY

0.98+

todayDATE

0.98+

oneQUANTITY

0.98+

over 10,000 customerQUANTITY

0.98+

85.2QUANTITY

0.97+

first timeQUANTITY

0.97+

2022DATE

0.97+

both organizationsQUANTITY

0.97+

OracleORGANIZATION

0.96+

single productQUANTITY

0.95+

tens of thousandsQUANTITY

0.94+

single paneQUANTITY

0.94+

VeeamPERSON

0.94+

first oneQUANTITY

0.93+

Charlie G CarloPERSON

0.89+

ariaORGANIZATION

0.89+

APJORGANIZATION

0.89+

red hatEVENT

0.82+

BECORGANIZATION

0.82+

pandemicEVENT

0.8+

one stepQUANTITY

0.74+

DevOpsTITLE

0.73+

CubeORGANIZATION

0.72+

EuropeanLOCATION

0.67+

VESORGANIZATION

0.63+

Veeam IsingPERSON

0.61+

Breaking Analysis: Technology & Architectural Considerations for Data Mesh


 

>> From theCUBE Studios in Palo Alto and Boston, bringing you data driven insights from theCUBE in ETR, this is Breaking Analysis with Dave Vellante. >> The introduction in socialization of data mesh has caused practitioners, business technology executives, and technologists to pause, and ask some probing questions about the organization of their data teams, their data strategies, future investments, and their current architectural approaches. Some in the technology community have embraced the concept, others have twisted the definition, while still others remain oblivious to the momentum building around data mesh. Here we are in the early days of data mesh adoption. Organizations that have taken the plunge will tell you that aligning stakeholders is a non-trivial effort, but necessary to break through the limitations that monolithic data architectures and highly specialized teams have imposed over frustrated business and domain leaders. However, practical data mesh examples often lie in the eyes of the implementer, and may not strictly adhere to the principles of data mesh. Now, part of the problem is lack of open technologies and standards that can accelerate adoption and reduce friction, and that's what we're going to talk about today. Some of the key technology and architecture questions around data mesh. Hello, and welcome to this week's Wikibon CUBE Insights powered by ETR, and in this Breaking Analysis, we welcome back the founder of data mesh and director of Emerging Technologies at Thoughtworks, Zhamak Dehghani. Hello, Zhamak. Thanks for being here today. >> Hi Dave, thank you for having me back. It's always a delight to connect and have a conversation. Thank you. >> Great, looking forward to it. Okay, so before we get into it in the technology details, I just want to quickly share some data from our friends at ETR. You know, despite the importance of data initiative since the pandemic, CIOs and IT organizations have had to juggle of course, a few other priorities, this is why in the survey data, cyber and cloud computing are rated as two most important priorities. Analytics and machine learning, and AI, which are kind of data topics, still make the top of the list, well ahead of many other categories. And look, a sound data architecture and strategy is fundamental to digital transformations, and much of the past two years, as we've often said, has been like a forced march into digital. So while organizations are moving forward, they really have to think hard about the data architecture decisions that they make, because it's going to impact them, Zhamak, for years to come, isn't it? >> Yes, absolutely. I mean, we are moving really from, slowly moving from reason based logical algorithmic to model based computation and decision making, where we exploit the patterns and signals within the data. So data becomes a very important ingredient, of not only decision making, and analytics and discovering trends, but also the features and applications that we build for the future. So we can't really ignore it, and as we see, some of the existing challenges around getting value from data is not necessarily that no longer is access to computation, is actually access to trustworthy, reliable data at scale. >> Yeah, and you see these domains coming together with the cloud and obviously it has to be secure and trusted, and that's why we're here today talking about data mesh. So let's get into it. Zhamak, first, your new book is out, 'Data Mesh: Delivering Data-Driven Value at Scale' just recently published, so congratulations on getting that done, awesome. Now in a recent presentation, you pulled excerpts from the book and we're going to talk through some of the technology and architectural considerations. Just quickly for the audience, four principles of data mesh. Domain driven ownership, data as product, self-served data platform and federated computational governance. So I want to start with self-serve platform and some of the data that you shared recently. You say that, "Data mesh serves autonomous domain oriented teams versus existing platforms, which serve a centralized team." Can you elaborate? >> Sure. I mean the role of the platform is to lower the cognitive load for domain teams, for people who are focusing on the business outcomes, the technologies that are building the applications, to really lower the cognitive load for them, to be able to work with data. Whether they are building analytics, automated decision making, intelligent modeling. They need to be able to get access to data and use it. So the role of the platform, I guess, just stepping back for a moment is to empower and enable these teams. Data mesh by definition is a scale out model. It's a decentralized model that wants to give autonomy to cross-functional teams. So it is core requires a set of tools that work really well in that decentralized model. When we look at the existing platforms, they try to achieve this similar outcome, right? Lower the cognitive load, give the tools to data practitioners, to manage data at scale because today centralized teams, really their job, the centralized data teams, their job isn't really directly aligned with a one or two or different, you know, business units and business outcomes in terms of getting value from data. Their job is manage the data and make the data available for then those cross-functional teams or business units to use the data. So the platforms they've been given are really centralized around or tuned to work with this structure as a team, structure of centralized team. Although on the surface, it seems that why not? Why can't I use my, you know, cloud storage or computation or data warehouse in a decentralized way? You should be able to, but some changes need to happen to those online platforms. As an example, some cloud providers simply have hard limits on the number of like account storage, storage accounts that you can have. Because they never envisaged you have hundreds of lakes. They envisage one or two, maybe 10 lakes, right. They envisage really centralizing data, not decentralizing data. So I think we see a shift in thinking about enabling autonomous independent teams versus a centralized team. >> So just a follow up if I may, we could be here for a while. But so this assumes that you've sorted out the organizational considerations? That you've defined all the, what a data product is and a sub product. And people will say, of course we use the term monolithic as a pejorative, let's face it. But the data warehouse crowd will say, "Well, that's what data march did. So we got that covered." But Europe... The primest of data mesh, if I understand it is whether it's a data march or a data mart or a data warehouse, or a data lake or whatever, a snowflake warehouse, it's a node on the mesh. Okay. So don't build your organization around the technology, let the technology serve the organization is that-- >> That's a perfect way of putting it, exactly. I mean, for a very long time, when we look at decomposition of complexity, we've looked at decomposition of complexity around technology, right? So we have technology and that's maybe a good segue to actually the next item on that list that we looked at. Oh, I need to decompose based on whether I want to have access to raw data and put it on the lake. Whether I want to have access to model data and put it on the warehouse. You know I need to have a team in the middle to move the data around. And then try to figure organization into that model. So data mesh really inverses that, and as you said, is look at the organizational structure first. Then scale boundaries around which your organization and operation can scale. And then the second layer look at the technology and how you decompose it. >> Okay. So let's go to that next point and talk about how you serve and manage autonomous interoperable data products. Where code, data policy you say is treated as one unit. Whereas your contention is existing platforms of course have independent management and dashboards for catalogs or storage, et cetera. Maybe we double click on that a bit. >> Yeah. So if you think about that functional, or technical decomposition, right? Of concerns, that's one way, that's a very valid way of decomposing, complexity and concerns. And then build solutions, independent solutions to address them. That's what we see in the technology landscape today. We will see technologies that are taking care of your management of data, bring your data under some sort of a control and modeling. You'll see technology that moves that data around, will perform various transformations and computations on it. And then you see technology that tries to overlay some level of meaning. Metadata, understandability, discovery was the end policy, right? So that's where your data processing kind of pipeline technologies versus data warehouse, storage, lake technologies, and then the governance come to play. And over time, we decomposed and we compose, right? Deconstruct and reconstruct back this together. But, right now that's where we stand. I think for data mesh really to become a reality, as in independent sources of data and teams can responsibly share data in a way that can be understood right then and there can impose policies, right then when the data gets accessed in that source and in a resilient manner, like in a way that data changes structure of the data or changes to the scheme of the data, doesn't have those downstream down times. We've got to think about this new nucleus or new units of data sharing. And we need to really bring back transformation and governing data and the data itself together around these decentralized nodes on the mesh. So that's another, I guess, deconstruction and reconstruction that needs to happen around the technology to formulate ourselves around the domains. And again the data and the logic of the data itself, the meaning of the data itself. >> Great. Got it. And we're going to talk more about the importance of data sharing and the implications. But the third point deals with how operational, analytical technologies are constructed. You've got an app DevStack, you've got a data stack. You've made the point many times actually that we've contextualized our operational systems, but not our data systems, they remain separate. Maybe you could elaborate on this point. >> Yes. I think this is, again, has a historical background and beginning. For a really long time, applications have dealt with features and the logic of running the business and encapsulating the data and the state that they need to run that feature or run that business function. And then we had for anything analytical driven, which required access data across these applications and across the longer dimension of time around different subjects within the organization. This analytical data, we had made a decision that, "Okay, let's leave those applications aside. Let's leave those databases aside. We'll extract the data out and we'll load it, or we'll transform it and put it under the analytical kind of a data stack and then downstream from it, we will have analytical data users, the data analysts, the data sciences and the, you know, the portfolio of users that are growing use that data stack. And that led to this really separation of dual stack with point to point integration. So applications went down the path of transactional databases or urban document store, but using APIs for communicating and then we've gone to, you know, lake storage or data warehouse on the other side. If we are moving and that again, enforces the silo of data versus app, right? So if we are moving to the world that our missions that are ambitions around making applications, more intelligent. Making them data driven. These two worlds need to come closer. As in ML Analytics gets embedded into those app applications themselves. And the data sharing, as a very essential ingredient of that, gets embedded and gets closer, becomes closer to those applications. So, if you are looking at this now cross-functional, app data, based team, right? Business team, then the technology stacks can't be so segregated, right? There has to be a continuum of experience from app delivery, to sharing of the data, to using that data, to embed models back into those applications. And that continuum of experience requires well integrated technologies. I'll give you an example, which actually in some sense, we are somewhat moving to that direction. But if we are talking about data sharing or data modeling and applications use one set of APIs, you know, HTTP compliant, GraQL or RAC APIs. And on the other hand, you have proprietary SQL, like connect to my database and run SQL. Like those are very two different models of representing and accessing data. So we kind of have to harmonize or integrate those two worlds a bit more closely to achieve that domain oriented cross-functional teams. >> Yeah. We are going to talk about some of the gaps later and actually you look at them as opportunities, more than barriers. But they are barriers, but they're opportunities for more innovation. Let's go on to the fourth one. The next point, it deals with the roles that the platform serves. Data mesh proposes that domain experts own the data and take responsibility for it end to end and are served by the technology. Kind of, we referenced that before. Whereas your contention is that today, data systems are really designed for specialists. I think you use the term hyper specialists a lot. I love that term. And the generalist are kind of passive bystanders waiting in line for the technical teams to serve them. >> Yes. I mean, if you think about the, again, the intention behind data mesh was creating a responsible data sharing model that scales out. And I challenge any organization that has a scaled ambitions around data or usage of data that relies on small pockets of very expensive specialists resources, right? So we have no choice, but upscaling cross-scaling. The majority population of our technologists, we often call them generalists, right? That's a short hand for people that can really move from one technology to another technology. Sometimes we call them pandric people sometimes we call them T-shaped people. But regardless, like we need to have ability to really mobilize our generalists. And we had to do that at Thoughtworks. We serve a lot of our clients and like many other organizations, we are also challenged with hiring specialists. So we have tested the model of having a few specialists, really conveying and translating the knowledge to generalists and bring them forward. And of course, platform is a big enabler of that. Like what is the language of using the technology? What are the APIs that delight that generalist experience? This doesn't mean no code, low code. We have to throw away in to good engineering practices. And I think good software engineering practices remain to exist. Of course, they get adopted to the world of data to build resilient you know, sustainable solutions, but specialty, especially around kind of proprietary technology is going to be a hard one to scale. >> Okay. I'm definitely going to come back and pick your brain on that one. And, you know, your point about scale out in the examples, the practical examples of companies that have implemented data mesh that I've talked to. I think in all cases, you know, there's only a handful that I've really gone deep with, but it was their hadoop instances, their clusters wouldn't scale, they couldn't scale the business and around it. So that's really a key point of a common pattern that we've seen now. I think in all cases, they went to like the data lake model and AWS. And so that maybe has some violation of the principles, but we'll come back to that. But so let me go on to the next one. Of course, data mesh leans heavily, toward this concept of decentralization, to support domain ownership over the centralized approaches. And we certainly see this, the public cloud players, database companies as key actors here with very large install bases, pushing a centralized approach. So I guess my question is, how realistic is this next point where you have decentralized technologies ruling the roost? >> I think if you look at the history of places, in our industry where decentralization has succeeded, they heavily relied on standardization of connectivity with, you know, across different components of technology. And I think right now you are right. The way we get value from data relies on collection. At the end of the day, collection of data. Whether you have a deep learning machinery model that you're training, or you have, you know, reports to generate. Regardless, the model is bring your data to a place that you can collect it, so that we can use it. And that leads to a naturally set of technologies that try to operate as a full stack integrated proprietary with no intention of, you know, opening, data for sharing. Now, conversely, if you think about internet itself, web itself, microservices, even at the enterprise level, not at the planetary level, they succeeded as decentralized technologies to a large degree because of their emphasis on open net and openness and sharing, right. API sharing. We don't talk about, in the API worlds, like we don't say, you know, "I will build a platform to manage your logical applications." Maybe to a degree but we actually moved away from that. We say, "I'll build a platform that opens around applications to manage your APIs, manage your interfaces." Right? Give you access to API. So I think the shift needs to... That definition of decentralized there means really composable, open pieces of the technology that can play nicely with each other, rather than a full stack, all have control of your data yet being somewhat decentralized within the boundary of my platform. That's just simply not going to scale if data needs to come from different platforms, different locations, different geographical locations, it needs to rethink. >> Okay, thank you. And then the final point is, is data mesh favors technologies that are domain agnostic versus those that are domain aware. And I wonder if you could help me square the circle cause it's nuanced and I'm kind of a 100 level student of your work. But you have said for example, that the data teams lack context of the domain and so help us understand what you mean here in this case. >> Sure. Absolutely. So as you said, we want to take... Data mesh tries to give autonomy and decision making power and responsibility to people that have the context of those domains, right? The people that are really familiar with different business domains and naturally the data that that domain needs, or that naturally the data that domains shares. So if the intention of the platform is really to give the power to people with most relevant and timely context, the platform itself naturally becomes as a shared component, becomes domain agnostic to a large degree. Of course those domains can still... The platform is a (chuckles) fairly overloaded world. As in, if you think about it as a set of technology that abstracts complexity and allows building the next level solutions on top, those domains may have their own set of platforms that are very much doing agnostic. But as a generalized shareable set of technologies or tools that allows us share data. So that piece of technology needs to relinquish the knowledge of the context to the domain teams and actually becomes domain agnostic. >> Got it. Okay. Makes sense. All right. Let's shift gears here. Talk about some of the gaps and some of the standards that are needed. You and I have talked about this a little bit before, but this digs deeper. What types of standards are needed? Maybe you could walk us through this graphic, please. >> Sure. So what I'm trying to depict here is that if we imagine a world that data can be shared from many different locations, for a variety of analytical use cases, naturally the boundary of what we call a node on the mesh will encapsulates internally a fair few pieces. It's not just the boundary of that, not on the mesh, is the data itself that it's controlling and updating and maintaining. It's of course a computation and the code that's responsible for that data. And then the policies that continue to govern that data as long as that data exists. So if that's the boundary, then if we shift that focus from implementation details, that we can leave that for later, what becomes really important is the scene or the APIs and interfaces that this node exposes. And I think that's where the work that needs to be done and the standards that are missing. And we want the scene and those interfaces be open because that allows, you know, different organizations with different boundaries of trust to share data. Not only to share data to kind of move that data to yes, another location, to share the data in a way that distributed workloads, distributed analytics, distributed machine learning model can happen on the data where it is. So if you follow that line of thinking around the centralization and connection of data versus collection of data, I think the very, very important piece of it that needs really deep thinking, and I don't claim that I have done that, is how do we share data responsibly and sustainably, right? That is not brittle. If you think about it today, the ways we share data, one of the very common ways is around, I'll give you a JDC endpoint, or I give you an endpoint to your, you know, database of choice. And now as technology, whereas a user actually, you can now have access to the schema of the underlying data and then run various queries or SQL queries on it. That's very simple and easy to get started with. That's why SQL is an evergreen, you know, standard or semi standard, pseudo standard that we all use. But it's also very brittle, because we are dependent on a underlying schema and formatting of the data that's been designed to tell the computer how to store and manage the data. So I think that the data sharing APIs of the future really need to think about removing this brittle dependencies, think about sharing, not only the data, but what we call metadata, I suppose. Additional set of characteristics that is always shared along with data to make the data usage, I suppose ethical and also friendly for the users and also, I think we have to... That data sharing API, the other element of it, is to allow kind of computation to run where the data exists. So if you think about SQL again, as a simple primitive example of computation, when we select and when we filter and when we join, the computation is happening on that data. So maybe there is a next level of articulating, distributed computational data that simply trains models, right? Your language primitives change in a way to allow sophisticated analytical workloads run on the data more responsibly with policies and access control and force. So I think that output port that I mentioned simply is about next generation data sharing, responsible data sharing APIs. Suitable for decentralized analytical workloads. >> So I'm not trying to bait you here, but I have a follow up as well. So you schema, for all its good creates constraints. No schema on right, that didn't work, cause it was just a free for all and it created the data swamps. But now you have technology companies trying to solve that problem. Take Snowflake for example, you know, enabling, data sharing. But it is within its proprietary environment. Certainly Databricks doing something, you know, trying to come at it from its angle, bringing some of the best to data warehouse, with the data science. Is your contention that those remain sort of proprietary and defacto standards? And then what we need is more open standards? Maybe you could comment. >> Sure. I think the two points one is, as you mentioned. Open standards that allow... Actually make the underlying platform invisible. I mean my litmus test for a technology provider to say, "I'm a data mesh," (laughs) kind of compliant is, "Is your platform invisible?" As in, can I replace it with another and yet get the similar data sharing experience that I need? So part of it is that. Part of it is open standards, they're not really proprietary. The other angle for kind of sharing data across different platforms so that you know, we don't get stuck with one technology or another is around APIs. It is around code that is protecting that internal schema. So where we are on the curve of evolution of technology, right now we are exposing the internal structure of the data. That is designed to optimize certain modes of access. We're exposing that to the end client and application APIs, right? So the APIs that use the data today are very much aware that this database was optimized for machine learning workloads. Hence you will deal with a columnar storage of the file versus this other API is optimized for a very different, report type access, relational access and is optimized around roles. I think that should become irrelevant in the API sharing of the future. Because as a user, I shouldn't care how this data is internally optimized, right? The language primitive that I'm using should be really agnostic to the machine optimization underneath that. And if we did that, perhaps this war between warehouse or lake or the other will become actually irrelevant. So we're optimizing for that human best human experience, as opposed to the best machine experience. We still have to do that but we have to make that invisible. Make that an implementation concern. So that's another angle of what should... If we daydream together, the best experience and resilient experience in terms of data usage than these APIs with diagnostics to the internal storage structure. >> Great, thank you for that. We've wrapped our ankles now on the controversy, so we might as well wade all the way in, I can't let you go without addressing some of this. Which you've catalyzed, which I, by the way, I see as a sign of progress. So this gentleman, Paul Andrew is an architect and he gave a presentation I think last night. And he teased it as quote, "The theory from Zhamak Dehghani versus the practical experience of a technical architect, AKA me," meaning him. And Zhamak, you were quick to shoot back that data mesh is not theory, it's based on practice. And some practices are experimental. Some are more baked and data mesh really avoids by design, the specificity of vendor or technology. Perhaps you intend to frame your post as a technology or vendor specific, specific implementation. So touche, that was excellent. (Zhamak laughs) Now you don't need me to defend you, but I will anyway. You spent 14 plus years as a software engineer and the better part of a decade consulting with some of the most technically advanced companies in the world. But I'm going to push you a little bit here and say, some of this tension is of your own making because you purposefully don't talk about technologies and vendors. Sometimes doing so it's instructive for us neophytes. So, why don't you ever like use specific examples of technology for frames of reference? >> Yes. My role is pushes to the next level. So, you know everybody picks their fights, pick their battles. My role in this battle is to push us to think beyond what's available today. Of course, that's my public persona. On a day to day basis, actually I work with clients and existing technology and I think at Thoughtworks we have given the talk we gave a case study talk with a colleague of mine and I intentionally got him to talk about (indistinct) I want to talk about the technology that we use to implement data mesh. And the reason I haven't really embraced, in my conversations, the specific technology. One is, I feel the technology solutions we're using today are still not ready for the vision. I mean, we have to be in this transitional step, no matter what we have to be pragmatic, of course, and practical, I suppose. And use the existing vendors that exist and I wholeheartedly embrace that, but that's just not my role, to show that. I've gone through this transformation once before in my life. When microservices happened, we were building microservices like architectures with technology that wasn't ready for it. Big application, web application servers that were designed to run these giant monolithic applications. And now we're trying to run little microservices onto them. And the tail was riding the dock, the environmental complexity of running these services was consuming so much of our effort that we couldn't really pay attention to that business logic, the business value. And that's where we are today. The complexity of integrating existing technologies is really overwhelmingly, capturing a lot of our attention and cost and effort, money and effort as opposed to really focusing on the data product themselves. So it's just that's the role I have, but it doesn't mean that, you know, we have to rebuild the world. We've got to do with what we have in this transitional phase until the new generation, I guess, technologies come around and reshape our landscape of tools. >> Well, impressive public discipline. Your point about microservice is interesting because a lot of those early microservices, weren't so micro and for the naysayers look past this, not prologue, but Thoughtworks was really early on in the whole concept of microservices. So be very excited to see how this plays out. But now there was some other good comments. There was one from a gentleman who said the most interesting aspects of data mesh are organizational. And that's how my colleague Sanji Mohan frames data mesh versus data fabric. You know, I'm not sure, I think we've sort of scratched the surface today that data today, data mesh is more. And I still think data fabric is what NetApp defined as software defined storage infrastructure that can serve on-prem and public cloud workloads back whatever, 2016. But the point you make in the thread that we're showing you here is that you're warning, and you referenced this earlier, that the segregating different modes of access will lead to fragmentation. And we don't want to repeat the mistakes of the past. >> Yes, there are comments around. Again going back to that original conversation that we have got this at a macro level. We've got this tendency to decompose complexity based on technical solutions. And, you know, the conversation could be, "Oh, I do batch or you do a stream and we are different."' They create these bifurcations in our decisions based on the technology where I do events and you do tables, right? So that sort of segregation of modes of access causes accidental complexity that we keep dealing with. Because every time in this tree, you create a new branch, you create new kind of new set of tools and then somehow need to be point to point integrated. You create new specialization around that. So the least number of branches that we have, and think about really about the continuum of experiences that we need to create and technologies that simplify, that continuum experience. So one of the things, for example, give you a past experience. I was really excited around the papers and the work that came around on Apache Beam, and generally flow based programming and stream processing. Because basically they were saying whether you are doing batch or whether you're doing streaming, it's all one stream. And sometimes the window of time, narrows and sometimes the window of time over which you're computing, widens and at the end of today, is you are just getting... Doing the stream processing. So it is those sort of notions that simplify and create continuum of experience. I think resonate with me personally, more than creating these tribal fights of this type versus that mode of access. So that's why data mesh naturally selects kind of this multimodal access to support end users, right? The persona of end users. >> Okay. So the last topic I want to hit, this whole discussion, the topic of data mesh it's highly nuanced, it's new, and people are going to shoehorn data mesh into their respective views of the world. And we talked about lake houses and there's three buckets. And of course, the gentleman from LinkedIn with Azure, Microsoft has a data mesh community. See you're going to have to enlist some serious army of enforcers to adjudicate. And I wrote some of the stuff down. I mean, it's interesting. Monte Carlo has a data mesh calculator. Starburst is leaning in, chaos. Search sees themselves as an enabler. Oracle and Snowflake both use the term data mesh. And then of course you've got big practitioners J-P-M-C, we've talked to Intuit, Orlando, HelloFresh has been on, Netflix has this event based sort of streaming implementation. So my question is, how realistic is it that the clarity of your vision can be implemented and not polluted by really rich technology companies and others? (Zhamak laughs) >> Is it even possible, right? Is it even possible? That's a yes. That's why I practice then. This is why I should practice things. Cause I think, it's going to be hard. What I'm hopeful, is that the socio-technical, Leveling Data mentioned that this is a socio-technical concern or solution, not just a technology solution. Hopefully always brings us back to, you know, the reality that vendors try to sell you safe oil that solves all of your problems. (chuckles) All of your data mesh problems. It's just going to cause more problem down the track. So we'll see, time will tell Dave and I count on you as one of those members of, (laughs) you know, folks that will continue to share their platform. To go back to the roots, as why in the first place? I mean, I dedicated a whole part of the book to 'Why?' Because we get, as you said, we get carried away with vendors and technology solution try to ride a wave. And in that story, we forget the reason for which we even making this change and we are going to spend all of this resources. So hopefully we can always come back to that. >> Yeah. And I think we can. I think you have really given this some deep thought and as we pointed out, this was based on practical knowledge and experience. And look, we've been trying to solve this data problem for a long, long time. You've not only articulated it well, but you've come up with solutions. So Zhamak, thank you so much. We're going to leave it there and I'd love to have you back. >> Thank you for the conversation. I really enjoyed it. And thank you for sharing your platform to talk about data mesh. >> Yeah, you bet. All right. And I want to thank my colleague, Stephanie Chan, who helps research topics for us. Alex Myerson is on production and Kristen Martin, Cheryl Knight and Rob Hoff on editorial. Remember all these episodes are available as podcasts, wherever you listen. And all you got to do is search Breaking Analysis Podcast. Check out ETR's website at etr.ai for all the data. And we publish a full report every week on wikibon.com, siliconangle.com. You can reach me by email david.vellante@siliconangle.com or DM me @dvellante. Hit us up on our LinkedIn post. This is Dave Vellante for theCUBE Insights powered by ETR. Have a great week, stay safe, be well. And we'll see you next time. (bright music)

Published Date : Apr 20 2022

SUMMARY :

bringing you data driven insights Organizations that have taken the plunge and have a conversation. and much of the past two years, and as we see, and some of the data and make the data available But the data warehouse crowd will say, in the middle to move the data around. and talk about how you serve and the data itself together and the implications. and the logic of running the business and are served by the technology. to build resilient you I think in all cases, you know, And that leads to a that the data teams lack and naturally the data and some of the standards that are needed. and formatting of the data and it created the data swamps. We're exposing that to the end client and the better part of a decade So it's just that's the role I have, and for the naysayers look and at the end of today, And of course, the gentleman part of the book to 'Why?' and I'd love to have you back. And thank you for sharing your platform etr.ai for all the data.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Kristen MartinPERSON

0.99+

Rob HoffPERSON

0.99+

Cheryl KnightPERSON

0.99+

Stephanie ChanPERSON

0.99+

Alex MyersonPERSON

0.99+

DavePERSON

0.99+

ZhamakPERSON

0.99+

oneQUANTITY

0.99+

Dave VellantePERSON

0.99+

AWSORGANIZATION

0.99+

10 lakesQUANTITY

0.99+

Sanji MohanPERSON

0.99+

MicrosoftORGANIZATION

0.99+

Paul AndrewPERSON

0.99+

twoQUANTITY

0.99+

NetflixORGANIZATION

0.99+

Zhamak DehghaniPERSON

0.99+

Data Mesh: Delivering Data-Driven Value at ScaleTITLE

0.99+

BostonLOCATION

0.99+

OracleORGANIZATION

0.99+

14 plus yearsQUANTITY

0.99+

Palo AltoLOCATION

0.99+

two pointsQUANTITY

0.99+

siliconangle.comOTHER

0.99+

second layerQUANTITY

0.99+

2016DATE

0.99+

LinkedInORGANIZATION

0.99+

todayDATE

0.99+

SnowflakeORGANIZATION

0.99+

hundreds of lakesQUANTITY

0.99+

theCUBEORGANIZATION

0.99+

david.vellante@siliconangle.comOTHER

0.99+

theCUBE StudiosORGANIZATION

0.98+

SQLTITLE

0.98+

one unitQUANTITY

0.98+

firstQUANTITY

0.98+

100 levelQUANTITY

0.98+

third pointQUANTITY

0.98+

DatabricksORGANIZATION

0.98+

EuropeLOCATION

0.98+

three bucketsQUANTITY

0.98+

ETRORGANIZATION

0.98+

DevStackTITLE

0.97+

OneQUANTITY

0.97+

wikibon.comOTHER

0.97+

bothQUANTITY

0.97+

ThoughtworksORGANIZATION

0.96+

one setQUANTITY

0.96+

one streamQUANTITY

0.96+

IntuitORGANIZATION

0.95+

one wayQUANTITY

0.93+

two worldsQUANTITY

0.93+

HelloFreshORGANIZATION

0.93+

this weekDATE

0.93+

last nightDATE

0.91+

fourth oneQUANTITY

0.91+

SnowflakeTITLE

0.91+

two different modelsQUANTITY

0.91+

ML AnalyticsTITLE

0.91+

Breaking AnalysisTITLE

0.87+

two worldsQUANTITY

0.84+

PUBLIC SECTOR Speed to Insight


 

>>Hi, this is Cindy Mikey, vice president of industry solutions at caldera. Joining me today is chef is Molly, our solution engineer for the public sector. Today. We're going to talk about speed to insight. Why using machine learning in the public sector, specifically around fraud, waste and abuse. So topic for today, we'll discuss machine learning, why the public sector uses it to target fraud, waste, and abuse, the challenges. How do we enhance your data and analytical approaches the data landscape analytical methods and shad we'll go over reference architecture and a case study. So by definition at fraud waste and abuse per the government accountability office is broad as an attempt to obtain something about a value through unwelcomed misrepresentation waste is about squandering money or resources and abuse is about behaving improperly or unreasonably to actually obtain something of value for your personal, uh, benefit. So as we look at fraud, um, and across all industries, it's a top of mind, um, area within the public sector. >>Um, the types of fraud that we see is specifically around cyber crime, uh, looking at accounting fraud, whether it be from an individual perspective to also, uh, within organizations, looking at financial statement fraud, to also looking at bribery and corruption, as we look at fraud, it really hits us from all angles, whether it be from external perpetrators or internal perpetrators, and specifically for the research by PWC, the key focus area is we also see over half of fraud is actually through some form of internal or external perpetrators, again, key topics. So as we also look at a report recently by the association of certified fraud examiners, um, within the public sector, the us government, um, in 2017, it was identified roughly $148 billion was attributable to fraud, waste and abuse. Specifically of that 57 billion was focused on reported monetary losses and another 91 billion on areas where that opportunity or the monetary basis had not yet been measured. >>As we look at breaking those areas down again, we look at several different topics from an out payment perspective. So breaking it down within the health system, over $65 billion within social services, over $51 billion to procurement fraud to also, uh, uh, fraud, waste and abuse that's happening in the grants and the loan process to payroll fraud, and then other aspects, again, quite a few different topical areas. So as we look at those areas, what are the areas that we see additional type of focus, those are broad stroke areas. What are the actual use cases that, um, agencies are using the data landscape? What data, what analytical methods can we use to actually help curtail and prevent some of the, uh, the fraud waste and abuse. So, as we look at some of the analytical processes and analytical use great, uh, use cases in the public sector, whether it's from, uh, you know, the taxation areas to looking at, you know, social services, uh, to public safety, to also the, um, our, um, additional agency methods, we're going to focus specifically on some of the use cases around, um, you know, fraud within the tax area. >>Uh, we'll briefly look at some of the aspects of unemployment insurance fraud, uh, benefit fraud, as well as payment integrity. So fraud has its, um, uh, underpinnings in quite a few different government agencies and difficult, different analytical methods and I usage of different data. So I think one of the key elements is, you know, you can look at your, your data landscape on specific data sources that you need, but it's really about bringing together different data sources across a different variety, a different velocity. So, uh, data has different dimensions. So we'll look at on structured types of data of semi-structured data, behavioral data, as well as when we look at, um, you know, predictive models, we're typically looking at historical type information, but if we're actually trying to look at preventing fraud before it actually happens, or when a case may be in flight, which is specifically a use case that Chev is going to talk about later it's how do I look at more, that real, that streaming information? >>How do I take advantage of data, whether it be, uh, you know, uh, financial transactions we're looking at, um, asset verification, we're looking at tax records, we're looking at corporate filings. Um, and we can also look at more, uh, advanced data sources where as we're looking at, um, investigation type information. So we're maybe going out and we're looking at, uh, deep learning type models around, uh, you know, semi or that, uh, behavioral that's unstructured data, whether it be camera analysis and so forth. So for quite a different variety of data and the breadth and the opportunity really comes about when you can integrate and look at data across all different data sources. So in essence, looking at a more extensive, uh, data landscape. So specifically I want to focus on some of the methods, some of the data sources and some of the analytical techniques that we're seeing, uh, being used, um, in the government agencies, as well as opportunities to look at new methods. >>So as we're looking at, you know, from a, um, an audit planning or looking at, uh, the opportunity for the likelihood of non-compliance, um, specifically we'll see data sources where we're maybe looking at a constituents profile, we might actually be investigating the forms that they provided. We might be comparing that data, um, or leveraging internal data sources, possibly looking at net worth, comparing it against other financial data, and also comparison across other constituents groups. Some of the techniques that we use are some of the basic natural language processing, maybe we're going to do some text mining. We might be doing some probabilistic modeling, uh, where we're actually looking at, um, information within the agency to also comparing that against possibly tax forms. A lot of times it's information historically has been done on a batch perspective, both structured and semi-structured type information. And typically the data volumes can be low, but we're also seeing those data volumes on increase exponentially based upon the types of events that we're dealing with, the number of transactions. >>Um, so getting the throughput, um, and chef's going to specifically talk about that in a moment. The other aspect is, as we look at other areas of opportunity is when we're building upon, how do I actually do compliance? How do I actually look at conducting audits or potential fraud to also looking at areas of under-reported tax information? So there you might be pulling in, um, some of our other types of data sources, whether it's being property records, it could be data that's being supplied by the actual constituents or by vendors to also pulling in social media information to geographical information, to leveraging photos on techniques that we're seeing used is possibly some sentiment analysis, link analysis. Um, how do we actually blend those data sources together from a natural language processing? But I think what's important here is also the method and the looking at the data velocity, whether it be batch, whether it be near real time, again, looking at all types of data, whether it's structured semi-structured or unstructured and the key and the value behind this is, um, how do we actually look at increasing the potential revenue or the, uh, under reported revenue? >>Uh, how do we actually look at stopping fraudulent payments before they actually occur? Um, also looking at increasing the amount of, uh, the level of compliance, um, and also looking at the potential of prosecution of fraud cases. And additionally, other areas of opportunity could be looking at, um, economic planning. How do we actually perform some link analysis? How do we bring some more of those things that we saw in the data landscape on customer, or, you know, constituent interaction, bringing in social media, bringing in, uh, potentially police records, property records, um, other tax department, database information. Um, and then also looking at comparing one individual to other individuals, looking at people like a specific constituent, are there areas where we're seeing, uh, um, other aspects of a fraud potentially being occurring. Um, and also as we move forward, some of the more advanced techniques that we're seeing around deep learning is looking at computer vision, um, leveraging geospatial information, looking at social network entity analysis, uh, also looking at, um, agent-based modeling techniques, where we're looking at, uh, simulation Monte Carlo type techniques that we typically see in the financial services industry, actually applying that to fraud, waste, and abuse within the, uh, the public sector. >>Um, and again, that really lends itself to a new opportunities. And on that, I'm going to turn it over to Shev to talk about, uh, the reference architecture for, uh, doing these baskets. >>Thanks, Cindy. Um, so I'm going to walk you through an example, reference architecture for fraud detection using, uh, Cloudera underlying technology. Um, and you know, before I get into the technical details, uh, I want to talk about how this would be implemented at a much higher level. So with fraud detection, what we're trying to do is identify anomalies or novelists behavior within our data sets. Um, now in order to understand what aspects of our incoming data represents anomalous behavior, we first need to understand what normal behavior is. So in essence, once we understand normal behavior, anything that deviates from it can be thought of as an anomaly, right? So in order to understand what normal behavior is, we're going to need to be able to collect store and process a very large amount of historical data. And so then comes clutter's platform and this reference architecture that needs to before you, so, uh, let's start on the left-hand side of this reference architecture with the collect phase. >>So fraud detection will always begin with data collection. Uh, we need to collect large amounts of information from systems that could be in the cloud. It could be in the data center or even on edge devices, and this data needs to be collected so we can create our normal behavior profiles. And these normal behavioral profiles would then in turn, be used to create our predictive models for fraudulent activity. Now, uh, uh, to the data collection side, one of the main challenges that many organizations face, uh, in this phase, uh, involves using a single technology that can handle, uh, data that's coming in all different types of formats and protocols and standards with different porosities and velocities. Um, let me give you an example. Uh, we could be collecting data from a database that gets updated daily, uh, and maybe that data is being collected in Agra format. >>At the same time, we can be collecting data from an edge device that's streaming in every second, and that data may be coming in Jason or a binary format, right? So this is a data collection challenge that can be solved with clutter data flow, which is a suite of technologies built on Apache NIFA and mini five, allowing us to ingest all of this data, do a drag and drop interface. So now we're collecting all of this data, that's required to map out normal behavior. The next thing that we need to do is enrich it, transform it and distribute it to, uh, you know, downstream systems for further process. Uh, so let's, let's walk through how that would work first. Let's taking Richmond for, uh, for enrichment, think of adding additional information to your incoming data, right? Let's take, uh, financial transactions, for example, uh, because Cindy mentioned it earlier, right? >>You can store known locations of an individual in an operational database, uh, with Cloudera that would be HBase. And as an individual makes a new transaction, their geo location that's in that transaction data, it can be enriched with previously known locations of that very same individual and all of that enriched data. It can be later used downstream for predictive analysis, predictable. So the data has been enrich. Uh, now it needs to be transformed. We want the data that's coming in, uh, you know, Avro and Jason and binary and whatever other format to be transformed into a single common format. So it can be used downstream for stream processing. Uh, again, this is going to be done through clutter and data flow, which is backed by NIFA, right? So the transformed semantic data is then going to be stimulated to Kafka and coffin. It's going to serve as that central repository of syndicated services or a buffer zone, right? >>So cough is, you know, pretty much provides you with, uh, extremely fast resilient and fault tolerance storage. And it's also going to give you the consumer APIs that you need that are going to enable a wide variety of applications to leverage that enriched and transformed data within your buffer zone. Uh, I'll add that, you know, 17, so you can store that data, uh, in a distributed file system, give you that historical context that you're going to need later on for machine learning, right? So the next step in the architecture is to leverage a cluttered SQL string builder, which enables us to write, uh, streaming sequel jobs on top of Apache Flink. So we can, uh, filter, analyze and, uh, understand the data that's in the Kafka buffer zone in real time. Uh I'll you know, I'll also add like, you know, if you have time series data, or if you need a lab type of cubing, you can leverage kudu, uh, while EDA or exploratory data analysis and visualization, uh, can all be enabled through clever visual patient technology. >>All right, so we've filtered, we've analyzed and we've explored our incoming data. We can now proceed to train our machine learning models, uh, which will detect anomalous behavior in our historically collected data set, uh, to do this, we can use a combination of supervised unsupervised, uh, even deep learning techniques with neural networks and these models can be tested on new incoming streaming data. And once we've gone ahead and obtain the accuracy of the performance, the scores that we want, we can then take these models and deploy them into production. And once the models are productionalized or operationalized, they can be leveraged within our streaming pipeline. So as new data is ingested in real-time knife, I can query these models to detect if the activity is anomalous or fraudulent. And if it is, they can alert downstream users and systems, right? So this in essence is how fraudulent activity detection works. >>Uh, and this entire pipeline is powered by clutter's technology, right? And so, uh, the IRS is one of, uh, clutters customers. That's leveraging our platform today and implementing, uh, a very similar architecture, uh, to detect fraud, waste, and abuse across a very large set of, uh, historical facts, data. Um, and one of the neat things with the IRS is that they've actually, uh, recently leveraged the partnership between Cloudera and Nvidia to accelerate their Spark-based analytics and their machine learning. Uh, and the results have been nothing short of amazing, right? And in fact, we have a quote here from Joe and salty who's, uh, you know, the technical branch chief for the research analytics and statistics division group within the IRS with zero changes to our fraud detection workflow, we're able to obtain eight times to performance simply by adding GPS to our mainstream big data servers. This improvement translates to half the cost of ownership for the same workloads, right? So embedding GPU's into the reference architecture I covered earlier has enabled the IRS to improve their time to insights by as much as eight X while simultaneously reducing their underlying infrastructure costs by half, uh, Cindy back to you >>Chef. Thank you. Um, and I hope that you found, uh, some of the, the analysis, the information that Sheva and I have provided, um, to give you some insights on how cloud era is actually helping, uh, with the fraud waste and abuse challenges within the, uh, the public sector, um, specifically looking at any and all types of data, how the clutter a platform is bringing together and analyzing information, whether it be you're structured you're semi-structured to unstructured data, both in a fast or in a real time perspective, looking at anomalies, being able to do some of those on detection methods, uh, looking at neural network analysis, time series information. So next steps we'd love to have an additional conversation with you. You can also find on some additional information around, uh, how quad areas working in the federal government by going to cloudera.com solutions slash public sector. And we welcome scheduling a meeting with you again, thank you for joining Chevy and I today, we greatly appreciate your time and look forward to future >>Conversation..

Published Date : Aug 5 2021

SUMMARY :

So as we look at fraud, So as we also look at a So as we look at those areas, what are the areas that we see additional So I think one of the key elements is, you know, you can look at your, looking at, uh, deep learning type models around, uh, you know, So as we're looking at, you know, from a, um, an audit planning or looking and the value behind this is, um, how do we actually look at increasing Um, also looking at increasing the amount of, uh, the level of compliance, And on that, I'm going to turn it over to Shev to talk about, uh, the reference architecture for, before I get into the technical details, uh, I want to talk about how this would be implemented at a much higher It could be in the data center or even on edge devices, and this data needs to be collected so uh, you know, downstream systems for further process. So the data has been enrich. So the next step in the architecture is to leverage a cluttered SQL string builder, historically collected data set, uh, to do this, we can use a combination of supervised And in fact, we have a quote here from Joe and salty who's, uh, you know, the technical branch chief for the the analysis, the information that Sheva and I have provided, um, to give you some insights on

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Cindy MikeyPERSON

0.99+

NvidiaORGANIZATION

0.99+

MollyPERSON

0.99+

2017DATE

0.99+

patrickPERSON

0.99+

NVIDIAORGANIZATION

0.99+

PWCORGANIZATION

0.99+

CindyPERSON

0.99+

Patrick OsbournePERSON

0.99+

JoePERSON

0.99+

PeterPERSON

0.99+

NIFAORGANIZATION

0.99+

TodayDATE

0.99+

todayDATE

0.99+

HPORGANIZATION

0.99+

ClouderaORGANIZATION

0.99+

over $65 billionQUANTITY

0.99+

over $51 billionQUANTITY

0.99+

last yearDATE

0.99+

ShevPERSON

0.99+

57 billionQUANTITY

0.99+

IRSORGANIZATION

0.99+

ShevaPERSON

0.98+

JasonPERSON

0.98+

firstQUANTITY

0.98+

bothQUANTITY

0.97+

oneQUANTITY

0.97+

HPEORGANIZATION

0.97+

IntelORGANIZATION

0.97+

AvroPERSON

0.96+

saltyPERSON

0.95+

eight XQUANTITY

0.95+

ApacheORGANIZATION

0.94+

single technologyQUANTITY

0.92+

eight timesQUANTITY

0.92+

91 billionQUANTITY

0.91+

zero changesQUANTITY

0.9+

next yearDATE

0.9+

calderaORGANIZATION

0.9+

ChevORGANIZATION

0.87+

RichmondLOCATION

0.85+

three prongQUANTITY

0.85+

$148 billionQUANTITY

0.84+

single common formatQUANTITY

0.83+

SQLTITLE

0.82+

KafkaPERSON

0.82+

ChevyPERSON

0.8+

HP LabsORGANIZATION

0.8+

one individualQUANTITY

0.8+

PatrickPERSON

0.78+

Monte CarloTITLE

0.76+

halfQUANTITY

0.75+

over halfQUANTITY

0.68+

17QUANTITY

0.65+

secondQUANTITY

0.65+

HBaseTITLE

0.56+

elementsQUANTITY

0.53+

Apache FlinkORGANIZATION

0.53+

cloudera.comOTHER

0.5+

coffinPERSON

0.5+

SparkTITLE

0.49+

LakeCOMMERCIAL_ITEM

0.48+

HPETITLE

0.47+

mini fiveCOMMERCIAL_ITEM

0.45+

GreenORGANIZATION

0.37+

PUBLIC SECTOR V1 | CLOUDERA


 

>>Hi, this is Cindy Mikey, vice president of industry solutions at caldera. Joining me today is chef is Molly, our solution engineer for the public sector. Today. We're going to talk about speed to insight. Why using machine learning in the public sector, specifically around fraud, waste and abuse. So topic for today, we'll discuss machine learning, why the public sector uses it to target fraud, waste, and abuse, the challenges. How do we enhance your data and analytical approaches the data landscape analytical methods and shad we'll go over reference architecture and a case study. So by definition, fraud, waste and abuse per the government accountability office is fraud. Isn't an attempt to obtain something about value through unwelcome misrepresentation waste is about squandering money or resources and abuse is about behaving improperly or unreasonably to actually obtain something of value for your personal benefit. So as we look at fraud, um, and across all industries, it's a top of mind, um, area within the public sector. >>Um, the types of fraud that we see is specifically around cyber crime, uh, looking at accounting fraud, whether it be from an individual perspective to also, uh, within organizations, looking at financial statement fraud, to also looking at bribery and corruption, as we look at fraud, it really hits us from all angles, whether it be from external perpetrators or internal perpetrators, and specifically from the research by PWC, the key focus area is we also see over half of fraud is actually through some form of internal or external, uh, perpetrators again, key topics. So as we also look at a report recently by the association of certified fraud examiners, um, within the public sector, the us government, um, in 2017, it was identified roughly $148 billion was attributable to fraud, waste and abuse. Specifically about 57 billion was focused on reported monetary losses and another 91 billion on areas where that opportunity or the monetary basis had not yet been measured. >>As we look at breaking those areas down again, we look at several different topics from permit out payment perspective. So breaking it down within the health system, over $65 billion within social services, over $51 billion to procurement fraud to also, um, uh, fraud, waste and abuse that's happening in the grants and the loan process to payroll fraud, and then other aspects, again, quite a few different topical areas. So as we look at those areas, what are the areas that we see additional type of focus, there's a broad stroke areas. What are the actual use cases that our agencies are using the data landscape? What data, what analytical methods can we use to actually help curtail and prevent some of the, uh, the fraud waste and abuse. So, as we look at some of the analytical processes and analytical use crate, uh, use cases in the public sector, whether it's from, uh, you know, the taxation areas to looking at, you know, social services, uh, to public safety, to also the, um, our, um, uh, additional agency methods, we're gonna use focused specifically on some of the use cases around, um, you know, fraud within the tax area. >>Uh, we'll briefly look at some of the aspects of, um, unemployment insurance fraud, uh, benefit fraud, as well as payment and integrity. So fraud has it it's, um, uh, underpinnings inquiry, like you different on government agencies and difficult, different analytical methods, and I usage of different data. So I think one of the key elements is, you know, you can look at your, your data landscape on specific data sources that you need, but it's really about bringing together different data sources across a different variety, a different velocity. So, uh, data has different dimensions. So we'll look at structured types of data of semi-structured data, behavioral data, as well as when we look at, um, you know, predictive models. We're typically looking at historical type information, but if we're actually trying to look at preventing fraud before it actually happens, or when a case may be in flight, which is specifically a use case that shad is going to talk about later is how do I look at more of that? >>Real-time that streaming information? How do I take advantage of data, whether it be, uh, you know, uh, financial transactions we're looking at, um, asset verification, we're looking at tax records, we're looking at corporate filings. Um, and we can also look at more, uh, advanced data sources where as we're looking at, um, investigation type information. So we're maybe going out and we're looking at, uh, deep learning type models around, uh, you know, semi or that, uh, behavioral, uh, that's unstructured data, whether it be camera analysis and so forth. So for quite a different variety of data and the, the breadth and the opportunity really comes about when you can integrate and look at data across all different data sources. So in a looking at a more extensive, uh, data landscape. So specifically I want to focus on some of the methods, some of the data sources and some of the analytical techniques that we're seeing, uh, being used, um, in the government agencies, as well as opportunities, uh, to look at new methods. >>So as we're looking at, you know, from a, um, an audit planning or looking at, uh, the opportunity for the likelihood of non-compliance, um, specifically we'll see data sources where we're maybe looking at a constituents profile, we might actually be investigating the forms that they've provided. We might be comparing that data, um, or leveraging internal data sources, possibly looking at net worth, comparing it against other financial data, and also comparison across other constituents groups. Some of the techniques that we use are some of the basic natural language processing, maybe we're going to do some text mining. We might be doing some probabilistic modeling, uh, where we're actually looking at, um, information within the agency to also comparing that against possibly tax forms. A lot of times it's information historically has been done on a batch perspective, both structured and semi-structured type information. And typically the data volumes can be low, but we're also seeing those data volumes on increase exponentially based upon the types of events that we're dealing with, the number of transactions. >>Um, so getting the throughput, um, and chef's going to specifically talk about that in a moment. The other aspect is, as we look at other areas of opportunity is when we're building upon, how do I actually do compliance? How do I actually look at conducting audits, uh, or potential fraud to also looking at areas of under-reported tax information? So there you might be pulling in some of our other types of data sources, whether it's being property records, it could be data that's being supplied by the actual constituents or by vendors to also pulling in social media information to geographical information, to leveraging photos on techniques that we're seeing used is possibly some sentiment analysis, link analysis. Um, how do we actually blend those data sources together from a natural language processing? But I think what's important here is also the method and the looking at the data velocity, whether it be batch, whether it be near real time, again, looking at all types of data, whether it's structured semi-structured or unstructured and the key and the value behind this is, um, how do we actually look at increasing the potential revenue or the, um, under reported revenue? >>Uh, how do we actually look at stopping fraudulent payments before they actually occur? Um, also looking at increasing the amount of, uh, the level of compliance, um, and also looking at the potential of prosecution of fraud cases. And additionally, other areas of opportunity could be looking at, um, economic planning. How do we actually perform some link analysis? How do we bring some more of those things that we saw in the data landscape on customer, or, you know, constituent interaction, bringing in social media, bringing in, uh, potentially police records, property records, um, other tax department, database information. Um, and then also looking at comparing one individual to other individuals, looking at people like a specific, like a constituent, are there areas where we're seeing, uh, >>Um, other >>Aspects of, of fraud potentially being occurring. Um, and also as we move forward, some of the more advanced techniques that we're seeing around deep learning is looking at computer vision, um, leveraging geospatial information, looking at social network entity analysis, uh, also looking at, uh, agent-based modeling techniques, where we're looking at simulation Monte Carlo type techniques that we typically see in the financial services industry, actually applying that to fraud, waste, and abuse within the, uh, the public sector. Um, and again, that really, uh, lends itself to a new opportunities. And on that, I'm going to turn it over to chef to talk about, uh, the reference architecture for, uh, doing these buckets. >>Thanks, Cindy. Um, so I'm gonna walk you through an example, reference architecture for fraud detection using, uh, Cloudera's underlying technology. Um, and you know, before I get into the technical details, uh, I want to talk about how this would be implemented at a much higher level. So with fraud detection, what we're trying to do is identify anomalies or novelists behavior within our datasets. Um, now in order to understand what aspects of our incoming data represents anomalous behavior, we first need to understand what normal behavior is. So in essence, once we understand normal behavior, anything that deviates from it can be thought of as an anomaly, right? So in order to understand what normal behavior is, we're going to need to be able to collect store and process a very large amount of historical data. And so incomes, clutters platform, and this reference architecture that needs to be for you. >>So, uh, let's start on the left-hand side of this reference architecture with the collect phase. So fraud detection will always begin with data collection. We need to collect large amounts of information from systems that could be in the cloud. It could be in the data center or even on edge devices, and this data needs to be collected so we can create our normal behavior profiles. And these normal behavioral profiles would then in turn, be used to create our predictive models for fraudulent activity. Now, uh, thinking, uh, to the data collection side, one of the main challenges that many organizations face, uh, in this phase, uh, involves using a single technology that can handle, uh, data that's coming in all different types of formats and protocols and standards with different velocities and velocities. Um, let me give you an example. Uh, we could be collecting data from a database that gets updated daily, uh, and maybe that data is being collected in Agra format. >>At the same time, we can be collecting data from an edge device that's streaming in every second, and that data may be coming in Jason or a binary format, right? So this is a data collection challenge that can be solved with cluttered data flow, which is a suite of technologies built on a patch NIFA in mini five, allowing us to ingest all of this data, do a drag and drop interface. So now we're collecting all of this data, that's required to map out normal behavior. The next thing that we need to do is enrich it, transform it and distribute it to, uh, you know, downstream systems for further process. Uh, so let's, let's walk through how that would work first. Let's taking Richmond for, uh, for enrichment, think of adding additional information to your incoming data, right? Let's take, uh, financial transactions, for example, uh, because Cindy mentioned it earlier, right? >>You can store known locations of an individual in an operational database, uh, with Cloudera that would be HBase. And as an individual makes a new transaction, their geolocation that's in that transaction data can be enriched with previously known locations of that very same individual. And all of that enriched data can be later used downstream for predictive analysis, predictable. So the data has been enrich. Uh, now it needs to be transformed. We want the data that's coming in, uh, you know, Avro and Jason and binary and whatever other format to be transformed into a single common format. So it can be used downstream for stream processing. Uh, again, this is going to be done through clutter and data flow, which is backed by NIFA, right? So the transformed semantic data is then going to be stricted to Kafka and coffin. It's going to serve as that central repository of syndicated services or a buffer zone, right? >>So coffee is going to pretty much provide you with, uh, extremely fast resilient and fault tolerance storage. And it's also gonna give you the consumer APIs that you need that are going to enable a wide variety of applications to leverage that enriched and transformed data within your buffer zone, uh, allowed that, you know, 17. So you can store that data in a distributed file system, give you that historical context that you're going to need later on for machine learning, right? So the next step in the architecture is to leverage a cluttered SQL stream builder, which enables us to write, uh, streaming SQL jobs on top of Apache Flink. So we can, uh, filter, analyze and, uh, understand the data that's in the Kafka buffer in real time. Uh I'll you know, I'll also add like, you know, if you have time series data, or if you need a lab type of cubing, you can leverage kudu, uh, while EDA or, you know, exploratory data analysis and visualization, uh, can all be enabled through clever visualization technology. >>All right, so we've filtered, we've analyzed and we've explored our incoming data. We can now proceed to train our machine learning models, uh, which will detect anomalous behavior in our historically collected data set, uh, to do this, we can use a combination of supervised unsupervised, uh, even deep learning techniques with neural networks. And these models can be tested on new incoming streaming data. And once we've gone ahead and obtain the accuracy of the performance, the scores that we want, we can then take these models and deploy them into production. And once the models are productionalized or operationalized, they can be leveraged within our streaming pipeline. So as new data is ingested in real-time knife, I can query these models to detect if the activity is anomalous or fraudulent. And if it is, they can alert downstream users and systems, right? So this in essence is how fraudulent activity detection works. >>Uh, and this entire pipeline is powered by clutters technology, right? And so, uh, the IRS is one of, uh, clutter's customers. That's leveraging our platform today and implementing, uh, a very similar architecture, uh, to detect fraud, waste, and abuse across a very large set of historical facts, data. Um, and one of the neat things with the IRS is that they've actually recently leveraged the partnership between Cloudera and Nvidia to accelerate their spark based analytics and their machine learning, uh, and the results have been nothing short of amazing, right? And in fact, we have a quote here from Joe and salty who's, uh, you know, the technical branch chief for the research analytics and statistics division group within the IRS with zero changes to our fraud detection workflow, we're able to obtain eight times to performance simply by adding GPS to our mainstream big data servers. This improvement translates to half the cost of ownership for the same workloads, right? So embedding GPU's into the reference architecture I covered earlier has enabled the IRS to improve their time to insights by as much as eight X while simultaneously reducing their underlying infrastructure costs by half, uh, Cindy back to you >>Chef. Thank you. Um, and I hope that you found, uh, some of the, the analysis, the information that Sheva and I have provided, um, to give you some insights on how cloud era is actually helping, uh, with the fraud waste and abuse challenges within the, uh, the public sector, um, specifically looking at any and all types of data, how the clutter platform is bringing together and analyzing information, whether it be you're structured you're semi-structured to unstructured data, both in a fast or in a real-time perspective, looking at anomalies, being able to do some of those on detection, uh, looking at neural network analysis, time series information. So next steps we'd love to have additional conversation with you. You can also find on some additional information around, I have caught areas working in the, the federal government by going to cloudera.com solutions slash public sector. And we welcome scheduling a meeting with you again, thank you for joining us Sheva and I today. We greatly appreciate your time and look forward to future progress. >>Good day, everyone. Thank you for joining me. I'm Sydney. Mike joined by Rick Taylor of Cloudera. Uh, we're here to talk about predictive maintenance for the public sector and how to increase assets, service, reliability on today's agenda. We'll talk specifically around how to optimize your equipment maintenance, how to reduce costs, asset failure with data and analytics. We'll go into a little more depth on, um, what type of data, the analytical methods that we're typically seeing used, um, the associated, uh, Brooke, we'll go over a case study as well as a reference architecture. So by basic definition, uh, predictive maintenance is about determining when an asset should be maintained and what specific maintenance activities need to be performed either based upon an assets of actual condition or state. It's also about predicting and preventing failures and performing maintenance on your time on your schedule to avoid costly unplanned downtime. >>McKinsey has looked at analyzing predictive maintenance costs across multiple industries and has identified that there's the opportunity to reduce overall predictive maintenance costs by roughly 50% with different types of analytical methods. So let's look at those three types of models. First, we've got our traditional type of method for maintenance, and that's really about our corrective maintenance, and that's when we're performing maintenance on an asset, um, after the equipment fails. But the challenges with that is we end up with unplanned. We end up with disruptions in our schedules, um, as well as reduced quality, um, around the performance of the asset. And then we started looking at preventive maintenance and preventative maintenance is really when we're performing maintenance on a set schedule. Um, the challenges with that is we're typically doing it regardless of the actual condition of the asset, um, which has resulted in unnecessary downtime and expense. Um, and specifically we're really now focused on pre uh, condition-based maintenance, which is looking at leveraging predictive maintenance techniques based upon actual conditions and real time events and processes. Um, within that we've seen organizations, um, and again, source from McKenzie have a 50% reduction in downtime, as well as an overall 40% reduction in maintenance costs. Again, this is really looking at things across multiple industries, but let's look at it in the context of the public sector and based upon some activity by the department of energy, um, several years ago, >>Um, they've really >>Looked at what does predictive maintenance mean to the public sector? What is the benefit, uh, looking at increasing return on investment of assets, reducing, uh, you know, reduction in downtime, um, as well as overall maintenance costs. So corrective or reactive based maintenance is really about performing once there's been a failure. Um, and then the movement towards, uh, preventative, which is based upon a set schedule or looking at predictive where we're monitoring real-time conditions. Um, and most importantly is now actually leveraging IOT and data and analytics to further reduce those overall downtimes. And there's a research report by the, uh, department of energy that goes into more specifics, um, on the opportunity within the public sector. So, Rick, let's talk a little bit about what are some of the challenges, uh, regarding data, uh, regarding predictive maintenance. >>Some of the challenges include having data silos, historically our government organizations and organizations in the commercial space as well, have multiple data silos. They've spun up over time. There are multiple business units and note, there's no single view of assets. And oftentimes there's redundant information stored in, in these silos of information. Uh, couple that with huge increases in data volume data growing exponentially, along with new types of data that we can ingest there's social media, there's semi and unstructured data sources and the real time data that we can now collect from the internet of things. And so the challenge is to collect all these assets together and begin to extract intelligence from them and insights and, and that in turn then fuels, uh, machine learning and, um, and, and what we call artificial intelligence, which enables predictive maintenance. Next slide. So >>Let's look specifically at, you know, the, the types of use cases and I'm going to Rick and I are going to focus on those use cases, where do we see predictive maintenance coming into the procurement facility, supply chain, operations and logistics. Um, we've got various level of maturity. So, you know, we're talking about predictive maintenance. We're also talking about, uh, using, uh, information, whether it be on a, um, a connected asset or a vehicle doing monitoring, uh, to also leveraging predictive maintenance on how do we bring about, uh, looking at data from connected warehouses facilities and buildings all bring on an opportunity to both increase the quality and effectiveness of the missions within the agencies to also looking at re uh, looking at cost efficiency, as well as looking at risk and safety and the types of data, um, you know, that Rick mentioned around, you know, the new types of information, some of those data elements that we typically have seen is looking at failure history. >>So when has that an asset or a machine or a component within a machine failed in the past? Uh, we've also looking at bringing together a maintenance history, looking at a specific machine. Are we getting error codes off of a machine or assets, uh, looking at when we've replaced certain components to looking at, um, how are we actually leveraging the assets? What were the operating conditions, uh, um, pulling off data from a sensor on that asset? Um, also looking at the, um, the features of an asset, whether it's, you know, engine size it's make and model, um, where's the asset located on to also looking at who's operated the asset, uh, you know, whether it be their certifications, what's their experience, um, how are they leveraging the assets and then also bringing in together, um, some of the, the pattern analysis that we've seen. So what are the operating limits? Um, are we getting service reliability? Are we getting a product recall information from the actual manufacturer? So, Rick, I know the data landscape has really changed. Let's, let's go over looking at some of those components. Sure. >>So this slide depicts sort of the, some of the inputs that inform a predictive maintenance program. So, as we've talked a little bit about the silos of information, the ERP system of record, perhaps the spares and the service history. So we want, what we want to do is combine that information with sensor data, whether it's a facility and equipment sensors, um, uh, or temperature and humidity, for example, all this stuff is then combined together, uh, and then use to develop machine learning models that better inform, uh, predictive maintenance, because we'll do need to keep, uh, to take into account the environmental factors that may cause additional wear and tear on the asset that we're monitoring. So here's some examples of private sector, uh, maintenance use cases that also have broad applicability across the government. For example, one of the busiest airports in Europe is running cloud era on Azure to capture secure and correlate sensor data collected from equipment within the airport, the people moving equipment more specifically, the escalators, the elevators, and the baggage carousels. >>The objective here is to prevent breakdowns and improve airport efficiency and passenger safety. Another example is a container shipping port. In this case, we use IOT data and machine learning, help customers recognize how their cargo handling equipment is performing in different weather conditions to understand how usage relates to failure rates and to detect anomalies and transport systems. These all improve for another example is Navistar Navistar, leading manufacturer of commercial trucks, buses, and military vehicles. Typically vehicle maintenance, as Cindy mentioned, is based on miles traveled or based on a schedule or a time since the last service. But these are only two of the thousands of data points that can signal the need for maintenance. And as it turns out, unscheduled maintenance and vehicle breakdowns account for a large share of the total cost for vehicle owner. So to help fleet owners move from a reactive approach to a more predictive model, Navistar built an IOT enabled remote diagnostics platform called on command. >>The platform brings in over 70 sensor data feeds for more than 375,000 connected vehicles. These include engine performance, trucks, speed, acceleration, cooling temperature, and break where this data is then correlated with other Navistar and third-party data sources, including weather geo location, vehicle usage, traffic warranty, and parts inventory information. So the platform then uses machine learning and advanced analytics to automatically detect problems early and predict maintenance requirements. So how does the fleet operator use this information? They can monitor truck health and performance from smartphones or tablets and prioritize needed repairs. Also, they can identify that the nearest service location that has the relevant parts, the train technicians and the available service space. So sort of wrapping up the, the benefits Navistar's helped fleet owners reduce maintenance by more than 30%. The same platform is also used to help school buses run safely. And on time, for example, one school district with 110 buses that travel over a million miles annually reduce the number of PTOs needed year over year, thanks to predictive insights delivered by this platform. >>So I'd like to take a moment and walk through the data. Life cycle is depicted in this diagram. So data ingest from the edge may include feeds from the factory floor or things like connected vehicles, whether they're trucks, aircraft, heavy equipment, cargo vessels, et cetera. Next, the data lands on a secure and governed data platform. Whereas combined with data from existing systems of record to provide additional insights, and this platform supports multiple analytic functions working together on the same data while maintaining strict security governance and control measures once processed the data is used to train machine learning models, which are then deployed into production, monitored, and retrained as needed to maintain accuracy. The process data is also typically placed in a data warehouse and use to support business intelligence, analytics, and dashboards. And in fact, this data lifecycle is representative of one of our government customers doing condition-based maintenance across a variety of aircraft. >>And the benefits they've discovered include less unscheduled maintenance and a reduction in mean man hours to repair increased maintenance efficiencies, improved aircraft availability, and the ability to avoid cascading component failures, which typically cost more in repair cost and downtime. Also, they're able to better forecast the requirements for replacement parts and consumables and last, and certainly very importantly, this leads to enhanced safety. This chart overlays the secure open source Cloudera platform used in support of the data life cycle. We've been discussing Cloudera data flow, the data ingest data movement and real time streaming data query capabilities. So data flow gives us the capability to bring data in from the asset of interest from the internet of things. While the data platform provides a secure governed data lake and visibility across the full machine learning life cycle eliminates silos and streamlines workflows across teams. The platform includes an integrated suite of secure analytic applications. And two that we're specifically calling out here are Cloudera machine learning, which supports the collaborative data science and machine learning environment, which facilitates machine learning and AI and the cloud era data warehouse, which supports the analytics and business intelligence, including those dashboards for leadership Cindy, over to you, Rick, >>Thank you. And I hope that, uh, Rick and I provided you some insights on how predictive maintenance condition-based maintenance is being used and can be used within your respective agency, bringing together, um, data sources that maybe you're having challenges with today. Uh, bringing that, uh, more real-time information in from a streaming perspective, blending that industrial IOT, as well as historical information together to help actually, uh, optimize maintenance and reduce costs within the, uh, each of your agencies, uh, to learn a little bit more about Cloudera, um, and our, what we're doing from a predictive maintenance please, uh, business@cloudera.com solutions slash public sector. And we look forward to scheduling a meeting with you, and on that, we appreciate your time today and thank you very much.

Published Date : Aug 4 2021

SUMMARY :

So as we look at fraud, Um, the types of fraud that we see is specifically around cyber crime, So as we look at those areas, what are the areas that we see additional So I think one of the key elements is, you know, you can look at your, the breadth and the opportunity really comes about when you can integrate and Some of the techniques that we use and the value behind this is, um, how do we actually look at increasing Um, also looking at increasing the amount of, uh, the level of compliance, I'm going to turn it over to chef to talk about, uh, the reference architecture for, before I get into the technical details, uh, I want to talk about how this would be implemented at a much higher level. It could be in the data center or even on edge devices, and this data needs to be collected At the same time, we can be collecting data from an edge device that's streaming in every second, So the data has been enrich. So the next step in the architecture is to leverage a cluttered SQL stream builder, obtain the accuracy of the performance, the scores that we want, Um, and one of the neat things with the IRS the analysis, the information that Sheva and I have provided, um, to give you some insights on the analytical methods that we're typically seeing used, um, the associated, doing it regardless of the actual condition of the asset, um, uh, you know, reduction in downtime, um, as well as overall maintenance costs. And so the challenge is to collect all these assets together and begin the types of data, um, you know, that Rick mentioned around, you know, the new types on to also looking at who's operated the asset, uh, you know, whether it be their certifications, So we want, what we want to do is combine that information with So to help fleet So the platform then uses machine learning and advanced analytics to automatically detect problems So data ingest from the edge may include feeds from the factory floor or things like improved aircraft availability, and the ability to avoid cascading And I hope that, uh, Rick and I provided you some insights on how predictive

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Cindy MikeyPERSON

0.99+

RickPERSON

0.99+

Rick TaylorPERSON

0.99+

MollyPERSON

0.99+

NvidiaORGANIZATION

0.99+

2017DATE

0.99+

PWCORGANIZATION

0.99+

40%QUANTITY

0.99+

110 busesQUANTITY

0.99+

EuropeLOCATION

0.99+

50%QUANTITY

0.99+

CindyPERSON

0.99+

MikePERSON

0.99+

JoePERSON

0.99+

ClouderaORGANIZATION

0.99+

TodayDATE

0.99+

todayDATE

0.99+

NavistarORGANIZATION

0.99+

FirstQUANTITY

0.99+

twoQUANTITY

0.99+

more than 30%QUANTITY

0.99+

over $51 billionQUANTITY

0.99+

NIFAORGANIZATION

0.99+

over $65 billionQUANTITY

0.99+

IRSORGANIZATION

0.99+

over a million milesQUANTITY

0.99+

firstQUANTITY

0.98+

oneQUANTITY

0.98+

JasonPERSON

0.98+

AzureTITLE

0.98+

BrookePERSON

0.98+

AvroPERSON

0.98+

one school districtQUANTITY

0.98+

SQLTITLE

0.97+

bothQUANTITY

0.97+

$148 billionQUANTITY

0.97+

ShevaPERSON

0.97+

three typesQUANTITY

0.96+

eachQUANTITY

0.95+

McKenzieORGANIZATION

0.95+

more than 375,000 connected vehiclesQUANTITY

0.95+

ClouderaTITLE

0.95+

about 57 billionQUANTITY

0.95+

saltyPERSON

0.94+

several years agoDATE

0.94+

single technologyQUANTITY

0.94+

eight timesQUANTITY

0.93+

91 billionQUANTITY

0.93+

eight XQUANTITY

0.92+

business@cloudera.comOTHER

0.92+

McKinseyORGANIZATION

0.92+

zero changesQUANTITY

0.92+

Monte CarloTITLE

0.92+

calderaORGANIZATION

0.91+

coupleQUANTITY

0.9+

over 70 sensor data feedsQUANTITY

0.88+

RichmondLOCATION

0.84+

Navistar NavistarORGANIZATION

0.82+

single viewQUANTITY

0.81+

17OTHER

0.8+

single common formatQUANTITY

0.8+

thousands of data pointsQUANTITY

0.79+

SydneyLOCATION

0.78+

Cindy Maike & Nasheb Ismaily | Cloudera


 

>>Hi, this is Cindy Mikey, vice president of industry solutions at Cloudera. Joining me today is chef is Molly, our solution engineer for the public sector. Today. We're going to talk about speed to insight. Why using machine learning in the public sector, specifically around fraud, waste and abuse. So topic for today, we'll discuss machine learning, why the public sector uses it to target fraud, waste, and abuse, the challenges. How do we enhance your data and analytical approaches the data landscape analytical methods and Shev we'll go over reference architecture and a case study. So by definition, fraud, waste and abuse per the government accountability office is fraud is an attempt to obtain something about a value through unwelcomed. Misrepresentation waste is about squandering money or resources and abuse is about behaving improperly or unreasonably to actually obtain something of value for your personal benefit. So as we look at fraud and across all industries, it's a top of mind, um, area within the public sector. >>Um, the types of fraud that we see is specifically around cyber crime, uh, looking at accounting fraud, whether it be from an individual perspective to also, uh, within organizations, looking at financial statement fraud, to also looking at bribery and corruption, as we look at fraud, it really hits us from all angles, whether it be from external perpetrators or internal perpetrators, and specifically from the research by PWC, the key focus area is we also see over half of fraud is actually through some form of internal or external are perpetrators again, key topics. So as we also look at a report recently by the association of certified fraud examiners, um, within the public sector, the us government, um, in 2017, it was identified roughly $148 billion was attributable to fraud, waste and abuse. Specifically of that 57 billion was focused on reported monetary losses and another 91 billion on areas where that opportunity or the monetary basis had not yet been measured. >>As we look at breaking those areas down again, we look at several different topics from an out payment perspective. So breaking it down within the health system, over $65 billion within social services, over $51 billion to procurement fraud to also, um, uh, fraud, waste and abuse that's happening in the grants and the loan process to payroll fraud, and then other aspects, again, quite a few different topical areas. So as we look at those areas, what are the areas that we see additional type of focus, there's broad stroke areas? What are the actual use cases that our agencies are using the data landscape? What data, what analytical methods can we use to actually help curtail and prevent some of the, uh, the fraud waste and abuse. So, as we look at some of the analytical processes and analytical use crate, uh, use cases in the public sector, whether it's from, uh, you know, the taxation areas to looking at social services, uh, to public safety, to also the, um, our, um, uh, additional agency methods, we're going to focus specifically on some of the use cases around, um, you know, fraud within the tax area. >>Uh, we'll briefly look at some of the aspects of unemployment insurance fraud, uh, benefit fraud, as well as payment and integrity. So fraud has its, um, uh, underpinnings in quite a few different on government agencies and difficult, different analytical methods and I usage of different data. So I think one of the key elements is, you know, you can look at your, your data landscape on specific data sources that you need, but it's really about bringing together different data sources across a different variety, a different velocity. So, uh, data has different dimensions. So we'll look at on structured types of data of semi-structured data, behavioral data, as well as when we look at, um, you know, predictive models, we're typically looking at historical type information, but if we're actually trying to lock at preventing fraud before it actually happens, or when a case may be in flight, which is specifically a use case, that shadow is going to talk about later it's how do I look at more of that? >>Real-time that streaming information? How do I take advantage of data, whether it be, uh, you know, uh, financial transactions we're looking at, um, asset verification, we're looking at tax records, we're looking at corporate filings. Um, and we can also look at more, uh, advanced data sources where as we're looking at, um, investigation type information. So we're maybe going out and we're looking at, uh, deep learning type models around, uh, you know, semi or that behavioral, uh, that's unstructured data, whether it be camera analysis and so forth. So quite a different variety of data and the, the breadth, um, and the opportunity really comes about when you can integrate and look at data across all different data sources. So in a sense, looking at a more extensive on data landscape. So specifically I want to focus on some of the methods, some of the data sources and some of the analytical techniques that we're seeing, uh, being used, um, in the government agencies, as well as opportunities, uh, to look at new methods. >>So as we're looking at, you know, from a, um, an audit planning or looking at, uh, the opportunity for the likelihood of non-compliance, um, specifically we'll see data sources where we're maybe looking at a constituents profile, we might actually be, um, investigating the forms that they've provided. We might be comparing that data, um, or leveraging internal data sources, possibly looking at net worth, comparing it against other financial data, and also comparison across other constituents groups. Some of the techniques that we use are some of the basic natural language processing, maybe we're going to do some text mining. We might be doing some probabilistic modeling, uh, where we're actually looking at, um, information within the agency to also comparing that against possibly tax forms. A lot of times it's information historically has been done on a batch perspective, both structured and semi-structured type information. And typically the data volumes can be low, but we're also seeing those data volumes increase exponentially based upon the types of events that we're dealing with, the number of transactions. >>Um, so getting the throughput, um, and chef's going to specifically talk about that in a moment. The other aspect is, as we look at other areas of opportunity is when we're building upon, how do I actually do compliance? How do I actually look at conducting audits, uh, or potential fraud to also looking at areas of under reported tax information? So there you might be pulling in some of our other types of data sources, whether it's being property records, it could be data that's being supplied by the actual constituents or by vendors to also pulling in social media information to geographical information, to leveraging photos on techniques that we're seeing used is possibly some sentiment analysis, link analysis. Um, how do we actually blend those data sources together from a natural language processing? But I think what's important here is also the method and the looking at the data velocity, whether it be batch, whether it be near real time, again, looking at all types of data, whether it's structured semi-structured or unstructured and the key and the value behind this is, um, how do we actually look at increasing the potential revenue or the, um, under reported revenue? >>Uh, how do we actually look at stopping fraudulent payments before they actually occur? Um, also looking at increasing the amount of, uh, the level of compliance, um, and also looking at the potential of prosecution of fraud cases. And additionally, other areas of opportunity could be looking at, um, economic planning. How do we actually perform some link analysis? How do we bring some more of those things that we saw in the data landscape on customer, or, you know, constituent interaction, bringing in social media, bringing in, uh, potentially police records, property records, um, other tax department, database information. Um, and then also looking at comparing one individual to other individuals, looking at people like a specific, like, uh, constituent, are there areas where we're seeing, uh, um, other aspects of, of fraud potentially being, uh, occurring. Um, and also as we move forward, some of the more advanced techniques that we're seeing around deep learning is looking at computer vision, um, leveraging geospatial information, looking at social network entity analysis, uh, also looking at, um, agent-based modeling techniques, where we're looking at simulation, Monte Carlo type techniques that we typically see in the financial services industry, actually applying that to fraud, waste, and abuse within the, the public sector. >>Um, and again, that really, uh, lends itself to a new opportunities. And on that, I'm going to turn it over to Chevy to talk about, uh, the reference architecture for doing these buckets. >>Sure. Yeah. Thanks, Cindy. Um, so I'm going to walk you through an example, reference architecture for fraud detection, using Cloudera as underlying technology. Um, and you know, before I get into the technical details, uh, I want to talk about how this would be implemented at a much higher level. So with fraud detection, what we're trying to do is identify anomalies or anomalous behavior within our datasets. Um, now in order to understand what aspects of our incoming data represents anomalous behavior, we first need to understand what normal behavior is. So in essence, once we understand normal behavior, anything that deviates from it can be thought of as an anomaly, right? So in order to understand what normal behavior is, we're going to need to be able to collect store and process a very large amount of historical data. And so incomes, clutters platform, and this reference architecture that needs to be for you. >>So, uh, let's start on the left-hand side of this reference architecture with the collect phase. So fraud detection will always begin with data collection. Uh, we need to collect large amounts of information from systems that could be in the cloud. It could be in the data center or even on edge devices, and this data needs to be collected so we can create from normal behavior profiles and these normal behavioral profiles would then in turn, be used to create our predictive models for fraudulent activity. Now, uh, uh, to the data collection side, one of the main challenges that many organizations face, uh, in this phase, uh, involves using a single technology that can handle, uh, data that's coming in all different types of formats and protocols and standards with different velocities and velocities. Um, let me give you an example. Uh, we could be collecting data from a database that gets updated daily, uh, and maybe that data is being collected in Agra format. >>At the same time, we can be collecting data from an edge device that's streaming in every second, and that data may be coming in Jace on or a binary format, right? So this is a data collection challenge that can be solved with cluttered data flow, which is a suite of technologies built on Apache NIFA and mini five, allowing us to ingest all of this data, do a drag and drop interface. So now we're collecting all of this data, that's required to map out normal behavior. The next thing that we need to do is enrich it, transform it and distribute it to know downstream systems for further process. Uh, so let's, let's walk through how that would work first. Let's taking Richmond for, uh, for enrichment, think of adding additional information to your incoming data, right? Let's take, uh, financial transactions, for example, uh, because Cindy mentioned it earlier, right? >>You can store known locations of an individual in an operational database, uh, with Cloudera that would be HBase. And as an individual makes a new transaction, their geo location that's in that transaction data, it can be enriched with previously known locations of that very same individual and all of that enriched data. It can be later used downstream for predictive analysis, predictable. So the data has been enrich. Uh, now it needs to be transformed. We want the data that's coming in, uh, you know, Avro and Jason and binary and whatever other format to be transformed into a single common format. So it can be used downstream for stream processing. Uh, again, this is going to be done through clutter and data flow, which is backed by NIFA, right? So the transformed semantic data is then going to be stimulated to Kafka and coffin is going to serve as that central repository of syndicated services or a buffer zone, right? >>So cough is, you know, pretty much provides you with, uh, extremely fast resilient and fault tolerance storage. And it's also going to give you the consumer API APIs that you need that are going to enable a wide variety of applications to leverage that enriched and transform data within your buffer zone. Uh, I'll add that, you know, 17, so you can store that data, uh, in a distributed file system, give you that historical context that you're going to need later on from machine learning, right? So the next step in the architecture is to leverage, uh, clutter SQL stream builder, which enables us to write, uh, streaming sequel jobs on top of Apache Flink. So we can, uh, filter, analyze and, uh, understand the data that's in the Kafka buffer zone in real-time. Uh, I'll, you know, I'll also add like, you know, if you have time series data, or if you need a lab type of cubing, you can leverage Q2, uh, while EDA or, you know, exploratory data analysis and visualization, uh, can all be enabled through clever visualization technology. >>All right, so we've filtered, we've analyzed, and we've our incoming data. We can now proceed to train our machine learning models, uh, which will detect anomalous behavior in our historically collected data set, uh, to do this, we can use a combination of supervised unsupervised, even deep learning techniques with neural networks. Uh, and these models can be tested on new incoming streaming data. And once we've gone ahead and obtain the accuracy of the performance, the X one, uh, scores that we want, we can then take these models and deploy them into production. And once the models are productionalized or operationalized, they can be leveraged within our streaming pipeline. So as new data is ingested in real time knife, I can query these models to detect if the activity is anomalous or fraudulent. And if it is, they can alert downstream users and systems, right? So this in essence is how fraudulent activity detection works. Uh, and this entire pipeline is powered by clutters technology. Uh, Cindy, next slide please. >>Right. And so, uh, the IRS is one of, uh, clutter as customers. That's leveraging our platform today and implementing a very similar architecture, uh, to detect fraud, waste, and abuse across a very large set of, uh, historical facts, data. Um, and one of the neat things with the IRS is that they've actually recently leveraged the partnership between Cloudera and Nvidia to accelerate their Spark-based analytics and their machine learning. Uh, and the results have been nothing short of amazing, right? And in fact, we have a quote here from Joe and salty who's, uh, you know, the technical branch chief for the research analytics and statistics division group within the IRS with zero changes to our fraud detection workflow, we're able to obtain eight times to performance simply by adding GPS to our mainstream big data servers. This improvement translates to half the cost of ownership for the same workloads, right? So embedding GPU's into the reference architecture I covered earlier has enabled the IRS to improve their time to insights by as much as eight X while simultaneously reducing their underlying infrastructure costs by half, uh, Cindy back to you >>Chef. Thank you. Um, and I hope that you found, uh, some of the, the analysis, the information that Sheva and I have provided, uh, to give you some insights on how cloud era is actually helping, uh, with the fraud waste and abuse challenges within the, uh, the public sector, um, specifically looking at any and all types of data, how the clutter a platform is bringing together and analyzing information, whether it be you're structured you're semi-structured to unstructured data, both in a fast or in a real-time perspective, looking at anomalies, being able to do some of those on detection methods, uh, looking at neural network analysis, time series information. So next steps we'd love to have an additional conversation with you. You can also find on some additional information around how called areas working in federal government, by going to cloudera.com solutions slash public sector. And we welcome scheduling a meeting with you again, thank you for joining us today. Uh, we greatly appreciate your time and look forward to future conversations. Thank you.

Published Date : Jul 22 2021

SUMMARY :

So as we look at fraud and across So as we also look at a report So as we look at those areas, what are the areas that we see additional So I think one of the key elements is, you know, you can look at your, Um, and we can also look at more, uh, advanced data sources So as we're looking at, you know, from a, um, an audit planning or looking and the value behind this is, um, how do we actually look at increasing Um, also looking at increasing the amount of, uh, the level of compliance, um, And on that, I'm going to turn it over to Chevy to talk about, uh, the reference architecture for doing Um, and you know, before I get into the technical details, uh, I want to talk about how this It could be in the data center or even on edge devices, and this data needs to be collected so At the same time, we can be collecting data from an edge device that's streaming in every second, So the data has been enrich. So the next step in the architecture is to leverage, uh, clutter SQL stream builder, obtain the accuracy of the performance, the X one, uh, scores that we want, And in fact, we have a quote here from Joe and salty who's, uh, you know, the technical branch chief for the the analysis, the information that Sheva and I have provided, uh, to give you some insights

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Cindy MikeyPERSON

0.99+

NvidiaORGANIZATION

0.99+

MollyPERSON

0.99+

Nasheb IsmailyPERSON

0.99+

PWCORGANIZATION

0.99+

JoePERSON

0.99+

CindyPERSON

0.99+

ClouderaORGANIZATION

0.99+

2017DATE

0.99+

Cindy MaikePERSON

0.99+

TodayDATE

0.99+

over $65 billionQUANTITY

0.99+

todayDATE

0.99+

NIFAORGANIZATION

0.99+

over $51 billionQUANTITY

0.99+

57 billionQUANTITY

0.99+

saltyPERSON

0.99+

singleQUANTITY

0.98+

firstQUANTITY

0.98+

JasonPERSON

0.98+

oneQUANTITY

0.97+

91 billionQUANTITY

0.97+

IRSORGANIZATION

0.96+

ShevPERSON

0.95+

bothQUANTITY

0.95+

AvroPERSON

0.94+

ApacheORGANIZATION

0.93+

eightQUANTITY

0.93+

$148 billionQUANTITY

0.92+

zero changesQUANTITY

0.91+

RichmondLOCATION

0.91+

ShevaPERSON

0.88+

single technologyQUANTITY

0.86+

ClouderaTITLE

0.85+

Monte CarloTITLE

0.84+

eight timesQUANTITY

0.83+

cloudera.comOTHER

0.79+

KafkaTITLE

0.77+

secondQUANTITY

0.77+

one individualQUANTITY

0.76+

coffinPERSON

0.72+

KafkaPERSON

0.69+

JaceTITLE

0.69+

SQLTITLE

0.68+

17QUANTITY

0.68+

over halfQUANTITY

0.63+

ChevyORGANIZATION

0.57+

elementsQUANTITY

0.56+

halfQUANTITY

0.56+

mini fiveCOMMERCIAL_ITEM

0.54+

Apache FlinkORGANIZATION

0.52+

HBaseTITLE

0.45+

F1 Racing at the Edge of Real-Time Data: Omer Asad, HPE & Matt Cadieux, Red Bull Racing


 

>>Edge computing is predict, projected to be a multi-trillion dollar business. You know, it's hard to really pinpoint the size of this market. Let alone fathom the potential of bringing software, compute, storage, AI, and automation to the edge and connecting all that to clouds and on-prem systems. But what, you know, what is the edge? Is it factories? Is it oil rigs, airplanes, windmills, shipping containers, buildings, homes, race cars. Well, yes and so much more. And what about the data for decades? We've talked about the data explosion. I mean, it's mind boggling, but guess what, we're gonna look back in 10 years and laugh. What we thought was a lot of data in 2020, perhaps the best way to think about edge is not as a place, but when is the most logical opportunity to process the data and maybe it's the first opportunity to do so where it can be decrypted and analyzed at very low latencies that that defines the edge. And so by locating compute as close as possible to the sources of data, to reduce latency and maximize your ability to get insights and return them to users quickly, maybe that's where the value lies. Hello everyone. And welcome to this cube conversation. My name is Dave Vellante and with me to noodle on these topics is Omar Assad, VP, and GM of primary storage and data management services at HPE. Hello, Omer. Welcome to the program. >>Hey Steve. Thank you so much. Pleasure to be here. >>Yeah. Great to see you again. So how do you see the edge in the broader market shaping up? >>Uh, David? I think that's a super important, important question. I think your ideas are quite aligned with how we think about it. Uh, I personally think, you know, as enterprises are accelerating their sort of digitization and asset collection and data collection, uh, they're typically, especially in a distributed enterprise, they're trying to get to their customers. They're trying to minimize the latency to their customers. So especially if you look across industries manufacturing, which is distributed factories all over the place, they are going through a lot of factory transformations where they're digitizing their factories. That means a lot more data is being now being generated within their factories. A lot of robot automation is going on that requires a lot of compute power to go out to those particular factories, which is going to generate their data out there. We've got insurance companies, banks that are creating and interviewing and gathering more customers out at the edge for that. >>They need a lot more distributed processing out at the edge. What this is requiring is what we've seen is across analysts. A common consensus is that more than 50% of an enterprise is data, especially if they operate globally around the world is going to be generated out at the edge. What does that mean? More data is new data is generated at the edge, but needs to be stored. It needs to be processed data. What is not required needs to be thrown away or classified as not important. And then it needs to be moved for Dr. Purposes either to a central data center or just to another site. So overall in order to give the best possible experience for manufacturing, retail, uh, you know, especially in distributed enterprises, people are generating more and more data centric assets out at the edge. And that's what we see in the industry. >>Yeah. We're definitely aligned on that. There's some great points. And so now, okay. You think about all this diversity, what's the right architecture for these deploying multi-site deployments, robo edge. How do you look at that? >>Oh, excellent question. So now it's sort of, you know, obviously you want every customer that we talk to wants SimpliVity, uh, in, in, and, and, and, and no pun intended because SimpliVity is reasoned with a simplistic edge centric architecture, right? So because let's, let's take a few examples. You've got large global retailers, uh, they have hundreds of global retail stores around the world that is generating data that is producing data. Then you've got insurance companies, then you've got banks. So when you look at a distributed enterprise, how do you deploy in a very simple and easy to deploy manner, easy to lifecycle, easy to mobilize and easy to lifecycle equipment out at the edge. What are some of the challenges that these customers deal with these customers? You don't want to send a lot of ID staff out there because that adds costs. You don't want to have islands of data and islands of storage and promote sites, because that adds a lot of States outside of the data center that needs to be protected. >>And then last but not the least, how do you push lifecycle based applications, new applications out at the edge in a very simple to deploy better. And how do you protect all this data at the edge? So the right architecture in my opinion, needs to be extremely simple to deploy. So storage, compute and networking, uh, out towards the edge in a hyperconverged environment. So that's, we agree upon that. It's a very simple to deploy model, but then comes, how do you deploy applications on top of that? How do you manage these applications on top of that? How do you back up these applications back towards the data center, all of this keeping in mind that it has to be as zero touch as possible. We at HBS believe that it needs to be extremely simple. Just give me two cables, a network cable, a power cable, tied it up, connected to the network, push it state from the data center and back up at state from the ed back into the data center. Extremely simple. >>It's gotta be simple because you've got so many challenges. You've got physics that you have to deal your latency to deal with. You got RPO and RTO. What happens if something goes wrong, you've gotta be able to recover quickly. So, so that's great. Thank you for that. Now you guys have hard news. W what is new from HPE in this space >>From a, from a, from a, from a deployment perspective, you know, HPE SimpliVity is just gaining like it's exploding, like crazy, especially as distributed enterprises adopt it as it's standardized edge architecture, right? It's an HCI box has got stories, computer networking, all in one. But now what we have done is not only you can deploy applications all from your standard V-Center interface, from a data center, what have you have now added is the ability to backup to the cloud, right? From the edge. You can also back up all the way back to your core data center. All of the backup policies are fully automated and implemented in the, in the distributed file system. That is the heart and soul of, of the SimpliVity installation. In addition to that, the customers now do not have to buy any third-party software into backup is fully integrated in the architecture and it's van efficient. >>In addition to that, now you can backup straight to the client. You can backup to a central, uh, high-end backup repository, which is in your data center. And last but not least, we have a lot of customers that are pushing the limit in their application transformation. So not only do we previously were, were one-on-one them leaving VMware deployments out at the edge sites. Now revolver also added both stateful and stateless container orchestration, as well as data protection capabilities for containerized applications out at the edge. So we have a lot, we have a lot of customers that are now deploying containers, rapid manufacturing containers to process data out at remote sites. And that allows us to not only protect those stateful applications, but back them up, back into the central data center. >>I saw in that chart, it was a light on no egress fees. That's a pain point for a lot of CEOs that I talked to. They grit their teeth at those entities. So, so you can't comment on that or >>Excellent, excellent question. I'm so glad you brought that up and sort of at that point, uh, uh, pick that up. So, uh, along with SimpliVity, you know, we have the whole green Lake as a service offering as well. Right? So what that means, Dave, is that we can literally provide our customers edge as a service. And when you compliment that with, with Aruba wired wireless infrastructure, that goes at the edge, the hyperconverged infrastructure, as part of SimpliVity, that goes at the edge, you know, one of the things that was missing with cloud backups is the every time you backup to the cloud, which is a great thing, by the way, anytime you restore from the cloud, there is that breastfeed, right? So as a result of that, as part of the GreenLake offering, we have cloud backup service natively now offered as part of HPE, which is included in your HPE SimpliVity edge as a service offering. So now not only can you backup into the cloud from your edge sites, but you can also restore back without any egress fees from HBS data protection service. Either you can restore it back onto your data center, you can restore it back towards the edge site and because the infrastructure is so easy to deploy centrally lifecycle manage, it's very mobile. So if you want to deploy and recover to a different site, you could also do that. >>Nice. Hey, uh, can you, Omar, can you double click a little bit on some of the use cases that customers are choosing SimpliVity for, particularly at the edge, and maybe talk about why they're choosing HPE? >>What are the major use cases that we see? Dave is obviously, uh, easy to deploy and easy to manage in a standardized form factor, right? A lot of these customers, like for example, we have large retailer across the us with hundreds of stores across us. Right now you cannot send service staff to each of these stores. These data centers are their data center is essentially just a closet for these guys, right? So now how do you have a standardized deployment? So standardized deployment from the data center, which you can literally push out and you can connect a network cable and a power cable, and you're up and running, and then automated backup elimination of backup and state and BR from the edge sites and into the data center. So that's one of the big use cases to rapidly deploy new stores, bring them up in a standardized configuration, both from a hardware and a software perspective, and the ability to backup and recover that instantly. >>That's one large use case. The second use case that we see actually refers to a comment that you made in your opener. Dave was where a lot of these customers are generating a lot of the data at the edge. This is robotics automation that is going to up in manufacturing sites. These is racing teams that are out at the edge of doing post-processing of their cars data. Uh, at the same time, there is disaster recovery use cases where you have, uh, you know, campsites and local, uh, you know, uh, agencies that go out there for humanity's benefit. And they move from one site to the other. It's a very, very mobile architecture that they need. So those, those are just a few cases where we were deployed. There was a lot of data collection, and there's a lot of mobility involved in these environments. So you need to be quick to set up quick, to up quick, to recover, and essentially you're up to your next, next move. >>You seem pretty pumped up about this, uh, this new innovation and why not. >>It is, it is, uh, you know, especially because, you know, it is, it has been taught through with edge in mind and edge has to be mobile. It has to be simple. And especially as, you know, we have lived through this pandemic, which, which I hope we see the tail end of it in at least 2021, or at least 2022. They, you know, one of the most common use cases that we saw, and this was an accidental discovery. A lot of the retail sites could not go out to service their stores because, you know, mobility is limited in these, in these strange times that we live in. So from a central center, you're able to deploy applications, you're able to recover applications. And, and a lot of our customers said, Hey, I don't have enough space in my data center to back up. Do you have another option? So then we rolled out this update release to SimpliVity verse from the edge site. You can now directly back up to our backup service, which is offered on a consumption basis to the customers, and they can recover that anywhere they want. >>Fantastic Omer, thanks so much for coming on the program today. >>It's a pleasure, Dave. Thank you. >>All right. Awesome to see you. Now, let's hear from red bull racing and HPE customer, that's actually using SimpliVity at the edge. Countdown really begins when the checkered flag drops on a Sunday. It's always about this race to manufacture >>The next designs to make it more adapt to the next circuit to run those. Of course, if we can't manufacture the next component in time, all that will be wasted. >>Okay. We're back with Matt kudu, who is the CIO of red bull racing? Matt, it's good to see you again. >>Great to say, >>Hey, we're going to dig into a real-world example of using data at the edge and in near real time to gain insights that really lead to competitive advantage. But, but first Matt, tell us a little bit about red bull racing and your role there. >>Sure. So I'm the CIO at red bull racing and that red bull race. And we're based in Milton Keynes in the UK. And the main job job for us is to design a race car, to manufacture the race car, and then to race it around the world. So as CIO, we need to develop the ITT group needs to develop the applications is the design, manufacturing racing. We also need to supply all the underlying infrastructure and also manage security. So it's really interesting environment. That's all about speed. So this season we have 23 races and we need to tear the car apart and rebuild it to a unique configuration for every individual race. And we're also designing and making components targeted for races. So 20 a movable deadlines, um, this big evolving prototype to manage with our car. Um, but we're also improving all of our tools and methods and software that we use to design and make and race the car. >>So we have a big can do attitude of the company around continuous improvement. And the expectations are that we continuously make the car faster. That we're, that we're winning races, that we improve our methods in the factory and our tools. And, um, so for, I take it's really unique and that we can be part of that journey and provide a better service. It's also a big challenge to provide that service and to give the business the agility, agility, and needs. So my job is, is really to make sure we have the right staff, the right partners, the right technical platforms. So we can live up to expectations >>That tear down and rebuild for 23 races. Is that because each track has its own unique signature that you have to tune to, or are there other factors involved there? >>Yeah, exactly. Every track has a different shape. Some have lots of strengths. Some have lots of curves and lots are in between. Um, the track surface is very different and the impact that has some tires, um, the temperature and the climate is very different. Some are hilly, some, a big curves that affect the dynamics of the power. So all that in order to win, you need to micromanage everything and optimize it for any given race track. >>Talk about some of the key drivers in your business and some of the key apps that give you a competitive advantage to help you win races. >>Yeah. So in our business, everything is all about speed. So the car obviously needs to be fast, but also all of our business operations needed to be fast. We need to be able to design a car and it's all done in the virtual world, but the, the virtual simulations and designs need to correlate to what happens in the real world. So all of that requires a lot of expertise to develop the simulation is the algorithms and have all the underlying infrastructure that runs it quickly and reliably. Um, in manufacturing, um, we have cost caps and financial controls by regulation. We need to be super efficient and control material and resources. So ERP and MES systems are running and helping us do that. And at the race track itself in speed, we have hundreds of decisions to make on a Friday and Saturday as we're fine tuning the final configuration of the car. And here again, we rely on simulations and analytics to help do that. And then during the race, we have split seconds, literally seconds to alter our race strategy if an event happens. So if there's an accident, um, and the safety car comes out, or the weather changes, we revise our tactics and we're running Monte Carlo for example. And he is an experienced engineers with simulations to make a data-driven decision and hopefully a better one and faster than our competitors, all of that needs it. Um, so work at a very high level. >>It's interesting. I mean, as a lay person, historically we know when I think about technology and car racing, of course, I think about the mechanical aspects of a self-propelled vehicle, the electronics and the light, but not necessarily the data, but the data's always been there. Hasn't it? I mean, maybe in the form of like tribal knowledge, if somebody who knows the track and where the Hills are and experience and gut feel, but today you're digitizing it and you're, you're processing it and close to real time. >>It's amazing. I think exactly right. Yeah. The car's instrumented with sensors, we post-process at Virgin, um, video, um, image analysis, and we're looking at our car, our competitor's car. So there's a huge amount of, um, very complicated models that we're using to optimize our performance and to continuously improve our car. Yeah. The data and the applications that can leverage it are really key. Um, and that's a critical success factor for us. >>So let's talk about your data center at the track, if you will. I mean, if I can call it that paint a picture for us, what does that look like? >>So we have to send, um, a lot of equipment to the track at the edge. Um, and even though we have really a great wide area network linked back to the factory and there's cloud resources, a lot of the trucks are very old. You don't have hardened infrastructure, don't have ducks that protect cabling, for example, and you could lose connectivity to remote locations. So the applications we need to operate the car and to make really critical decisions, all that needs to be at the edge where the car operates. So historically we had three racks of equipment, like a safe infrastructure, um, and it was really hard to manage, um, to make changes. It was too flexible. Um, there were multiple panes of glass, um, and, um, and it was too slow. It didn't run her applications quickly. Um, it was also too heavy and took up too much space when you're cramped into a garage with lots of environmental constraints. >>So we, um, we'd, we'd introduced hyperconvergence into the factory and seen a lot of great benefits. And when we came time to refresh our infrastructure at the track, we stepped back and said, there's a lot smarter way of operating. We can get rid of all the slow and flexible, expensive legacy and introduce hyperconvergence. And we saw really excellent benefits for doing that. Um, we saw a three X speed up for a lot of our applications. So I'm here where we're post-processing data, and we have to make decisions about race strategy. Time is of the essence in a three X reduction in processing time really matters. Um, we also, um, were able to go from three racks of equipment down to two racks of equipment and the storage efficiency of the HPE SimpliVity platform with 20 to one ratios allowed us to eliminate a rack. And that actually saved a hundred thousand dollars a year in freight costs by shipping less equipment, um, things like backup, um, mistakes happen. >>Sometimes the user makes a mistake. So for example, a race engineer could load the wrong data map into one of our simulations. And we could restore that VDI through SimpliVity backup at 90 seconds. And this makes sure it enables engineers to focus on the car to make better decisions without having downtime. And we sent them to, I take guys to every race they're managing 60 users, a really diverse environment, juggling a lot of balls and having a simple management platform like HPE SimpliVity gives us, allows them to be very effective and to work quickly. So all of those benefits were a huge step forward relative to the legacy infrastructure that we used to run at the edge. >>Yeah. So you had the nice Petri dish and the factory. So it sounds like your, your goals, obviously your number one KPI is speed to help shave seconds time, but also costs just the simplicity of setting up the infrastructure. >>Yeah. It's speed. Speed, speed. So we want applications absolutely fly, you know, get to actionable results quicker, um, get answers from our simulations quicker. The other area that speed's really critical is, um, our applications are also evolving prototypes, and we're always, the models are getting bigger. The simulations are getting bigger and they need more and more resource and being able to spin up resource and provision things without being a bottleneck is a big challenge in SimpliVity. It gives us the means of doing that. >>So did you consider any other options or was it because you had the factory knowledge? It was HCI was, you know, very clearly the option. What did you look at? >>Yeah, so, um, we have over five years of experience in the factory and we eliminated all of our legacy, um, um, infrastructure five years ago. And the benefits I've described, um, at the track, we saw that in the factory, um, at the track we have a three-year operational life cycle for our equipment. When into 2017 was the last year we had legacy as we were building for 2018. It was obvious that hyper-converged was the right technology to introduce. And we'd had years of experience in the factory already. And the benefits that we see with hyper-converged actually mattered even more at the edge because our operations are so much more pressurized time has even more of the essence. And so speeding everything up at the really pointy end of our business was really critical. It was an obvious choice. >>Why, why SimpliVity? What why'd you choose HPE SimpliVity? >>Yeah. So when we first heard about hyperconverged way back in the, in the factory, um, we had, um, a legacy infrastructure, overly complicated, too slow, too inflexible, too expensive. And we stepped back and said, there has to be a smarter way of operating. We went out and challenged our technology partners. We learned about hyperconvergence within enough, the hype, um, was real or not. So we underwent some PLCs and benchmarking and, and the, the PLCs were really impressive. And, and all these, you know, speed and agility benefits, we saw an HP for our use cases was the clear winner in the benchmarks. So based on that, we made an initial investment in the factory. Uh, we moved about 150 VMs in the 150 VDI into it. Um, and then as, as we've seen all the benefits we've successfully invested, and we now have, um, an estate to the factory of about 800 VMs and about 400 VDI. So it's been a great platform and it's allowed us to really push boundaries and, and give the business, um, the service that expects. >>So w was that with the time in which you were able to go from data to insight to recommendation or, or edict, uh, was that compressed, you kind of indicated that, but >>So we, we all telemetry from the car and we post-process it, and that reprocessing time really it's very time consuming. And, um, you know, we went from nine, eight minutes for some of the simulations down to just two minutes. So we saw big, big reductions in time and all, ultimately that meant an engineer could understand what the car was during a practice session, recommend a tweak to the configuration or setup of it, and just get more actionable insight quicker. And it ultimately helps get a better car quicker. >>Such a great example. How are you guys feeling about the season, Matt? What's the team's sentiment? >>Yeah, I think we're optimistic. Um, we w we, um, uh, we have a new driver >>Lineup. Uh, we have, um, max for stopping his carries on with the team and Sergio joins the team. So we're really excited about this year and, uh, we want to go and win races. Great, Matt, good luck this season and going forward and thanks so much for coming back in the cube. Really appreciate it. And it's my pleasure. Great talking to you again. Okay. Now we're going to bring back Omer for quick summary. So keep it real >>Without having solutions from HB, we can't drive those five senses, CFD aerodynamics that would undermine the simulations being software defined. We can bring new apps into play. If we can bring new them's storage, networking, all of that can be highly advises is a hugely beneficial partnership for us. We're able to be at the cutting edge of technology in a highly stressed environment. That is no bigger challenge than the formula. >>Okay. We're back with Omar. Hey, what did you think about that interview with Matt? >>Great. Uh, I have to tell you I'm a big formula one fan, and they are one of my favorite customers. Uh, so, you know, obviously, uh, one of the biggest use cases as you saw for red bull racing is Trackside deployments. There are now 22 races in a season. These guys are jumping from one city to the next, they've got to pack up, move to the next city, set up, set up the infrastructure very, very quickly and average formula. One car is running the thousand plus sensors on that is generating a ton of data on track side that needs to be collected very quickly. It needs to be processed very quickly, and then sometimes believe it or not, snapshots of this data needs to be sent to the red bull back factory back at the data center. What does this all need? It needs reliability. >>It needs compute power in a very short form factor. And it needs agility quick to set up quick, to go quick, to recover. And then in post processing, they need to have CPU density so they can pack more VMs out at the edge to be able to do that processing now. And we accomplished that for, for the red bull racing guys in basically two are you have two SimpliVity nodes that are running track side and moving with them from one, one race to the next race, to the next race. And every time those SimpliVity nodes connect up to the data center collector to a satellite, they're backing up back to their data center. They're sending snapshots of data back to the data center, essentially making their job a whole lot easier, where they can focus on racing and not on troubleshooting virtual machines, >>Red bull racing and HPE SimpliVity. Great example. It's agile, it's it's cost efficient, and it shows a real impact. Thank you very much. I really appreciate those summary comments. Thank you, Dave. Really appreciate it. All right. And thank you for watching. This is Dave Volante. >>You.

Published Date : Mar 30 2021

SUMMARY :

as close as possible to the sources of data, to reduce latency and maximize your ability to get Pleasure to be here. So how do you see the edge in the broader market shaping up? A lot of robot automation is going on that requires a lot of compute power to go out to More data is new data is generated at the edge, but needs to be stored. How do you look at that? a lot of States outside of the data center that needs to be protected. We at HBS believe that it needs to be extremely simple. You've got physics that you have to deal your latency to deal with. In addition to that, the customers now do not have to buy any third-party In addition to that, now you can backup straight to the client. So, so you can't comment on that or So as a result of that, as part of the GreenLake offering, we have cloud backup service natively are choosing SimpliVity for, particularly at the edge, and maybe talk about why from the data center, which you can literally push out and you can connect a network cable at the same time, there is disaster recovery use cases where you have, uh, out to service their stores because, you know, mobility is limited in these, in these strange times that we always about this race to manufacture The next designs to make it more adapt to the next circuit to run those. it's good to see you again. insights that really lead to competitive advantage. So this season we have 23 races and we So my job is, is really to make sure we have the right staff, that you have to tune to, or are there other factors involved there? So all that in order to win, you need to micromanage everything and optimize it for Talk about some of the key drivers in your business and some of the key apps that So all of that requires a lot of expertise to develop the simulation is the algorithms I mean, maybe in the form of like tribal So there's a huge amount of, um, very complicated models that So let's talk about your data center at the track, if you will. So the applications we need to operate the car and to make really Time is of the essence in a three X reduction in processing So for example, a race engineer could load the wrong but also costs just the simplicity of setting up the infrastructure. So we want applications absolutely fly, So did you consider any other options or was it because you had the factory knowledge? And the benefits that we see with hyper-converged actually mattered even more at the edge And, and all these, you know, speed and agility benefits, we saw an HP So we saw big, big reductions in time and all, How are you guys feeling about the season, Matt? we have a new driver Great talking to you again. We're able to be at Hey, what did you think about that interview with Matt? and then sometimes believe it or not, snapshots of this data needs to be sent to the red bull And we accomplished that for, for the red bull racing guys in And thank you for watching.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VellantePERSON

0.99+

SergioPERSON

0.99+

MattPERSON

0.99+

DavidPERSON

0.99+

DavePERSON

0.99+

two racksQUANTITY

0.99+

StevePERSON

0.99+

Dave VolantePERSON

0.99+

2020DATE

0.99+

OmarPERSON

0.99+

Omar AssadPERSON

0.99+

2018DATE

0.99+

Matt CadieuxPERSON

0.99+

20QUANTITY

0.99+

Red Bull RacingORGANIZATION

0.99+

HBSORGANIZATION

0.99+

Milton KeynesLOCATION

0.99+

2017DATE

0.99+

23 racesQUANTITY

0.99+

60 usersQUANTITY

0.99+

22 racesQUANTITY

0.99+

three-yearQUANTITY

0.99+

90 secondsQUANTITY

0.99+

eight minutesQUANTITY

0.99+

Omer AsadPERSON

0.99+

UKLOCATION

0.99+

two cablesQUANTITY

0.99+

One carQUANTITY

0.99+

more than 50%QUANTITY

0.99+

twoQUANTITY

0.99+

nineQUANTITY

0.99+

each trackQUANTITY

0.99+

ITTORGANIZATION

0.99+

SimpliVityTITLE

0.99+

last yearDATE

0.99+

two minutesQUANTITY

0.99+

VirginORGANIZATION

0.99+

HPE SimpliVityTITLE

0.99+

three racksQUANTITY

0.99+

Matt kuduPERSON

0.99+

oneQUANTITY

0.99+

hundreds of storesQUANTITY

0.99+

five sensesQUANTITY

0.99+

hundredsQUANTITY

0.99+

about 800 VMsQUANTITY

0.99+

bothQUANTITY

0.98+

green LakeORGANIZATION

0.98+

about 400 VDIQUANTITY

0.98+

10 yearsQUANTITY

0.98+

second use caseQUANTITY

0.98+

one cityQUANTITY

0.98+

ArubaORGANIZATION

0.98+

one siteQUANTITY

0.98+

five years agoDATE

0.98+

F1 RacingORGANIZATION

0.98+

todayDATE

0.98+

SimpliVityORGANIZATION

0.98+

this yearDATE

0.98+

150 VDIQUANTITY

0.98+

about 150 VMsQUANTITY

0.98+

SundayDATE

0.98+

red bullORGANIZATION

0.97+

firstQUANTITY

0.97+

OmerPERSON

0.97+

multi-trillion dollarQUANTITY

0.97+

over five yearsQUANTITY

0.97+

one large use caseQUANTITY

0.97+

first opportunityQUANTITY

0.97+

HPEORGANIZATION

0.97+

eachQUANTITY

0.96+

decadesQUANTITY

0.96+

one ratiosQUANTITY

0.96+

HPORGANIZATION

0.96+

one raceQUANTITY

0.95+

GreenLakeORGANIZATION

0.94+

Omer Asad, HPE ft Matt Cadieux, Red Bull Racing full v1 (UNLISTED)


 

(upbeat music) >> Edge computing is projected to be a multi-trillion dollar business. It's hard to really pinpoint the size of this market let alone fathom the potential of bringing software, compute, storage, AI and automation to the edge and connecting all that to clouds and on-prem systems. But what is the edge? Is it factories? Is it oil rigs, airplanes, windmills, shipping containers, buildings, homes, race cars. Well, yes and so much more. And what about the data? For decades we've talked about the data explosion. I mean, it's a mind-boggling but guess what we're going to look back in 10 years and laugh what we thought was a lot of data in 2020. Perhaps the best way to think about Edge is not as a place but when is the most logical opportunity to process the data and maybe it's the first opportunity to do so where it can be decrypted and analyzed at very low latencies. That defines the edge. And so by locating compute as close as possible to the sources of data to reduce latency and maximize your ability to get insights and return them to users quickly, maybe that's where the value lies. Hello everyone and welcome to this CUBE conversation. My name is Dave Vellante and with me to noodle on these topics is Omer Asad, VP and GM of Primary Storage and Data Management Services at HPE. Hello Omer, welcome to the program. >> Thanks Dave. Thank you so much. Pleasure to be here. >> Yeah. Great to see you again. So how do you see the edge in the broader market shaping up? >> Dave, I think that's a super important question. I think your ideas are quite aligned with how we think about it. I personally think enterprises are accelerating their sort of digitization and asset collection and data collection, they're typically especially in a distributed enterprise, they're trying to get to their customers. They're trying to minimize the latency to their customers. So especially if you look across industries manufacturing which has distributed factories all over the place they are going through a lot of factory transformations where they're digitizing their factories. That means a lot more data is now being generated within their factories. A lot of robot automation is going on, that requires a lot of compute power to go out to those particular factories which is going to generate their data out there. We've got insurance companies, banks, that are creating and interviewing and gathering more customers out at the edge for that. They need a lot more distributed processing out at the edge. What this is requiring is what we've seen is across analysts. A common consensus is this that more than 50% of an enterprises data especially if they operate globally around the world is going to be generated out at the edge. What does that mean? New data is generated at the edge what needs to be stored. It needs to be processed data. Data which is not required needs to be thrown away or classified as not important. And then it needs to be moved for DR purposes either to a central data center or just to another site. So overall in order to give the best possible experience for manufacturing, retail, especially in distributed enterprises, people are generating more and more data centric assets out at the edge. And that's what we see in the industry. >> Yeah. We're definitely aligned on that. There's some great points and so now, okay. You think about all this diversity what's the right architecture for these multi-site deployments, ROBO, edge? How do you look at that? >> Oh, excellent question, Dave. Every customer that we talked to wants SimpliVity and no pun intended because SimpliVity is reasoned with a simplistic edge centric architecture, right? Let's take a few examples. You've got large global retailers, they have hundreds of global retail stores around the world that is generating data that is producing data. Then you've got insurance companies, then you've got banks. So when you look at a distributed enterprise how do you deploy in a very simple and easy to deploy manner, easy to lifecycle, easy to mobilize and easy to lifecycle equipment out at the edge. What are some of the challenges that these customers deal with? These customers, you don't want to send a lot of IT staff out there because that adds cost. You don't want to have islands of data and islands of storage and promote sites because that adds a lot of states outside of the data center that needs to be protected. And then last but not the least how do you push lifecycle based applications, new applications out at the edge in a very simple to deploy manner. And how do you protect all this data at the edge? So the right architecture in my opinion needs to be extremely simple to deploy so storage compute and networking out towards the edge in a hyper converged environment. So that's we agree upon that. It's a very simple to deploy model but then comes how do you deploy applications on top of that? How do you manage these applications on top of that? How do you back up these applications back towards the data center, all of this keeping in mind that it has to be as zero touch as possible. We at HPE believe that it needs to be extremely simple, just give me two cables, a network cable, a power cable, fire it up, connect it to the network, push it state from the data center and back up it state from the edge back into the data center, extremely simple. >> It's got to be simple 'cause you've got so many challenges. You've got physics that you have to deal, you have latency to deal with. You got RPO and RTO. What happens if something goes wrong you've got to be able to recover quickly. So that's great. Thank you for that. Now you guys have heard news. What is new from HPE in this space? >> Excellent question, great. So from a deployment perspective, HPE SimpliVity is just gaining like it's exploding like crazy especially as distributed enterprises adopted as it's standardized edge architecture, right? It's an HCI box has got storage computer networking all in one. But now what we have done is not only you can deploy applications all from your standard V-Center interface from a data center, what have you have now added is the ability to backup to the cloud right from the edge. You can also back up all the way back to your core data center. All of the backup policies are fully automated and implemented in the distributed file system that is the heart and soul of the SimpliVity installation. In addition to that, the customers now do not have to buy any third-party software. Backup is fully integrated in the architecture and it's then efficient. In addition to that now you can backup straight to the client. You can back up to a central high-end backup repository which is in your data center. And last but not least, we have a lot of customers that are pushing the limit in their application transformation. So not only, we previously were one-on-one leaving VMware deployments out at the edge site now evolved also added both stateful and stateless container orchestration as well as data protection capabilities for containerized applications out at the edge. So we have a lot of customers that are now deploying containers, rapid manufacture containers to process data out at remote sites. And that allows us to not only protect those stateful applications but back them up back into the central data center. >> I saw in that chart, it was a line no egress fees. That's a pain point for a lot of CIOs that I talked to. They grit their teeth at those cities. So you can't comment on that or? >> Excellent question. I'm so glad you brought that up and sort of at the point that pick that up. So along with SimpliVity, we have the whole Green Lake as a service offering as well, right? So what that means Dave is, that we can literally provide our customers edge as a service. And when you compliment that with with Aruba Wired Wireless Infrastructure that goes at the edge, the hyperconverged infrastructure as part of SimpliVity that goes at the edge. One of the things that was missing with cloud backups is that every time you back up to the cloud, which is a great thing by the way, anytime you restore from the cloud there is that egress fee, right? So as a result of that, as part of the GreenLake offering we have cloud backup service natively now offered as part of HPE, which is included in your HPE SimpliVity edge as a service offering. So now not only can you backup into the cloud from your edge sites, but you can also restore back without any egress fees from HPE's data protection service. Either you can restore it back onto your data center, you can restore it back towards the edge site and because the infrastructure is so easy to deploy centrally lifecycle manage, it's very mobile. So if you want to deploy and recover to a different site, you could also do that. >> Nice. Hey, can you, Omer, can you double click a little bit on some of the use cases that customers are choosing SimpliVity for particularly at the edge and maybe talk about why they're choosing HPE? >> Excellent question. So one of the major use cases that we see Dave is obviously easy to deploy and easy to manage in a standardized form factor, right? A lot of these customers, like for example, we have large retailer across the US with hundreds of stores across US, right? Now you cannot send service staff to each of these stores. Their data center is essentially just a closet for these guys, right? So now how do you have a standardized deployment? So standardized deployment from the data center which you can literally push out and you can connect a network cable and a power cable and you're up and running and then automated backup, elimination of backup and state and DR from the edge sites and into the data center. So that's one of the big use cases to rapidly deploy new stores, bring them up in a standardized configuration both from a hardware and a software perspective and the ability to backup and recover that instantly. That's one large use case. The second use case that we see actually refers to a comment that you made in your opener, Dave, was when a lot of these customers are generating a lot of the data at the edge. This is robotics automation that is going up in manufacturing sites. These is racing teams that are out at the edge of doing post-processing of their cars data. At the same time there is disaster recovery use cases where you have campsites and local agencies that go out there for humanity's benefit. And they move from one site to the other. It's a very, very mobile architecture that they need. So those are just a few cases where we were deployed. There was a lot of data collection and there was a lot of mobility involved in these environments, so you need to be quick to set up, quick to backup, quick to recover. And essentially you're up to your next move. >> You seem pretty pumped up about this new innovation and why not. >> It is, especially because it has been taught through with edge in mind and edge has to be mobile. It has to be simple. And especially as we have lived through this pandemic which I hope we see the tail end of it in at least 2021 or at least 2022. One of the most common use cases that we saw and this was an accidental discovery. A lot of the retail sites could not go out to service their stores because mobility is limited in these strange times that we live in. So from a central recenter you're able to deploy applications. You're able to recover applications. And a lot of our customers said, hey I don't have enough space in my data center to back up. Do you have another option? So then we rolled out this update release to SimpliVity verse from the edge site. You can now directly back up to our backup service which is offered on a consumption basis to the customers and they can recover that anywhere they want. >> Fantastic Omer, thanks so much for coming on the program today. >> It's a pleasure, Dave. Thank you. >> All right. Awesome to see you, now, let's hear from Red Bull Racing an HPE customer that's actually using SimpliVity at the edge. (engine revving) >> Narrator: Formula one is a constant race against time Chasing in tens of seconds. (upbeat music) >> Okay. We're back with Matt Cadieux who is the CIO Red Bull Racing. Matt, it's good to see you again. >> Great to see you Dave. >> Hey, we're going to dig in to a real world example of using data at the edge in near real time to gain insights that really lead to competitive advantage. But first Matt tell us a little bit about Red Bull Racing and your role there. >> Sure. So I'm the CIO at Red Bull Racing and at Red Bull Racing we're based in Milton Keynes in the UK. And the main job for us is to design a race car, to manufacture the race car and then to race it around the world. So as CIO, we need to develop, the IT group needs to develop the applications use the design, manufacturing racing. We also need to supply all the underlying infrastructure and also manage security. So it's really interesting environment that's all about speed. So this season we have 23 races and we need to tear the car apart and rebuild it to a unique configuration for every individual race. And we're also designing and making components targeted for races. So 23 and movable deadlines this big evolving prototype to manage with our car but we're also improving all of our tools and methods and software that we use to design make and race the car. So we have a big can-do attitude of the company around continuous improvement. And the expectations are that we continue to say, make the car faster. That we're winning races, that we improve our methods in the factory and our tools. And so for IT it's really unique and that we can be part of that journey and provide a better service. It's also a big challenge to provide that service and to give the business the agility of needs. So my job is really to make sure we have the right staff, the right partners, the right technical platforms. So we can live up to expectations. >> And Matt that tear down and rebuild for 23 races, is that because each track has its own unique signature that you have to tune to or are there other factors involved? >> Yeah, exactly. Every track has a different shape. Some have lots of straight, some have lots of curves and lots are in between. The track surface is very different and the impact that has on tires, the temperature and the climate is very different. Some are hilly, some have big curbs that affect the dynamics of the car. So all that in order to win you need to micromanage everything and optimize it for any given race track. >> COVID has of course been brutal for sports. What's the status of your season? >> So this season we knew that COVID was here and we're doing 23 races knowing we have COVID to manage. And as a premium sporting team with Pharma Bubbles we've put health and safety and social distancing into our environment. And we're able to able to operate by doing things in a safe manner. We have some special exceptions in the UK. So for example, when people returned from overseas that they did not have to quarantine for two weeks, but they get tested multiple times a week. And we know they're safe. So we're racing, we're dealing with all the hassle that COVID gives us. And we are really hoping for a return to normality sooner instead of later where we can get fans back at the track and really go racing and have the spectacle where everyone enjoys it. >> Yeah. That's awesome. So important for the fans but also all the employees around that ecosystem. Talk about some of the key drivers in your business and some of the key apps that give you competitive advantage to help you win races. >> Yeah. So in our business, everything is all about speed. So the car obviously needs to be fast but also all of our business operations need to be fast. We need to be able to design a car and it's all done in the virtual world, but the virtual simulations and designs needed to correlate to what happens in the real world. So all of that requires a lot of expertise to develop the simulations, the algorithms and have all the underlying infrastructure that runs it quickly and reliably. In manufacturing we have cost caps and financial controls by regulation. We need to be super efficient and control material and resources. So ERP and MES systems are running and helping us do that. And at the race track itself. And in speed, we have hundreds of decisions to make on a Friday and Saturday as we're fine tuning the final configuration of the car. And here again, we rely on simulations and analytics to help do that. And then during the race we have split seconds literally seconds to alter our race strategy if an event happens. So if there's an accident and the safety car comes out or the weather changes, we revise our tactics and we're running Monte-Carlo for example. And use an experienced engineers with simulations to make a data-driven decision and hopefully a better one and faster than our competitors. All of that needs IT to work at a very high level. >> Yeah, it's interesting. I mean, as a lay person, historically when I think about technology in car racing, of course I think about the mechanical aspects of a self-propelled vehicle, the electronics and the light but not necessarily the data but the data's always been there. Hasn't it? I mean, maybe in the form of like tribal knowledge if you are somebody who knows the track and where the hills are and experience and gut feel but today you're digitizing it and you're processing it and close to real time. Its amazing. >> I think exactly right. Yeah. The car's instrumented with sensors, we post process and we are doing video image analysis and we're looking at our car, competitor's car. So there's a huge amount of very complicated models that we're using to optimize our performance and to continuously improve our car. Yeah. The data and the applications that leverage it are really key and that's a critical success factor for us. >> So let's talk about your data center at the track, if you will. I mean, if I can call it that. Paint a picture for us what does that look like? >> So we have to send a lot of equipment to the track at the edge. And even though we have really a great wide area network link back to the factory and there's cloud resources a lot of the tracks are very old. You don't have hardened infrastructure, don't have ducks that protect cabling, for example and you can lose connectivity to remote locations. So the applications we need to operate the car and to make really critical decisions all that needs to be at the edge where the car operates. So historically we had three racks of equipment like I said infrastructure and it was really hard to manage, to make changes, it was too flexible. There were multiple panes of glass and it was too slow. It didn't run our applications quickly. It was also too heavy and took up too much space when you're cramped into a garage with lots of environmental constraints. So we'd introduced hyper convergence into the factory and seen a lot of great benefits. And when we came time to refresh our infrastructure at the track, we stepped back and said, there's a lot smarter way of operating. We can get rid of all the slow and flexible expensive legacy and introduce hyper convergence. And we saw really excellent benefits for doing that. We saw up three X speed up for a lot of our applications. So I'm here where we're post-processing data. And we have to make decisions about race strategy. Time is of the essence. The three X reduction in processing time really matters. We also were able to go from three racks of equipment down to two racks of equipment and the storage efficiency of the HPE SimpliVity platform with 20 to one ratios allowed us to eliminate a rack. And that actually saved a $100,000 a year in freight costs by shipping less equipment. Things like backup mistakes happen. Sometimes the user makes a mistake. So for example a race engineer could load the wrong data map into one of our simulations. And we could restore that DDI through SimpliVity backup at 90 seconds. And this enables engineers to focus on the car to make better decisions without having downtime. And we sent two IT guys to every race, they're managing 60 users a really diverse environment, juggling a lot of balls and having a simple management platform like HPE SimpliVity gives us, allows them to be very effective and to work quickly. So all of those benefits were a huge step forward relative to the legacy infrastructure that we used to run at the edge. >> Yeah. So you had the nice Petri dish in the factory so it sounds like your goals are obviously number one KPIs speed to help shave seconds, awesome time, but also cost just the simplicity of setting up the infrastructure is-- >> That's exactly right. It's speed, speed, speed. So we want applications absolutely fly, get to actionable results quicker, get answers from our simulations quicker. The other area that speed's really critical is our applications are also evolving prototypes and we're always, the models are getting bigger. The simulations are getting bigger and they need more and more resource and being able to spin up resource and provision things without being a bottleneck is a big challenge in SimpliVity. It gives us the means of doing that. >> So did you consider any other options or was it because you had the factory knowledge? It was HCI was very clearly the option. What did you look at? >> Yeah, so we have over five years of experience in the factory and we eliminated all of our legacy infrastructure five years ago. And the benefits I've described at the track we saw that in the factory. At the track we have a three-year operational life cycle for our equipment. When in 2017 was the last year we had legacy as we were building for 2018, it was obvious that hyper-converged was the right technology to introduce. And we'd had years of experience in the factory already. And the benefits that we see with hyper-converged actually mattered even more at the edge because our operations are so much more pressurized. Time is even more of the essence. And so speeding everything up at the really pointy end of our business was really critical. It was an obvious choice. >> Why SimpliVity, why'd you choose HPE SimpliVity? >> Yeah. So when we first heard about hyper-converged way back in the factory, we had a legacy infrastructure overly complicated, too slow, too inflexible, too expensive. And we stepped back and said there has to be a smarter way of operating. We went out and challenged our technology partners, we learned about hyperconvergence, would enough the hype was real or not. So we underwent some PLCs and benchmarking and the PLCs were really impressive. And all these speed and agility benefits we saw and HPE for our use cases was the clear winner in the benchmarks. So based on that we made an initial investment in the factory. We moved about 150 VMs and 150 VDIs into it. And then as we've seen all the benefits we've successfully invested and we now have an estate in the factory of about 800 VMs and about 400 VDIs. So it's been a great platform and it's allowed us to really push boundaries and give the business the service it expects. >> Awesome fun stories, just coming back to the metrics for a minute. So you're running Monte Carlo simulations in real time and sort of near real-time. And so essentially that's if I understand it, that's what ifs and it's the probability of the outcome. And then somebody got to make, then the human's got to say, okay, do this, right? Was the time in which you were able to go from data to insight to recommendation or edict was that compressed and you kind of indicated that. >> Yeah, that was accelerated. And so in that use case, what we're trying to do is predict the future and you're saying, well and before any event happens, you're doing what ifs and if it were to happen, what would you probabilistic do? So that simulation, we've been running for awhile but it gets better and better as we get more knowledge. And so that we were able to accelerate that with SimpliVity but there's other use cases too. So we also have telemetry from the car and we post-process it. And that reprocessing time really, is it's very time consuming. And we went from nine, eight minutes for some of the simulations down to just two minutes. So we saw big, big reductions in time. And ultimately that meant an engineer could understand what the car was doing in a practice session, recommend a tweak to the configuration or setup of it and just get more actionable insight quicker. And it ultimately helps get a better car quicker. >> Such a great example. How are you guys feeling about the season, Matt? What's the team's sentiment? >> I think we're optimistic. Thinking our simulations that we have a great car we have a new driver lineup. We have the Max Verstapenn who carries on with the team and Sergio Cross joins the team. So we're really excited about this year and we want to go and win races. And I think with COVID people are just itching also to get back to a little degree of normality and going racing again even though there's no fans, it gets us into a degree of normality. >> That's great, Matt, good luck this season and going forward and thanks so much for coming back in theCUBE. Really appreciate it. >> It's my pleasure. Great talking to you again. >> Okay. Now we're going to bring back Omer for quick summary. So keep it right there. >> Narrator: That's where the data comes face to face with the real world. >> Narrator: Working with Hewlett Packard Enterprise is a hugely beneficial partnership for us. We're able to be at the cutting edge of technology in a highly technical, highly stressed environment. There is no bigger challenge than Formula One. (upbeat music) >> Being in the car and driving in on the limit that is the best thing out there. >> Narrator: It's that innovation and creativity to ultimately achieves winning of this. >> Okay. We're back with Omer. Hey, what did you think about that interview with Matt? >> Great. I have to tell you, I'm a big formula One fan and they are one of my favorite customers. So obviously one of the biggest use cases as you saw for Red Bull Racing is track side deployments. There are now 22 races in a season. These guys are jumping from one city to the next they got to pack up, move to the next city, set up the infrastructure very very quickly. An average Formula One car is running the thousand plus sensors on, that is generating a ton of data on track side that needs to be collected very quickly. It needs to be processed very quickly and then sometimes believe it or not snapshots of this data needs to be sent to the Red Bull back factory back at the data center. What does this all need? It needs reliability. It needs compute power in a very short form factor. And it needs agility quick to set up, quick to go, quick to recover. And then in post processing they need to have CPU density so they can pack more VMs out at the edge to be able to do that processing. And we accomplished that for the Red Bull Racing guys in basically two of you have two SimpliVity nodes that are running track side and moving with them from one race to the next race to the next race. And every time those SimpliVity nodes connect up to the data center, collect up to a satellite they're backing up back to their data center. They're sending snapshots of data back to the data center essentially making their job a whole lot easier where they can focus on racing and not on troubleshooting virtual machines. >> Red bull Racing and HPE SimpliVity. Great example. It's agile, it's it's cost efficient and it shows a real impact. Thank you very much Omer. I really appreciate those summary comments. >> Thank you, Dave. Really appreciate it. >> All right. And thank you for watching. This is Dave Volante for theCUBE. (upbeat music)

Published Date : Mar 5 2021

SUMMARY :

and connecting all that to Pleasure to be here. So how do you see the edge in And then it needs to be moved for DR How do you look at that? and easy to deploy It's got to be simple and implemented in the So you can't comment on that or? and because the infrastructure is so easy on some of the use cases and the ability to backup You seem pretty pumped up about A lot of the retail sites on the program today. It's a pleasure, Dave. SimpliVity at the edge. a constant race against time Matt, it's good to see you again. in to a real world example and then to race it around the world. So all that in order to win What's the status of your season? and have the spectacle So important for the fans So the car obviously needs to be fast and close to real time. and to continuously improve our car. data center at the track, So the applications we Petri dish in the factory and being able to spin up the factory knowledge? And the benefits that we see and the PLCs were really impressive. Was the time in which you And so that we were able to about the season, Matt? and Sergio Cross joins the team. and thanks so much for Great talking to you again. going to bring back Omer comes face to face with the real world. We're able to be at the that is the best thing out there. and creativity to ultimately that interview with Matt? So obviously one of the biggest use cases and it shows a real impact. Thank you, Dave. And thank you for watching.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Matt CadieuxPERSON

0.99+

DavePERSON

0.99+

Dave VellantePERSON

0.99+

Sergio CrossPERSON

0.99+

2017DATE

0.99+

2018DATE

0.99+

Red Bull RacingORGANIZATION

0.99+

MattPERSON

0.99+

2020DATE

0.99+

Milton KeynesLOCATION

0.99+

two weeksQUANTITY

0.99+

three-yearQUANTITY

0.99+

20QUANTITY

0.99+

Red Bull RacingORGANIZATION

0.99+

Omer AsadPERSON

0.99+

Dave VolantePERSON

0.99+

USLOCATION

0.99+

OmerPERSON

0.99+

Red BullORGANIZATION

0.99+

UKLOCATION

0.99+

two racksQUANTITY

0.99+

23 racesQUANTITY

0.99+

Max VerstapennPERSON

0.99+

90 secondsQUANTITY

0.99+

60 usersQUANTITY

0.99+

22 racesQUANTITY

0.99+

eight minutesQUANTITY

0.99+

more than 50%QUANTITY

0.99+

each trackQUANTITY

0.99+

twoQUANTITY

0.99+

one raceQUANTITY

0.99+

two minutesQUANTITY

0.99+

two cablesQUANTITY

0.99+

nineQUANTITY

0.99+

Hewlett Packard EnterpriseORGANIZATION

0.99+

150 VDIsQUANTITY

0.99+

SimpliVityTITLE

0.99+

Pharma BubblesORGANIZATION

0.99+

oneQUANTITY

0.99+

five years agoDATE

0.99+

first opportunityQUANTITY

0.99+

last yearDATE

0.99+

OneQUANTITY

0.99+

about 800 VMsQUANTITY

0.99+

three racksQUANTITY

0.98+

firstQUANTITY

0.98+

one siteQUANTITY

0.98+

HPEORGANIZATION

0.98+

Monte CarloTITLE

0.98+

about 400 VDIsQUANTITY

0.98+

Primary Storage and Data Management ServicesORGANIZATION

0.98+

hundreds of storesQUANTITY

0.98+

Red bull RacingORGANIZATION

0.98+

bothQUANTITY

0.98+

thousand plus sensorsQUANTITY

0.98+

tens of secondsQUANTITY

0.98+

second use caseQUANTITY

0.98+

multi-trillion dollarQUANTITY

0.98+

over five yearsQUANTITY

0.98+

todayDATE

0.97+

GreenLakeORGANIZATION

0.97+

one cityQUANTITY

0.97+

10 yearsQUANTITY

0.96+

HPE SimpliVityTITLE

0.96+

COVIDOTHER

0.96+

hundreds of global retail storesQUANTITY

0.96+

about 150 VMsQUANTITY

0.96+

Matt Cadieux, CIO Red Bull Racing v2


 

(mellow music) >> Okay, we're back with Matt Cadieux who is the CIO Red Bull Racing. Matt, it's good to see you again. >> Yeah, great to see you, Dave. >> Hey, we're going to dig into a real world example of using data at the edge and in near real-time to gain insights that really lead to competitive advantage. But first Matt, tell us a little bit about Red Bull Racing and your role there. >> Sure, so I'm the CIO at Red Bull Racing. And at Red Bull Racing we're based in Milton Keynes in the UK. And the main job for us is to design a race car, to manufacture the race car, and then to race it around the world. So as CIO, we need to develop, the IT team needs to develop the applications used for the design, manufacturing, and racing. We also need to supply all the underlying infrastructure, and also manage security. So it's a really interesting environment that's all about speed. So this season we have 23 races, and we need to tear the car apart, and rebuild it to a unique configuration for every individual race. And we're also designing and making components targeted for races. So 23 immovable deadlines, this big evolving prototype to manage with our car. But we're also improving all of our tools and methods and software that we use to design and make and race the car. So we have a big can-do attitude in the company, around continuous improvement. And the expectations are that we continue to make the car faster, that we're winning races, that we improve our methods in the factory and our tools. And so for IT it's really unique and that we can be part of that journey and provide a better service. It's also a big challenge to provide that service and to give the business the agility it needs. So my job is really to make sure we have the right staff, the right partners, the right technical platforms, so we can live up to expectations. >> And Matt that tear down and rebuild for 23 races. Is that because each track has its own unique signature that you have to tune to or are there other factors involved there? >> Yeah, exactly. Every track has a different shape. Some have lots of straight, some have lots of curves and lots are in between. The track's surface is very different and the impact that has on tires, the temperature and the climate is very different. Some are hilly, some are big curves that affect the dynamics of the car. So all that in order to win, you need to micromanage everything and optimize it for any given race track. >> And, you know, COVID has, of course, been brutal for sports. What's the status of your season? >> So this season we knew that COVID was here and we're doing 23 races knowing we have COVID to manage. And as a premium sporting team we've formed bubbles, we've put health and safety and social distancing into our environment. And we're able to operate by doing things in a safe manner. We have some special exhibitions in the UK. So for example, when people return from overseas that they do not have to quarantine for two weeks but they get tested multiple times a week and we know they're safe. So we're racing, we're dealing with all the hassle that COVID gives us. And we are really hoping for a return to normality sooner instead of later where we can get fans back at the track and really go racing and have the spectacle where everyone enjoys it. >> Yeah, that's awesome. So important for the fans but also all the employees around that ecosystem. Talk about some of the key drivers in your business and some of the key apps that give you competitive advantage to help you win races. >> Yeah, so in our business everything is all about speed. So the car obviously needs to be fast but also all of our business operations need to be fast. We need to be able to design our car and it's all done in the virtual world but the virtual simulations and designs need to correlate to what happens in the real world. So all of that requires a lot of expertise to develop the simulations, the algorithms, and have all the underlying infrastructure that runs it quickly and reliably. In manufacturing, we have cost caps and financial controls by regulation. We need to be super efficient and control material and resources. So ERP and MES systems are running, helping us do that. And at the race track itself in speed, we have hundreds of decisions to make on a Friday and Saturday as we're fine tuning the final configuration of the car. And here again, we rely on simulations and analytics to help do that. And then during the race, we have split seconds, literally seconds to alter our race strategy if an event happens. So if there's an accident and the safety car comes out or the weather changes, we revise our tactics. And we're running Monte Carlo for example. And using experienced engineers with simulations to make a data-driven decision and hopefully a better one and faster than our competitors. All of that needs IT to work at a very high level. >> You know it's interesting, I mean, as a lay person, historically when I think about technology and car racing, of course, I think about the mechanical aspects of a self-propelled vehicle, the electronics and the like, but not necessarily the data. But the data's always been there, hasn't it? I mean, maybe in the form of like tribal knowledge, if it's somebody who knows the track and where the hills are and experience and gut feel. But today you're digitizing it and you're processing it in close to real-time. It's amazing. >> Yeah, exactly right. Yeah, the car is instrumented with sensors, we post-process, we're doing video, image analysis and we're looking at our car, our competitor's car. So there's a huge amount of very complicated models that we're using to optimize our performance and to continuously improve our car. Yeah, the data and the applications that leverage it are really key. And that's a critical success factor for us. >> So let's talk about your data center at the track, if you will, I mean, if I can call it that. Paint a picture for us. >> Sure. What does that look like? >> So we have to send a lot of equipment to the track, at the edge. And even though we have really a great lateral network link back to the factory and there's cloud resources, a lot of the tracks are very old. You don't have hardened infrastructure, you don't have docks that protect cabling, for example, and you can lose connectivity to remote locations. So the applications we need to operate the car and to make really critical decisions, all that needs to be at the edge where the car operates. So historically we had three racks of equipment, legacy infrastructure and it was really hard to manage, to make changes, it was too inflexible. There were multiple panes of glass, and it was too slow. It didn't run our applications quickly. It was also too heavy and took up too much space when you're cramped into a garage with lots of environmental constraints. So we'd introduced hyper-convergence into the factory and seen a lot of great benefits. And when we came time to refresh our infrastructure at the track, we stepped back and said there's a lot smarter way of operating. We can get rid of all this slow and inflexible expensive legacy and introduce hyper-convergence. And we saw really excellent benefits for doing that. We saw a three X speed up for a lot of our applications. So here where we're post-processing data, and we have to make decisions about race strategy, time is of the essence and a three X reduction in processing time really matters. We also were able to go from three racks of equipment down to two racks of equipment and the storage efficiency of the HPE SimpliVity platform with 20 to one ratios allowed us to eliminate a rack. And that actually saved a $100,000 a year in freight costs by shipping less equipment. Things like backup, mistakes happen. Sometimes a user makes a mistake. So for example a race engineer could load the wrong data map into one of our simulations. And we could restore that DDI through SimpliVity backup in 90 seconds. And this makes sure, enables engineers to focus on the car, to make better decisions without having downtime. And we send two IT guys to every race. They're managing 60 users, a really diverse environment, juggling a lot of balls and having a simple management platform like HP SimpliVity gives us, allows them to be very effective and to work quickly. So all of those benefits were a huge step forward relative to the legacy infrastructure that we used to run at the edge. >> Yes, so you had the nice Petri dish in the factory, so it sounds like your goals obviously, number one KPI is speed to help shave seconds off the time, but also cost. >> That's right. Just the simplicity of setting up the infrastructure is key. >> Yeah, that's exactly right. >> It's speed, speed, speed. So we want applications that absolutely fly, you know gets actionable results quicker, get answers from our simulations quicker. The other area that speed's really critical is our applications are also evolving prototypes and we're always, the models are getting bigger, the simulations are getting bigger, and they need more and more resource. And being able to spin up resource and provision things without being a bottleneck is a big challenge. And SimpliVity gives us the means of doing that. >> So did you consider any other options or was it because you had the factory knowledge, HCI was, you know, very clearly the option? What did you look at? >> Yeah, so we have over five years of experience in the factory and we eliminated all of our legacy infrastructure five years ago. And the benefits I've described at the track we saw that in the factory. At the track, we have a three-year operational life cycle for our equipment. 2017 was the last year we had legacy. As we were building for 2018, it was obvious that hyper-converged was the right technology to introduce. And we'd had years of experience in the factory already. And the benefits that we see with hyper-converged actually mattered even more at the edge because our operations are so much more pressurized. Time is even more of the essence. And so speeding everything up at the really pointy end of our business was really critical. It was an obvious choice. >> So why SimpliVity? Why do you choose HPE SimpliVity? >> Yeah, so when we first heard about hyper-converged, way back in the factory. We had a legacy infrastructure, overly complicated, too slow, too inflexible, too expensive. And we stepped back and said there has to be a smarter way of operating. We went out and challenged our technology partners. We learned about hyper-convergence. We didn't know if the hype was real or not. So we underwent some PLCs and benchmarking and the PLCs were really impressive. And all these, you know, speed and agility benefits we saw and HPE for our use cases was the clear winner in the benchmarks. So based on that we made an initial investment in the factory. We moved about 150 VMs and 150 VDIs into it. And then as we've seen all the benefits we've successfully invested, and we now have an estate in the factory of about 800 VMs and about 400 VDIs. So it's been a great platform and it's allowed us to really push boundaries and give the business the service it expects. >> Well that's a fun story. So just coming back to the metrics for a minute. So you're running Monte Carlo simulations in real-time and sort of near real-time. >> Yeah. And so essentially that's, if I understand it, that's what-ifs and it's the probability of the outcome. And then somebody's got to make, >> Exactly. then a human's got to say, okay, do this, right. And so was that, >> Yeah. with the time in which you were able to go from data to insight to recommendation or edict was that compressed? You kind of indicated that, but. >> Yeah, that was accelerated. And so in that use case, what we're trying to do is predict the future and you're saying well, and before any event happens, you're doing what-ifs. Then if it were to happen, what would you probabilistically do? So, you know, so that simulation we've been running for a while but it gets better and better as we get more knowledge. And so that we were able to accelerate that with SimpliVity. But there's other use cases too. So we offload telemetry from the car and we post-process it. And that reprocessing time really is very time consuming. And, you know, we went from nine, eight minutes for some of the simulations down to just two minutes. So we saw big, big reductions in time. And ultimately that meant an engineer could understand what the car was doing in a practice session, recommend a tweak to the configuration or setup of it, and just get more actionable insight quicker. And it ultimately helps get a better car quicker. >> Such a great example. How are you guys feeling about the season, Matt? What's the team's, the sentiment? >> Yeah, I think we're optimistic. We with thinking our simulations that we have a great car. We have a new driver lineup. We have Max Verstappen who carries on with the team and Sergio Perez joins the team. So we're really excited about this year and we want to go and win races. And I think with COVID people are just itching also to get back to a little degree of normality, and, you know, and going racing again, even though there's no fans, it gets us into a degree of normality. >> That's great, Matt, good luck this season and going forward and thanks so much for coming back in theCUBE. Really appreciate it. >> It's my pleasure. Great talking to you again. >> Okay, now we're going to bring back Omar for a quick summary. So keep it right there. (mellow music)

Published Date : Mar 4 2021

SUMMARY :

Matt, it's good to see you again. and in near real-time and that we can be part of that journey And Matt that tear down and the impact that has on tires, What's the status of your season? and have the spectacle and some of the key apps So the car obviously needs to be fast the electronics and the like, and to continuously improve our car. data center at the track, What does that look like? So the applications we Petri dish in the factory, Just the simplicity of And being able to spin up And the benefits that we and the PLCs were really impressive. So just coming back to probability of the outcome. And so was that, from data to insight to recommendation And so that we were able to What's the team's, the sentiment? and Sergio Perez joins the team. and going forward and thanks so much Great talking to you again. So keep it right there.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Max VerstappenPERSON

0.99+

Matt CadieuxPERSON

0.99+

Sergio PerezPERSON

0.99+

MattPERSON

0.99+

two weeksQUANTITY

0.99+

Milton KeynesLOCATION

0.99+

Red Bull RacingORGANIZATION

0.99+

DavePERSON

0.99+

OmarPERSON

0.99+

2018DATE

0.99+

60 usersQUANTITY

0.99+

UKLOCATION

0.99+

20QUANTITY

0.99+

90 secondsQUANTITY

0.99+

23 racesQUANTITY

0.99+

150 VDIsQUANTITY

0.99+

three-yearQUANTITY

0.99+

two racksQUANTITY

0.99+

each trackQUANTITY

0.99+

2017DATE

0.99+

two minutesQUANTITY

0.99+

eight minutesQUANTITY

0.99+

nineQUANTITY

0.99+

three racksQUANTITY

0.99+

last yearDATE

0.99+

five years agoDATE

0.98+

hundredsQUANTITY

0.98+

todayDATE

0.98+

about 800 VMsQUANTITY

0.98+

HPORGANIZATION

0.98+

about 150 VMsQUANTITY

0.98+

about 400 VDIsQUANTITY

0.98+

one ratiosQUANTITY

0.98+

firstQUANTITY

0.96+

over five yearsQUANTITY

0.95+

this yearDATE

0.95+

SimpliVityTITLE

0.94+

$100,000 a yearQUANTITY

0.93+

23 immovableQUANTITY

0.93+

HCIORGANIZATION

0.93+

two ITQUANTITY

0.91+

SaturdayDATE

0.91+

Monte CarloTITLE

0.91+

oneQUANTITY

0.88+

Every trackQUANTITY

0.84+

a minuteQUANTITY

0.77+

COVIDOTHER

0.77+

threeQUANTITY

0.76+

Monte CarloCOMMERCIAL_ITEM

0.75+

every raceQUANTITY

0.75+

times a weekQUANTITY

0.75+

secondsQUANTITY

0.64+

FridayDATE

0.6+

of curvesQUANTITY

0.58+

noQUANTITY

0.56+

number oneQUANTITY

0.56+

straightQUANTITY

0.52+

SimpliVityOTHER

0.52+

COVIDTITLE

0.5+

HPETITLE

0.34+

Jamie Thomas, IBM | IBM Think 2020


 

Narrator: From theCUBE studios in Palo Alto and Boston, it's theCUBE, covering IBM Think, brought to you by IBM. >> We're back. You're watching theCUBE and our coverage of IBM Think 2020, the digital IBM thinking. We're here with Jamie Thomas, who's the general manager of strategy and development for IBM Systems. Jamie, great to see you. >> It's great to see you as always. >> You have been knee deep in qubits, the last couple years. And we're going to talk quantum. We've talked quantum a lot in the past, but it's a really interesting field. We spoke to you last year at IBM Think about this topic. And a year in this industry is a long time, but so give us the update what's new in quantum land? >> Well, Dave first of all, I'd like to say that in this environment we find ourselves in, I think we can all appreciate why innovation of this nature is perhaps more important going forward, right? If we look at some of the opportunities to solve some of the unsolvable problems, or solve problems much more quickly, in the case of pharmaceutical research. But for us in IBM, it's been a really busy year. First of all, we worked to advance the technology, which is first and foremost in terms of this journey to quantum. We just brought online our 53 qubit computer, which also has a quantum volume of 32, which we can talk about. And we've continued to advance the software stack that's attached to the technology because you have to have both the software and the hardware thing, right rate and pace. We've advanced our new network, which you and I have spoken about, which are those individuals across the commercial enterprises, academic and startups, who are working with us to co-create around quantum to help us understand the use cases that really can be solved in the future with quantum. And we've also continued to advance our community, which is serving as well in this new digital world that we're finding ourselves in, in terms of reaching out to developers. Now, we have over 300,000 unique downloads of the programming model that represents the developers that we're touching out there every day with quantum. These developers have, in the last year, have run over 140 billion quantum circuits. So, our machines in the cloud are quite active, and the cloud model, of course, is serving us well. The data's, in addition, to all the other things that I mentioned. >> So Jamie, what metrics are you trying to optimize on? You mentioned 53 qubits I saw that actually came online, I think, last fall. So you're nearly six months in now, which is awesome. But what are you measuring? Are you measuring stability or coherence or error rates? Number of qubits? What are the things that you're trying to optimize on to measure progress? >> Well, that's a good question. So we have this metric that we've defined over the last year or two called quantum volume. And quantum volume 32, which is the capacity of our current machine really is a representation of many of the things that you mentioned. It represents the power of the quantum machine, if you will. It includes a definition of our ability to provide error correction, to maintain states, to really accomplish workloads with the computer. So there's a number of factors that go into quantum volume, which we think are important. Now, qubits and the number of qubits is just one such metric. It really depends on the coherence and the effect of error correction, to really get the value out of the machine, and that's a very important metric. >> Yeah, we love to boil things down to a single metric. It's more complicated than that >> Yeah, yeah. >> specifically with quantum. So, talk a little bit more about what clients are doing and I'm particularly interested in the ecosystem that you're forming around quantum. >> Well, as I said, the ecosystem is both the network, which are those that are really intently working with us to co-create because we found, through our long history in IBM, that co-creation is really important. And also these researchers and developers realize that some of our developers today are really researchers, but as you as you go forward you get many different types of developers that are part of this mix. But in terms of our ecosystem, we're really fundamentally focused on key problems around chemistry, material science, financial services. And over the last year, there's over 200 papers that have been written out there from our network that really embody their work with us on this journey. So we're looking at things like quadratic speed up of things like Monte Carlo simulation, which is used in the financial services arena today to quantify risk. There's papers out there around topics like trade settlements, which in the world today trade settlements is a very complex domain with very interconnected complex rules and trillions of dollars in the purview of trade settlement. So, it's just an example. Options pricing, so you see examples around options pricing from corporations like JPMC in the area of financial services. And likewise in chemistry, there's a lot of research out there focused on batteries. As you can imagine, getting everything to electric powered batteries is an important topic. But today, the way we manufacture batteries can in fact create air pollution, in terms of the process, as well as we want batteries to have more retention in life to be more effective in energy conservation. So, how do we create batteries and still protect our environment, as we all would like to do? And so we've had a lot of research around things like the next generation of electric batteries, which is a key topic. But if you can think, you know Dave, there's so many topics here around chemistry, also pharmaceuticals that could be advanced with a quantum computer. Obviously, if you look at the COVID-19 news, our supercomputer that we installed at Oak Ridge National Laboratory for instance, is being used to analyze 8000 different compounds for specifically around COVID-19 and the possibilities of using those compounds to solve COVID-19, or influence it in a positive manner. You can think of the quantum computer when it comes online as an accelerator to a supercomputer like that, helping speed up this kind of research even faster than what we're able to do with something like the Summit supercomputer. Oak Ridge is one of our prominent clients with the quantum technology, and they certainly see it that way, right, as an accelerator to the capacity they already have. So a great example that I think is very germane in the time that we find ourselves in. >> How 'about startups in this ecosystem? Are you able to-- I mean there must be startups popping up all over the place for this opportunity. Are you working with any startups or incubating any startups? Can you talk about that? >> Oh yep. Absolutely. There's about a third of our network are in VC startups and there's a long list of them out there. They're focused on many different aspects of quantum computing. Many of 'em are focused on what I would call loosely, the programming model, looking at improving algorithms across different industries, making it easier for those that are, perhaps more skilled in domains, whether that is chemistry or financial services or mathematics, to use the power of the quantum computer. Many of those startups are leveraging our Qiskit, our quantum information science open programming model that we put out there so it's open. Many of the startups are using that programming model and then adding their own secret sauce, if you will, to understand how they can help bring on users in different ways. So it depends on their domain. You see some startups that are focused on the hardware as well, of course, looking at different hardware technologies that can be used to solve quantum. I would say I feel like more of them are focused on the software programming model. >> Well Jamie, it was interesting hear you talk about what some of the clients are doing. I mean obviously in pharmaceuticals, and battery manufacturers do a lot of advanced R and D, but you mentioned financial services, you know JPMC. It's almost like they're now doing advanced R and D trying to figure out how they can apply quantum to their business down the road. >> Absolutely, and we have a number of financial institutions that we've announced as part of the network. JPMC is just one of our premiere references who have written papers about it. But I would tell you that in the world of Monte Carlo simulation, options pricing, risk management, a small change can make a big difference in dollars. So we're talking about operations that in many cases they could achieve, but not achieve in the right amount of time. The ability to use quantum as an accelerator for these kind of operations is very important. And I can tell you, even in the last few weeks, we've had a number of briefings with financial companies for five hours on this topic. Looking at what could they do and learning from the work that's already done out there. I think this kind of advanced research is going to be very important. We also had new members that we announced at the beginning of the year at the CES show. Delta Airlines joined. First Transportation Company, Amgen joined, a pharmaceutical, an example of pharmaceuticals, as well as a number of other research organizations. Georgia Tech, University of New Mexico, Anthem Insurance, just an example of the industries that are looking to take advantage of this kind of technology as it matures. >> Well, and it strikes me too, that as you start to bring machine intelligence into the equation, it's a game changer. I mean, I've been saying that it's not Moore's Law driving the industry anymore, it's this combination of data, AI, and cloud for scale, but now-- Of course there are alternative processors going on, we're seeing that, but now as you bring in quantum that actually adds to that innovation cocktail, doesn't it? >> Yes, and as you recall when you and I spoke last year about this, there are certain domains today where you really cannot get as much effective gain out of classical computing. And clearly, chemistry is one of those domains because today, with classical computers, we're really unable to model even something as simple as a caffeine molecule, which we're all so very familiar with. I have my caffeine here with me today. (laughs) But you know, clearly, to the degree we can actually apply molecular modeling and the advantages that quantum brings to those fields, we'll be able to understand so much more about materials that affect all of us around the world, about energy, how to explore energy, and create energy without creating the carbon footprint and the bad outcomes associated with energy creation, and how to obviously deal with pharmaceutical creation much more effectively. There's a real promise in a lot of these different areas. >> I wonder if you could talk a little bit about some of the landscape and I'm really interested in what IBM brings to the table that's sort of different. You're seeing a lot of companies enter this space, some big and many small, what's the unique aspect that IBM brings to the table? You've mentioned co-creating before. Are you co-creating, coopertating with some of the other big guys? Maybe you could address that. >> Well, obviously this is a very hot topic, both within the technology industry and across government entities. I think that some of the key values we bring to the table is we are the only vendor right now that has a fleet of systems available in the cloud, and we've been out there for several years, enabling clients to take advantage of our capacity. We have both free access and premium access, which is what the network is paying for because they get access to the highest fidelity machines. Clearly, we understand intently, classical computing and the ability to leverage classical with quantum for advantage across many of these different industries, which I think is unique. We understand the cloud experience that we're bringing to play here with quantum since day one, and most importantly, I think we have strong relationships. We have, in many cases, we're still running the world. I see it every day coming through my clients' port vantage point. We understand financial services. We understand healthcare. We understand many of these important domains, and we're used to solving tough problems. So, we'll bring that experience with our clients and those industries to the table here and help them on this journey. >> You mentioned your experience in sort of traditional computing, basically if I understand it correctly, you're still using traditional silicon microprocessors to read and write the data that's coming out of quantum. I don't know if they're sitting physically side by side, but you've got this big cryogenic unit, cables coming in. That's the sort of standard for some time. It reminds me, can it go back to ENIAC? And now, which is really excites me because you look at the potential to miniaturize this over the next several decades, but is that right, you're sort of side by side with traditional computing approaches? >> Right, effectively what we do with quantum today does not happen without classical computers. The front end, you're coming in on classical computers. You're storing your data on classical computers, so that is the model that we're in today, and that will continue to happen. In terms of the quantum processor itself, it is a silicon based processor, but it's a superconducting technology, in our case, that runs inside that cryogenics unit at a very cold temperature. It is powered by next-generation electronics that we in IBM have innovated around and created our own electronic stack that actually sends microwave pulses into the processor that resides in the cryogenics unit. So when you think about the components of the system, you have to be innovating around the processor, the cryogenics unit, the custom electronic stack, and the software all at the same time. And yes, we're doing that in terms of being surrounded by this classical backplane that allows our Q network, as well as the developers around the world to actually communicate with these systems. >> The other thing that I really like about this conversation is it's not just R and D for the sake of R and D, you've actually, you're working with partners to, like you said, co-create, customers, financial services, airlines, manufacturing, et cetera. I wonder if you could maybe kind of address some of the things that you see happening in the sort of near to midterm, specifically as it relates to where people start. If I'm interested in this, what do I do? Do I need new skills? Do I need-- It's in the cloud, right? >> Yeah. >> So I can spit it up there, but where do people get started? >> Well they can certainly come to the Quantum Experience, which is our cloud experience and start to try out the system. So, we have both easy ways to get started with visual composition of circuits, as well as using the programming model that I mentioned, the Qiskit programming model. We've provided extensive YouTube videos out there already. So, developers who are interested in starting to learn about quantum can go out there and subscribe to our YouTube channel. We've got over 40 assets already recorded out there, and we continue to do those. We did one last week on quantum circuits for those that are more interested in that particular domain, but I think that's a part of this journey is making sure that we have all the assets out there digitally available for those around the world that want to interact with us. We have tremendous amount of education. We're also providing education to our business partners. One of our key network members, who I'll be speaking with later, I think today, is from Accenture. Accenture's an example of an organization that's helping their clients understand this quantum journey, and of course they're providing their own assets, if you will, but once again, taking advantage of the education that we're providing to them as a business partner. >> People talk about quantum being a decade away, but I think that's the wrong way to think about it, and I'd love your thoughts on this. It feels like, almost like the return coming out of COVID-19, it's going to come in waves, and there's parts that are going to be commercialized thoroughly and it's not binary. It's not like all of a sudden one day we're going to wake, "Hey, quantum is here!" It's really going to come in layers. Your thoughts? >> Yeah, I definitely agree with that. It's very important, that thought process because if you want to be competitive in your industry, you should think about getting started now. And that's why you see so many financial services, industrial firms, and others joining to really start experimentation around some of these domain areas to understand jointly how we evolve these algorithms to solve these problems. I think that the production level characteristics will curate the rate and pace of the industry. The industry, as we know, can drive things together faster. So together, we can make this a reality faster, and certainly none of us want to say it's going to be a decade, right. I mean, we're getting advantage today, in terms of the experimentation and the understanding of these problems, and we have to expedite that, I think, in the next few years. And certainly, with this arms race that we see, that's going to continue. One of the things I didn't mention is that IBM is also working with certain countries and we have significant agreements now with the countries of Germany and Japan to put quantum computers in an IBM facility in those countries. It's in collaboration with Fraunhofer Institute or miR Scientific Organization in Germany and with the University of Tokyo in Japan. So you can see that it's not only being pushed by industry, but it's also being pushed from the vantage of countries and bringing this research and technology to their countries. >> All right, Jamie, we're going to have to leave it there. Thanks so much for coming on theCUBE and give us the update. It's always great to see you. Hopefully, next time I see you, it'll be face to face. >> That's right, I hope so too. It's great to see you guys, thank you. Bye. >> All right, you're welcome. Keep it right there everybody. This is Dave Vellante for theCUBE. Be back right after this short break. (gentle music)

Published Date : May 5 2020

SUMMARY :

brought to you by IBM. the digital IBM thinking. We spoke to you last year at in the future with quantum. What are the things that you're trying of many of the things that you mentioned. things down to a single metric. interested in the ecosystem in the time that we find ourselves in. all over the place for this opportunity. Many of the startups are to their business down the road. just an example of the that actually adds to that and the bad outcomes associated of the other big guys? and the ability to leverage That's the sort of standard for some time. so that is the model that we're in today, in the sort of near to midterm, and subscribe to our YouTube channel. that are going to be One of the things I didn't It's always great to see you. It's great to see you guys, thank you. Be back right after this short break.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
IBMORGANIZATION

0.99+

Dave VellantePERSON

0.99+

Jamie ThomasPERSON

0.99+

JamiePERSON

0.99+

Fraunhofer InstituteORGANIZATION

0.99+

GermanyLOCATION

0.99+

University of New MexicoORGANIZATION

0.99+

AccentureORGANIZATION

0.99+

Georgia TechORGANIZATION

0.99+

JPMCORGANIZATION

0.99+

First Transportation CompanyORGANIZATION

0.99+

five hoursQUANTITY

0.99+

DavePERSON

0.99+

JapanLOCATION

0.99+

AmgenORGANIZATION

0.99+

Delta AirlinesORGANIZATION

0.99+

BostonLOCATION

0.99+

Palo AltoLOCATION

0.99+

Anthem InsuranceORGANIZATION

0.99+

Monte CarloTITLE

0.99+

last yearDATE

0.99+

miR Scientific OrganizationORGANIZATION

0.99+

University of TokyoORGANIZATION

0.99+

53 qubitsQUANTITY

0.99+

Oak RidgeORGANIZATION

0.99+

last fallDATE

0.99+

YouTubeORGANIZATION

0.99+

oneQUANTITY

0.99+

COVID-19OTHER

0.99+

8000 different compoundsQUANTITY

0.99+

ENIACORGANIZATION

0.99+

over 200 papersQUANTITY

0.99+

trillions of dollarsQUANTITY

0.99+

53 qubitQUANTITY

0.99+

bothQUANTITY

0.98+

CESEVENT

0.98+

OneQUANTITY

0.98+

todayDATE

0.98+

single metricQUANTITY

0.97+

32QUANTITY

0.97+

firstQUANTITY

0.96+

FirstQUANTITY

0.96+

IBM ThinkORGANIZATION

0.95+

over 40 assetsQUANTITY

0.94+

twoQUANTITY

0.94+

IBM SystemsORGANIZATION

0.93+

over 140 billion quantum circuitsQUANTITY

0.93+

a yearQUANTITY

0.93+

last couple yearsDATE

0.92+

over 300,000 unique downloadsQUANTITY

0.92+

Oak Ridge National LaboratoryORGANIZATION

0.89+

one such metricQUANTITY

0.87+

nearly six monthsQUANTITY

0.87+

Bill Vass, AWS | AWS re:Invent 2019


 

>> Announcer: Live from Las Vegas, it's theCUBE! Covering AWS re:Invent 2019. Brought to you by Amazon Web Services and Intel. Along with it's ecosystem partners. >> Okay, welcome back everyone. It's theCUBE's live coverage here in Las Vegas for Amazon Web Series today, re:Invent 2019. It's theCUBE's seventh year covering re:Invent. Eight years they've been running this event. It gets bigger every year. It's been a great wave to ride on. I'm John Furrier, my cohost, Dave Vellante. We've been riding this wave, Dave, for years. It's so exciting, it gets bigger and more exciting. >> Lucky seven. >> This year more than ever. So much stuff is happening. It's been really exciting. I think there's a sea change happening, in terms of another wave coming. Quantum computing, big news here amongst other great tech. Our next guest is Bill Vass, VP of Technology, Storage Automation Management, part of the quantum announcement that went out. Bill, good to see you. >> Yeah, well, good to see you. Great to see you again. Thanks for having me on board. >> So, we love quantum, we talk about it all the time. My son loves it, everyone loves it. It's futuristic. It's going to crack everything. It's going to be the fastest thing in the world. Quantum supremacy. Andy referenced it in my one-on-one with him around quantum being important for Amazon. >> Yes, it is, it is. >> You guys launched it. Take us through the timing. Why, why now? >> Okay, so the Braket service, which is based on quantum notation made by Dirac, right? So we thought that was a good name for it. It provides for you the ability to do development in quantum algorithms using gate-based programming that's available, and then do simulation on classical computers, which is what we call our digital computers today now. (men chuckling) >> Yeah, it's a classic. >> These are classic computers all of a sudden right? And then, actually do execution of your algorithms on, today, three different quantum computers, one that's annealing and two-bit gate-based machines. And that gives you the ability to test them in parallel and separate from each other. In fact, last week, I was working with the team and we had two machines, an ion trap machine and an electromagnetic tunneling machine, solving the same problem and passing variables back and forth from each other, you could see the cloud watch metrics coming out, and the data was going to an S3 bucket on the output. And we do it all in a Jupiter notebook. So it was pretty amazing to see all that running together. I think it's probably the first time two different machines with two different technologies had worked together on a cloud computer, fully integrated with everything else, so it was pretty exciting. >> So, quantum supremacy has been a word kicked around. A lot of hand waving, IBM, Google. Depending on who you talk to, there's different versions. But at the end of the day, quantum is a leap in computing. >> Bill: Yes, it can be. >> It can be. It's still early days, it would be day zero. >> Yeah, well I think if you think of, we're about where computers were with tubes if you remember, if you go back that far, right, right? That's about where we are right now, where you got to kind of jiggle the tubes sometimes to get them running. >> A bug gets in there. Yeah, yeah, that bug can get in there, and all of those kind of things. >> Dave: You flip 'em off with a punch card. Yeah, yeah, so for example, a number of the machines, they run for four hours and then they come down for a half hour for calibration. And then they run for another four hours. So we're still sort of at that early stage, but you can do useful work on them. And more mature systems, like for example D-Wave, which is annealer, a little different than gate-based machines, is really quite mature, right? And so, I think as you go back and forth between these machines, the gate-based machines and annealers, you can really get a sense for what's capable today with Braket and that's what we want to do is get people to actually be able to try them out. Now, quantum supremacy is a fancy word for we did something you can't do on a classical computer, right? That's on a quantum computer for the first time. And quantum computers have the potential to exceed the processing power, especially on things like factoring and other things like that, or on Hamiltonian simulations for molecules, and those kids of things, because a quantum computer operates the way a molecule operates, right, in a lot of ways using quantum mechanics and things like that. And so, it's a fancy term for that. We don't really focus on that at Amazon. We focus on solving customer's problems. And the problem we're solving with Braket is to get them to learn it as it's evolving, and be ready for it, and continue to develop the environment. And then also offer a lot of choice. Amazon's always been big on choice. And if you look at our processing portfolio, we have AMD, Intel x86, great partners, great products from them. We have Nvidia, great partner, great products from them. But we also have our Graviton 1 and Graviton 2, and our new GPU-type chip. And those are great products, too, I've been doing a lot on those, as well. And the customer should have that choice, and with quantum computers, we're trying to do the same thing. We will have annealers, we will have ion trap machines, we will have electromagnetic machines, and others available on Braket. >> Can I ask a question on quantum if we can go back a bit? So you mentioned vacuum tubes, which was kind of funny. But the challenge there was with that, it was cooling and reliability, system downtime. What are the technical challenges with regard to quantum in terms of making it stable? >> Yeah, so some of it is on classical computers, as we call them, they have error-correction code built in. So you have, whether you know it or not, there's alpha particles that are flipping bits on your memory at all times, right? And if you don't have ECC, you'd get crashes constantly on your machine. And so, we've built in ECC, so we're trying to build the quantum computers with the proper error correction, right, to handle these things, 'cause nothing runs perfectly, you just think it's perfect because we're doing all the error correction under the covers, right? And so that needs to evolve on quantum computing. The ability to reproduce them in volume from an engineering perspective. Again, standard lithography has a yield rate, right? I mean, sometimes the yield is 40%, sometimes it's 20%, sometimes it's a really good fab and it's 80%, right? And so, you have a yield rate, as well. So, being able to do that. These machines also generally operate in a cryogenic world, that's a little bit more complicated, right? And they're also heavily affected by electromagnetic radiation, other things like that, so you have to sort of faraday cage them in some cases, and other things like that. So there's a lot that goes on there. So it's managing a physical environment like cryogenics is challenging to do well, having the fabrication to reproduce it in a new way is hard. The physics is actually, I shudder to say well understood. I would say the way the physics works is well understood, how it works is not, right? No one really knows how entanglement works, they just knows what it does, and that's understood really well, right? And so, so a lot of it is now, why we're excited about it, it's an engineering problem to solve, and we're pretty good at engineering. >> Talk about the practicality. Andy Jassy was on the record with me, quoted, said, "Quantum is very important to Amazon." >> Yes it is. >> You agree with that. He also said, "It's years out." You said that. He said, "But we want to make it practical "for customers." >> We do, we do. >> John: What is the practical thing? Is it just kicking the tires? Is it some of the things you mentioned? What's the core goal? >> So, in my opinion, we're at a point in the evolution of these quantum machines, and certainly with the work we're doing with Cal Tech and others, that the number of available cubits are starting to increase at an astronomic rate, a Moore's Law kind of of rate, right? Whether it's, no matter which machine you're looking at out there, and there's about 200 different companies building quantum computers now, and so, and they're all good technology. They've all got challenges, as well, as reproducibility, and those kind of things. And so now's a good time to start learning how to do this gate-based programming knowing that it's coming, because quantum computers, they won't replace a classical computer, so don't think that. Because there is no quantum ram, you can't run 200 petabytes of data through a quantum computer today, and those kind of things. What it can do is factoring very well, or it can do probability equations very well. It'll have affects on Monte Carlo simulations. It'll have affects specifically in material sciences where you can simulate molecules for the first time that you just can't do on classical computers. And when I say you can't do on classical computers, my quantum team always corrects me. They're like, "Well, no one has proven "that there's an algorithm you can run "on a classical computer that will do that yet," right? (men chuckle) So there may be times when you say, "Okay, I did this on a quantum computer," and you can only do it on a quantum computer. But then someone's very smart mathematician says, "Oh, I figured out how to do it on a regular computer. "You don't need a quantum computer for that." And that's constantly evolving, as well, in parallel, right? And so, and that's what's that argument between IBM and Google on quantum supremacy is that. And that's an unfortunate distraction in my opinion. What Google did was quite impressive, and if you're in the quantum world, you should be very happy with what they did. They had a very low error rate with a large number of cubits, and that's a big deal. >> Well, I just want to ask you, this industry is an arms race. But, with something like quantum where you've got 200 companies actually investing in it so early days, is collaboration maybe a model here? I mean, what do think? You mentioned Cal Tech. >> It certainly is for us because, like I said, we're going to have multiple quantum computers available, just like we collaborate with Intel, and AMD, and the other partners in that space, as well. That's sort of the nice thing about being a cloud service provider is we can give customers choice, and we can have our own innovation, plus their innovations available to customers, right? Innovation doesn't just happen in one place, right? We got a lot of smart people at Amazon, we don't invent everything, right? (Dave chuckles) >> So I got to ask you, obviously, we can take cube quantum and call it cubits, not to be confused with theCUBE video highlights. Joking aside, classical computers, will there be a classical cloud? Because this is kind of a futuristic-- >> Or you mean a quantum cloud? >> Quantum cloud, well then you get the classic cloud, you got the quantum cloud. >> Well no, they'll be together. So I think a quantum computer will be used like we used to use a math coprocessor if you like, or FPGAs are used today, right? So, you'll go along and you'll have your problem. And I'll give you a real, practical example. So let's say you had a machine with 125 cubits, okay? You could just start doing some really nice optimization algorithms on that. So imagine there's this company that ships stuff around a lot, I wonder who that could be? And they need to optimize continuously their delivery for a truck, right? And that changes all the time. Well that algorithm, if you're doing hundreds of deliveries in a truck, it's very complicated. That traveling salesman algorithm is a NP-hard problem when you do it, right? And so, what would be the fastest best path? But you got to take into account weather and traffic, so that's changing. So you might have a classical computer do those algorithms overnight for all the delivery trucks and then send them out to the trucks. The next morning they're driving around. But it takes a lot of computing power to do that, right? Well, a quantum computer can do that kind of problemistic or deterministic equation like that, not deterministic, a best-fit algorithm like that, much faster. And so, you could have it every second providing that. So your classical computer is sending out the manifests, interacting with the person, it's got the website on it. And then, it gets to the part where here's the problem to calculate, we call it a shot when you're on a quantum computer, it runs it in a few seconds that would take an hour or more. >> It's a fast job, yeah. >> And it comes right back with the result. And then it continues with it's thing, passes it to the driver. Another update occurs, (buzzing) and it's just going on all the time. So those kind of things are very practical and coming. >> I've got to ask for the younger generations, my sons super interested as I mentioned before you came on, quantum attracts the younger, smart kids coming into the workforce, engineering talent. What's the best path for someone who has an either advanced degree, or no degree, to get involved in quantum? Is there a certain advice you'd give someone? >> So the reality is, I mean, obviously having taken quantum mechanics in school and understanding the physics behind it to an extent, as much as you can understand the physics behind it, right? I think the other areas, there are programs at universities focused on quantum computing, there's a bunch of them. So, they can go into that direction. But even just regular computer science, or regular mechanical and electrical engineering are all neat. Mechanical around the cooling, and all that other stuff. Electrical, these are electrically-based machines, just like a classical computer is. And being able to code at low level is another area that's tremendously valuable right now. >> Got it. >> You mentioned best fit is coming, that use case. I mean, can you give us a sense of a timeframe? And people will say, "Oh, 10, 15, 20 years." But you're talking much sooner. >> Oh, I don't, I think it's sooner than that, I do. And it's hard for me to predict exactly when we'll have it. You can already do, with some of the annealing machines, like D- Wave, some of the best fit today, right? So it's a matter of people want to use a quantum computer because they need to do something fast, they don't care how much it costs, they need to do something fast. Or it's too expensive to do it on a classical computer, or you just can't do it at all on a classical computer. Today, there isn't much of that last one, you can't do it at all, but that's coming. As you get to around 52, 50, 52 cubits, it's very hard to simulate that on a classical computer. You're starting to reach the edge of what you can practically do on a classical computer. At about 125 cubits, you probably are at a point where you can't just simulate it anymore. >> But you're talking years, not decades, for this use case? >> Yeah, I think you're definitely talking years. I think, and you know, it's interesting, if you'd asked me two years ago how long it would take, I would've said decades. So that's how fast things are advancing right now, and I think that-- >> Yeah, and the computers just getting faster and faster. >> Yeah, but the ability to fabricate, the understanding, there's a number of architectures that are very well proven, it's just a matter of getting the error rates down, stability in place, the repeatable manufacturing in place, there's a lot of engineering problems. And engineering problems are good, we know how to do engineering problems, right? And we actually understand the physics, or at least we understand how the physics works. I won't claim that, what is it, "Spooky action at a distance," is what Einstein said for entanglement, right? And that's a core piece of this, right? And so, those are challenges, right? And that's part of the mystery of the quantum computer, I guess. >> So you're having fun? >> I am having fun, yeah. >> I mean, this is pretty intoxicating, technical problems, it's fun. >> It is. It is a lot of fun. Of course, the whole portfolio that I run over at AWS is just really a fun portfolio, between robotics, and autonomous systems, and IOT, and the advanced storage stuff that we do, and all the edge computing, and all the monitor and management systems, and all the real-time streaming. So like Kinesis Video, that's the back end for the Amazon ghost stores, and working with all that. It's a lot of fun, it really is, it's good. >> Well, Bill, we need an hour to get into that, so we may have to come up and see you, do a special story. >> Oh, definitely! >> We'd love to come up and dig in, and get a special feature program with you at some point. >> Yeah, happy to do that, happy to do that. >> Talk some robotics, some IOT, autonomous systems. >> Yeah, you can see all of it around here, we got it up and running around here, Dave. >> What a portfolio. >> Congratulations. >> Alright, thank you so much. >> Great news on the quantum. Quantum is here, quantum cloud is happening. Of course, theCUBE is going quantum. We've got a lot of cubits here. Lot of CUBE highlights, go to SiliconAngle.com. We got all the data here, we're sharing it with you. I'm John Furrier with Dave Vellante talking quantum. Want to give a shout out to Amazon Web Services and Intel for setting up this stage for us. Thanks to our sponsors, we wouldn't be able to make this happen if it wasn't for them. Thank you very much, and thanks for watching. We'll be back with more coverage after this short break. (upbeat music)

Published Date : Dec 4 2019

SUMMARY :

Brought to you by Amazon Web Services and Intel. It's so exciting, it gets bigger and more exciting. part of the quantum announcement that went out. Great to see you again. It's going to be the fastest thing in the world. You guys launched it. It provides for you the ability to do development And that gives you the ability to test them in parallel Depending on who you talk to, there's different versions. It's still early days, it would be day zero. we're about where computers were with tubes if you remember, can get in there, and all of those kind of things. And the problem we're solving with Braket But the challenge there was with that, And so that needs to evolve on quantum computing. Talk about the practicality. You agree with that. And when I say you can't do on classical computers, But, with something like quantum and the other partners in that space, as well. So I got to ask you, you get the classic cloud, you got the quantum cloud. here's the problem to calculate, we call it a shot and it's just going on all the time. quantum attracts the younger, smart kids And being able to code at low level is another area I mean, can you give us a sense of a timeframe? And it's hard for me to predict exactly when we'll have it. I think, and you know, it's interesting, Yeah, and the computers Yeah, but the ability to fabricate, the understanding, I mean, this is and the advanced storage stuff that we do, so we may have to come up and see you, and get a special feature program with you Yeah, happy to do that, Talk some robotics, some IOT, Yeah, you can see all of it We got all the data here, we're sharing it with you.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
JohnPERSON

0.99+

IBMORGANIZATION

0.99+

Dave VellantePERSON

0.99+

Amazon Web ServicesORGANIZATION

0.99+

two machinesQUANTITY

0.99+

AmazonORGANIZATION

0.99+

Cal TechORGANIZATION

0.99+

AMDORGANIZATION

0.99+

AndyPERSON

0.99+

BillPERSON

0.99+

Andy JassyPERSON

0.99+

EinsteinPERSON

0.99+

John FurrierPERSON

0.99+

40%QUANTITY

0.99+

DavePERSON

0.99+

Bill VassPERSON

0.99+

GoogleORGANIZATION

0.99+

20%QUANTITY

0.99+

NvidiaORGANIZATION

0.99+

IntelORGANIZATION

0.99+

80%QUANTITY

0.99+

last weekDATE

0.99+

AWSORGANIZATION

0.99+

an hourQUANTITY

0.99+

four hoursQUANTITY

0.99+

200 companiesQUANTITY

0.99+

10QUANTITY

0.99+

Las VegasLOCATION

0.99+

two-bitQUANTITY

0.99+

15QUANTITY

0.99+

TodayDATE

0.99+

125 cubitsQUANTITY

0.99+

200 petabytesQUANTITY

0.99+

20 yearsQUANTITY

0.99+

two different machinesQUANTITY

0.99+

oneQUANTITY

0.99+

50QUANTITY

0.99+

two different technologiesQUANTITY

0.99+

Eight yearsQUANTITY

0.98+

first timeQUANTITY

0.98+

Monte CarloTITLE

0.98+

todayDATE

0.98+

two years agoDATE

0.98+

52 cubitsQUANTITY

0.97+

BraketORGANIZATION

0.97+

x86COMMERCIAL_ITEM

0.97+

This yearDATE

0.96+

next morningDATE

0.96+

about 125 cubitsQUANTITY

0.95+

Graviton 1COMMERCIAL_ITEM

0.95+

DiracORGANIZATION

0.95+

Graviton 2COMMERCIAL_ITEM

0.94+

about 200 different companiesQUANTITY

0.93+

three different quantum computersQUANTITY

0.93+

Moore's LawTITLE

0.91+

seventh yearQUANTITY

0.9+

decadesQUANTITY

0.87+

secondsQUANTITY

0.86+

every secondQUANTITY

0.85+

re:EVENT

0.82+

half hourQUANTITY

0.81+

Day 2 Wrap Up | Pure Accelerate 2019


 

>> from Austin, Texas. It's Theo Cube covering your storage accelerate 2019. Brought to you by pure storage. >> Welcome back to the Q. We are wrapping up day two of two days of coverage. We're getting some applause. I'm pretty sure that's for us. At pure accelerate. 2019. Lisa Martin flanked by two gents Day Volante and Justin Warren. You probably know Justin, who's been on the Cube many times and less. Chief analyst. A pivot. Nine. Justin. You have been covering this event and well as an independent, so we want to get your take on this two days. We've had our 1st 2 day for the Cube covering pier storage. We've spoken with lots of people, cause Charlie kicks. I'm sure there's more nicknames that I'm forgetting customers. Partners. Dave. Let's do a quick recap of some of the trends and the themes that we've heard the last couple days. And then we'll get some independent analysis. Justin on Not just what you've heard the last three days, starting with a tech field day, but also just your history of covering and working with here. >> Well, so for my sample, its story of growth they even started pure starts all the press releases with the only company that's growing on the growth storage company. The growth in the first. So so this growth is a financial story there. Um pure is going for growth, the markets rewarding growth right now. So it's smart, double down on growth. That might change at some point on. We talked about Charlie Jean Carlo about this, and they'll decide what what they do at that point time. But But from a financial standpoint, growing fast, uh, like their balance sheet, be interesting to see if they can leverage it. Maur. But maybe they're using it for Optionality. They'll do 1.7 this fiscal year. 1.7 billion. That's good. They got 70% gross margins. It a little bit of free cash flow. Not much because they pour it back into the business. So story a growth that's number 12 was differentiation. Um, I think it it's pretty clear that their products are differentiated from the sort of big portfolio companies. I mean, it's it shows up in the numbers and the income statement, and it shows up when you talk to customers simplicity, the whole A P I thing. I guess the third is products. I mean, they're embracing the cloud, which is kind of interesting. I don't think they're gonna do a ton of business with block storage for AWS, but it's an interesting hedge, and I think it's really cool from an engineering standpoint on, I think you know, two other things. Culture but orange. They're different, They're cool. They're hip and customers, which at the end of the day, that's where the rubber meets the road. Customers happy you talk to companies are customers of companies like pure service now Splunk Nutanix >> Uh uh, >> and some others. And they're happy. They love it. It's transforming their business. Snowflake is another one. Really? How come you AI path is another one? These are the hottest companies in the business right now, and you can tell when you talk to their customers is good story >> and their customers articulate their differentiation for them pretty darn while what? You know, we've spoken to a number. I think four or five customers the last couple of days, and they're not talking about Flash Ray flash blade X M flashback. They're talking about their business and how the I T is benefiting from that and how the business is benefiting from that. You also see piers very vibrant culture being embraced organically by their customers. There's plenty of customers walking around and the brightest orange I think I've even seen here. So there they're differentiation. Their culture, their customer experience and their ability to really differentiate three that are were loud and clear for what I heard through the voice of the customer and the partners, Frankly, as well. >> So I guess, Justin, I mean, the other pieces Tam expansion 1st 10 years, Cloud New Way I workloads partnerships with backup companies growing. The Tim I've said the 1st 10 years is probably gonna be easier, and I know that's a terrible thing to say, but don't hate me for saying it pure. But then the next 10 because they're up against the flat footed E. M. C. That was getting pounded by Elliott management with pressures to go private, trying to hang on to its legacy business and then got acquired and distracted by Del. So that was a really tailwind for Pierre. Now it's like Cloud guys got their act together, you know? Aye, aye. Everybody's doing A s. Oh, so they get some challenges. But what's your take? I think I've >> still got an advantage. Talking to some customers, 11 in particular was quite clear. That they saw pure is having at least a 2 to 3 lead through 2 to 3 year lead on the technology from some of their competitors. So they shopped around and they had a look at some of his competitors, and they thought that actually they were trying to sell me technology that's 234 years old and they quite from them, was that this is something that I could do myself, so they clearly see that pure provides them with something that they can't do themselves. So pure has an advantage there. I also think that the way that the market is changing advantage is pure, a little bit as well. So you mentioned Cloud there, Dave and I think that we've all seen that people have realized that multi cloud is a thing and that not every workload is going to go to the cloud. A lot of it is going to stay on Prem, so now that that's kind of allowed, people are allowed to talk about that, That there are CEOs who would have been being pressured by boards and so on to say we have to go all in on the cloud. Now they can come back to them and say, Well, actually, weaken, stay on side. That means that we should be looking at some of these onside products, like pure so that we can go on put in storage. A race in a data center may not be our Dana Senate might be in Coehlo, but we have this on site method of doing things. Not everything has to go to the cloud. So I think that will help them with some of the growth. >> So I'm left thinking, What would Andy say? Okay to >> be >> It's the number one hottest company, you know, notwithstanding some smaller companies right now, cos moving the market is a W s obviously Microsoft with the trillion dollar valuation. But Amazon, to me, is the benchmark it. So I feel like Jassy would say, Well, so Hey, Andy, you've acknowledged hybrid, you know? Actually, yeah, I guess he uses that word. Um, and you're doing some stuff one prim, but I think he would say we still believe that the vast majority of workloads are gonna land in the public cloud. And what you just said is what everybody else believes. And to me, they're in conflict and I don't necessarily have the answer. But you got the big gorilla. Now the big claw gorilla is moving. The markets say with one philosophy and they've made some good calls and the entire i t industry. Yeah, the other the inspector. >> Except that AWS has outpost have a product that actually sits on site. And they did. And Jesse last year said that he did say that the boat inward, multi cloud, >> you know, So, uh, sorry. Used the word multi cloud used hybrid hybrid cloud. They don't say that. That's for Boden, but no. But my point is they've acknowledged hybrid, which they never used to talk about hybrid. So they capitulated there The end where capitulated on their claws on its cloud strategy. But he has not capitulated on the belief the firm belief that most workloads are gonna be in the club. I'm not sure he's wrong. >> That may be true, but on what Time horizon? So that's not going to happen next year. But I >> think for sure, >> I pointed out that the agile manifesto came out in 2001. That's 18 years ago. Not every shop is doing software in agile, so enterprises take a long time to change, so there's plenty of room for pure to grow. While that changes going on, even if it if it does go all their own cloud, it's gonna take a long time to get there. And people can make plenty of money in the meantime. >> But I believe you're sorry. I believe pure is growing in what is a crappy market. Yeah, I think the storage market is a crap market right now. It's one that's very difficult. The leader Deli emcees growing at 0%. And that's a goodness because they're gaining share. Ned ABS down last quarter, not minus 16% IBM, minus 21% hp thrilled with whatever 3% or whatever. They're at a minus three. I can't remember now. Here is the only one that showing any substantive growth on my premises there, doing that by having a superior product and business model, and they're stealing share. So and then I ask you this. I I believe in hybrid, by the way. But I'm just playing kind of devil's advocate here. Cloud is growing and it's consistently growing and everybody talks about repatriation. You don't see it in the numbers. Every talks about the large of the law of large numbers like in other words, they hit a wall. You don't see that in the numbers. What you see is the traditional IittIe spaces flattish. The new stuff that they're all developing is not growing fast enough to offset the old stuff. You see that? Certainly. See that IBM. You see that now? Adele, even though they had good bounce back last year. But now you're seeing that Adele Oracle ekes out 1% growth. So the big, uh, legacy companies are growing there, hanging on there, throwing off tons of cash. They got good, strong balance sheets, maybe taking on some cheap debt. But the cloud continues to grow at a pace that I think it's stealing share from traditional I t. >> That's that's a reasonable sort of announcing something. Yeah, whether or not we'll see an increase in growth of onsite, particularly things like EJ computing way, maybe you need Thio redefine what we think of as a data center, and maybe we're not thinking about a broad enough market. I actually think that a lot of those workloads that we would traditionally have said would go on site and cola. I don't think Cola Data Center is actually growing all that much, but I think we are going to see growth in things like EJ. >> So that's a really great point I want. I want to come back to that. But the big question is, then okay. Can cloud be before we get ahead, you can cloud be a tail wind for pure. They've embraced it. 20 years ago, the leaders of a company would say, Oh, no, it's cloud his crap about a peace Caesar of toys You remember that pure embracing cloud, I think, is impressive only from an engineering perspective but business model. So can they make in your opinion cloud a tail wind and an opportunity? Maybe that's where Multi Cloud comes in. >> Yeah, it's tricky. I think it will become more of an advantage once good things like kubernetes and containers matures a bit further and people are used to being able to deploy things in that way, both in Cloud and on site. I think that that's the portability play, and it's more about making onsite more cloudy rather than making the cloud more enterprising, which I think was one of the messages that we had here. Because enterprises a lot of what yours messaging so far. And it's product development, particularly around cloud block stories, to make the cloud look more like an enterprise. Where's what we actually needed it to go the other way. Pure is doing things in that in that regard with pure storage optimizer, which which takes a lot of the decision making a way that from the way you would normally do things on side the way we've gotten used to it, manually configuring things, it's actually turning it into software on just letting computers handle it. That integration with things like the M, where is making things operate a lot more like cloud? So once enterprises become used to operating on a lot more like clouds, I think that's going to be an advantage for pure. To be ableto have that operations be in cloud and then they'll bring in products into in time for that to happen. >> You have the opportunity just in a couple days ago to tend the technical field day, the TFT that pure dead. So you got that double click the day before all the press releases broke about. Some of you know, we talked about the expansion into cloud with aws Maur, their portfolio delivered as a service. The aye aye data hub. But if we look at one of the things that stuck out today was differentiation. We've talked about that a number of levels in the last minutes. But talk to us about the technical differentiation that you've not only heard this week from pure, but that you've been engaging with them for years. You have an interesting story of Of John Cosgrove caused their CEO and founder really describing something very unique. That seems to be quite a technical level of differentiation that you even said We don't see this from a lot of their competitors. Give us a little snapshot of that. >> Yeah, you don't sort of get that level of detail in some of the briefings as well. So it was another tech Field day event some years ago on was talking about flash array and we sat in a room, and they had a flash array in front of us, and I think they were talking about the newest kind of flash they were putting into this. But they described some of the technical decisions they made about the architecture inside the blade. So at that time, and I hope I'm getting all these details correct, they had designed and asic, so to go in front, off the flash so that they could essentially create a layer above above the flash that they could speak to within their software. That meant that it didn't matter which flash foundry they bought it from, because it's slut. There are certain differences around the way that flash works, and they do address the flash directly, unlike buying SS D's and putting them inside the box. So that gives them a performance advantage because you don't have a whole bunch of software translation going on to get into the flash. But that decision meant that they could then change flash foundry without changing the experience off the awful. The software developers up the stack inside their array, so that meant that there cadence of being able to bring out new products and gradually dropped down the cost of the supply of flash, which makes up a large amount of the calls on these particular devices. It provided them with better options so they could maintain, maintain optionality essentially and be very, very flexible and react to the things that they can't predict. So Charlie mentioned in the briefing yesterday that you know, in this industry, you might get a 20% drop in the cost of flash in one month, which will then affect them their revenues in coming months after that, because clearly they want to pass on some of those cost drops to customers. But it needs to be done a certain, more manage way. When you have that kind of dynamic behavior happening in the market, being able to react to that well in something where the hardware design time can be 18 months to two years, building that into your product so that it then provides you with business options as a technology, that's a really impressive way of thinking about how all the different pieces of your company have to interact with each other. So it's not just about the technology, it's about the business and the technology working hand in hand, >> and those lower flash prices should open up new markets for them. Flash a racy I think they call it, is still not at the price of hybrid, I wouldn't think, although they saying it will be. Hybrid arrays are priced around 70 60 70 cents a gigabyte today is according Thio Gardner analysis. Big >> Challenge with hybrid of rays Which flat, which flash around flash or a C wouldn't actually wouldn't have? This problem is the reliability of the Leighton see and predictability. So with an old flash array, you don't get Layton. See sparks if you suddenly exhaust the amount of flash that you have in a hybrid of rain that has to go back to the disk. So if you need that predictable performance, that's why people have gone with flesh arrayed very beginning, absolutely getting that as a capacity tear. I think that provides a lot of reliability, for particularly when you've got large amounts of data need to write flesh >> and the price is coming down and it's maybe it's double now on a per gigabyte basis, that'll come down further. But I welcome back to EJ because I think you bring up a good point And we didn't Thankfully, here a ton about EJ. I think we heard anything about EJ at this show. We didn't get inundated with edge, which we always do with these big shows. And I'm happy about that because I think that that a lot of the companies that we re attend I think they got it wrong. They're taking a box and they're throwing over the fence to trying to do a top down model to the edge. Hey, here's a server or here's a storage device and we're gonna put it at the edge. It's like, OK, well, I think the edge is going to evolve as a software development. You know, play not isn't over. The top is gonna be bottoms up innovation. Now, I don't know question about you know, whether Amazon at the edge vm wear at the edge. Um, but I don't see any traditional i t companies crushing it at the edge there talking about it. They're trying to build out ecosystems, and but nobody's has meaningful revenue today at the edge. But it's a new way to think about this. Distributed massive compute engine >> on. I think we'll start to see that mature as people start to bring out products that actually do operated the way heard from Nvidia about some of their ideas that they have about doing a I processing at the edge for things like image recognition systems, where you train your model on leg large data sets in a cloud or in a data center. And then you shoot those models out two devices that operate on a smaller data set. But for a lot of these things, you need to do data collection at the edge. So Formula One is a classic example based given for the F one racing team is an I. O. T. Company that is connected to a nail and analytics company. Really? >> Yeah, that's right. We did hear about EJ and that an actual use case is in college edge, so there's going to be >> a lot more of that. We have things like sensors are just all over the place, so you know, in anything in retail, if you have fridges in retail and you need to monitor the sensors in those to find out whether or not is the temperature going out out of control or outside of your control limits because that will affect the food that's in that. There's a whole bunch of kind of boring examples that are actually all I OT. So I think some of those will start to push more data into into devices at the edge. And as people's understanding of how to use machine learning and I matures away from the hype, I think we're pretty peak hype at the moment. Once we do actually drop that back a notch and we see that people they're doing really use riel riel world use cases with real world business value that will start to drive a lot more of the growth of practical. And that will drive growth in data, which will need to get close throughout the weather's device. >> I think you're right. I think that date is gonna be at the edge of a lot of that data. I would say most of that date is going to stay at the edge. It's probably it's not gonna says it. Probably it's definitely not going to sit in a million dollar storage array, and it's gonna comprise a lot of alternative processing arm, Uh, GP use versus conventional microprocessors. So >> and that's where I think he was thinking about, like the white pure One works, for example, pure. One works the same no matter what products you have from pure, and they have been very clear in stating that they want to make sure that when they bring out a new array or a new product, it works with pure one. So it's that consistency of experience for their customers, which I think is fairly unique in the industry, is a lot of other products that will come out. And they only partly supported, not full support for their entire race tagging. AMC struggle with that for a long time simply because it has so many products and needed to kill a whole bunch of them first. So when when you have that kind of engineering discipline built built into your company, when you go out and you have customers who have edge devices or you have stuff in the cloud and they have devices on their phones which they used to showing off a conference and say, Hey, come and have a look at my array, it runs on software on my phone that's pure one that software ability that pure has of being able to address this data wherever it is. I think >> there's >> a real opportunity for pure that put that kind of intelligence on to age things. Even if they don't actually sell any flash a raise to those people, they could start to sell them software. >> All right, guys. So 15 seconds each since arose at a time. Computers competitive. Positioning your thoughts in a quick summary about what you've heard the last few days and what Justin has >> to me if I would expect continued growth, forgetting about the macro for a moment, even in gonna grow faster than the market place. Um and yeah, they said they don't throw off as much cash as the big guys. So it's gonna be a game of the big guys do in stock buybacks, free cash flow and pure storage. Investing in growth. >> Excellent. Justin. >> Yes, I agree. I think they're going to double down on the R and D spend to make sure that they maintain a technological advantage over their competitors. The biggest risk of pure is if the other players, you know, the deal emcee other plays in that big online storage market. If they actually get their act together and start bringing out competitive products, that's the biggest threat to fuel. But pure has a big lead on them. I would say, >> Yeah, I think the last thing cloud, you know, kind of a question Mark. And I think the m where to me, Del. Of course I care about storage is huge business for them. They're all above the M where and to the extent that they can leverage VM where, you know, as a competitive weapon, they'll use it against anybody you know. Damn the ecosystem. >> Excellent. Well, thanks, guys, for a great wrap up to our two days here for Justin Warren and Day Volante. I'm Lisa Martin. Thank you for watching the cubes. Coverage of pure accelerate 2019.

Published Date : Sep 18 2019

SUMMARY :

Brought to you by pure storage. Let's do a quick recap of some of the I don't think they're gonna do a ton of business with block storage These are the hottest companies in the business right now, and you can tell when you talk to the brightest orange I think I've even seen here. So I guess, Justin, I mean, the other pieces Tam expansion 1st 10 years, So I think that will help them with some of the growth. It's the number one hottest company, you know, notwithstanding some smaller companies right And they did. But he has not capitulated on the belief So that's not going to happen next year. I pointed out that the agile manifesto came out in 2001. But the cloud continues to grow at a pace that I I actually think that a lot of those workloads that we would traditionally have said But the big question is, then okay. a lot of the decision making a way that from the way you would normally do things on side the way we've gotten used to You have the opportunity just in a couple days ago to tend the technical field day, So Charlie mentioned in the briefing yesterday that you know, in this industry, Flash a racy I think they call it, is still not at the price of hybrid, So if you need that predictable performance, over the fence to trying to do a top down model to the edge. And then you shoot those models out two devices that operate on a smaller data set. so there's going to be So I think some of those will start to push more data into into devices at the edge. I think that date is gonna be at the edge of a lot of that data. So it's that consistency of experience for their customers, which I think is fairly unique in the industry, a real opportunity for pure that put that kind of intelligence on to age So 15 seconds each since arose at a time. So it's gonna be a game of the big guys do in stock Excellent. and start bringing out competitive products, that's the biggest threat to fuel. to the extent that they can leverage VM where, you know, as a competitive weapon, Thank you for watching the cubes.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Lisa MartinPERSON

0.99+

JessePERSON

0.99+

DavePERSON

0.99+

JustinPERSON

0.99+

AmazonORGANIZATION

0.99+

AndyPERSON

0.99+

MicrosoftORGANIZATION

0.99+

IBMORGANIZATION

0.99+

2001DATE

0.99+

NvidiaORGANIZATION

0.99+

20%QUANTITY

0.99+

2QUANTITY

0.99+

1.7 billionQUANTITY

0.99+

18 monthsQUANTITY

0.99+

two daysQUANTITY

0.99+

Justin WarrenPERSON

0.99+

70%QUANTITY

0.99+

fourQUANTITY

0.99+

CharliePERSON

0.99+

next yearDATE

0.99+

one monthQUANTITY

0.99+

two devicesQUANTITY

0.99+

two yearsQUANTITY

0.99+

AWSORGANIZATION

0.99+

last yearDATE

0.99+

JassyPERSON

0.99+

2019DATE

0.99+

Thio GardnerPERSON

0.99+

Austin, TexasLOCATION

0.99+

Cola Data CenterORGANIZATION

0.99+

PierrePERSON

0.99+

yesterdayDATE

0.99+

3%QUANTITY

0.99+

threeQUANTITY

0.99+

oneQUANTITY

0.99+

AMCORGANIZATION

0.99+

last quarterDATE

0.99+

OracleORGANIZATION

0.99+

3 yearQUANTITY

0.99+

thirdQUANTITY

0.99+

MarkPERSON

0.99+

18 years agoDATE

0.99+

NineQUANTITY

0.98+

1st 2 dayQUANTITY

0.98+

five customersQUANTITY

0.98+

11QUANTITY

0.98+

firstQUANTITY

0.98+

1.7QUANTITY

0.98+

todayDATE

0.98+

AdelePERSON

0.98+

DelPERSON

0.98+

1st 10 yearsQUANTITY

0.97+

20 years agoDATE

0.97+

0%QUANTITY

0.97+

Dana SenateORGANIZATION

0.97+

F oneORGANIZATION

0.97+

LeightonORGANIZATION

0.96+

pureORGANIZATION

0.96+

minus threeQUANTITY

0.96+

this weekDATE

0.96+

Splunk NutanixORGANIZATION

0.95+

CoehloLOCATION

0.95+

couple days agoDATE

0.95+

agileTITLE

0.95+

two gentsQUANTITY

0.95+

minus 21%QUANTITY

0.94+

bothQUANTITY

0.94+

EJORGANIZATION

0.94+

around 70 60 70 centsQUANTITY

0.93+

234 years oldQUANTITY

0.92+

TimPERSON

0.91+

Day VolantePERSON

0.91+

day twoQUANTITY

0.9+

Day 1 Kick-off | Pure Accelerate 2019


 

>> from Austin, Texas. It's Theo Cube, covering your storage. Accelerate 2019. Brought to you by pure storage. >> Welcome to Austin, Texas. This is the Cube. Live at the fourth annual pure accelerate. I'm Lisa Martin with David, Dante, Dave or in Texas, >> Texas again. >> Austin, Texas. Very interesting venue for this fourth annual hear stories. >> A lot of construction, >> music, a >> lot of music. >> So we just came from the keynote and news announcements, customers on stage. But the first thing to point out is, this is here is about to celebrate their 10th anniversary. Charlie Giancarlo, CEO and chairman who's coming on the program with us, and just a few minutes talking about what they have innovated and delivered these 10 X improvements and 10 years kind of this overnight success in 10 years and what's coming? What was with the things that really stuck out at you, Nicky Note. >> Well, first of all, ironically, this is the 10th year of the Cube, not our 10th anniversary, but it's the 10th year of doing the Cube. And so our fourth year, I think it's pure accelerate about what 3000 people here, >> you know, the keynotes >> pure was laying out what their vision is of the modern data experience and that I felt like the keynotes least there were sort of, ah, speed date of what's coming. There was a couple of major announcements that we'll talk about, >> Uh, but >> they really are trying to differentiate as the modern storage company turn a deep position. The competition, as the old guard is to use this term that Andy Jassy uses pure, didn't use that term. But they really talked about it's time to go Modern. And so they were an overnight success. It took him 10 years, was one of the comments that was on stage. So I think this is worth pointing out. A couple of things. I mean, let me lay out. Sort of my thoughts on Pure is a company. They were the only storage company Ah, in the past. Let's call a decade to reach what I'll call escape velocity. They achieved a billion dollars a couple years ago. They're doing their due about a billion and 1/2 on a trailing 12 month basis. They'll do 1.7 billion this year and evaluations about 4.5 billion. So they got a a three ex valuation in that fluctuates. That's pretty good for a storage company. Billy on Lee major storage company. That's really growing rapidly. They got 28% growth. I did a breaking analysis on Lincoln, and I'll just share with you some of the numbers. Dallas flat at 0%. So Del is actually gaining share with no growth has got a scary NetApp minus 16% in the quarter H P E minus 3% IBM minus 21%. And so it is pure A 28%. So they're really crushing it in terms of growth. They've also got a 69% gross gross margin, even if it's in its heyday. E emcees gross margins weren't that high, you know. They were in the sort of mid sixties, and so, and they've also got a good balance sheet. About a billion dollars in cash A little. A little more than that, they got some debt. They're shifting their model to a deferred revenue model. Now the only thing is, you know they're growing much, much faster than the competition. But they're throwing off a lot less cash because they're much smaller. Just as an example, they probably throw off 5 to 6% of their revenues in cash. Netapp probably throws about 23% of its revenues, often catch the big Delta there, so the point is long winded. But but pure storage is in growth mode. And until the market rewards more consistent with a cash flow, they're gonna, I think, stay in huge growth mode. >> There was a great analysis. Dave and I saw an analysis that you did with some spends data, just a couple of your reverence. A little bit of that. There's there seems to be a tailwind behind here you mention the 28% wrote that they announced in Q two, and some of the things that also they talked about were there. Adding about in Q two of F Y 2020 about seven net new customers every business day, adding about 450 new customers just in that quarter. Like you said, 3000 folks expected here today. The momentum is behind them, but they're also a company of firsts. You talked about this a number of times. The first, with all flashed the first with envy me on the back and a couple of additional firsts announced today. Talk about the as a service model and how that youth, in your opinion, you think might continue that trajectory that they're on. >> Yes, so basically pure laid out today, said that vast majority are Pouliot Portfolio is gonna be available as a service. That's the cloud consumption mall is important because pure has about $600 million in deferred revenue, largely coming from their evergreen service. But there they are, slowly shifting their model to a subscription model. It's gonna be very interesting to see how that plays out. Um, we've seen a number of companies do a tableau in Adobe kind of pulled the band Aid off and did it Splunk has taken years to do. It will be interesting to see how how pure goes. For that. I'll >> bring it >> back to the cloud up yours largely an on Prem storage company. That's where most of the revenues come from. But we heard the gentleman from Amazon today. I think it was E ethan whiner, not Ethan, anyway, Mr Whiner, he said. That gardener did A survey last year showed 88% of customers said they have a cloud for a strategy, but 86% of those customers continue to spend on prim. So here you have the cloud. Amazon gorilla wants everybody to go to the cloud pure would much rather they make much more money on Prem? But they realize customers air pulling them in. So they have to move to that as a service model. One of the interesting things that pure is done, which, you know, that's not really a first. But it certainly is for the large storage companies they've announced. Ah, block storage on AWS. So basically what they're doing is they're taking the pure experience. It all looks like pure software, and they're front ending cheap s3 storage from Amazon with E. C. To compute instances, and they've architected using Amazon service. Is this basically a block storage array in the cloud so Amazon gets paid, pure, gets paid? It's a little bit of a premium, but you get higher availability. You get great right performance and you get the pure cloud experience pretty interesting strategy, >> and they're talking about it really as this. This positioning it rather as a bridge, a bridge to hybrid cloud. This numbers that the Amazon gentlemen, share that you mentioned Gardner were really interesting both sides recognizing there's a forcing function there and that forcing function is the customers from the enterprise to the small business who need to have data available immediately wherever it is people to extract this insights from it quickly so that those companies, whether it's a capital one or a Delta Airlines or a smaller organization, can act on it quickly to Dr Competitive Advantage. Same kind of challenge that your storage has. But really that forcing function of the customer, clearly bringing the giant AWS together with yet another story >> so pure as they say reached escape velocity. They and Nutanix were the only on a new entrance that reached a billion dollars Nutanix. I really don't consider a storage company. They're kind of hyper converged. And the way they did that as they drove a truck through E emcees install base with flash. So they were the first within all flash array. Maybe maybe they weren't the first, but they were the first to really drive it. They hired a bunch of DMC sales reps. They knew where all the skeletons were buried and they really took out a lot of old Symmetric Se's and Claire eons and V. Max is and all the old sort of GMC install base, and that helped them catapult their way there 1st 10 years. Now they got to do that again. They got to get to get They're on their way to two billion. But how did they get to five billion? Um, and and so the way they do that is they have to expand their tam. I mean, we'll talk to Charlie Jean Carlo about this. My feeling is a big job of the CEO is to expand the Tamil. How do they do that? They go after new workloads like a i. They go for cloud. They go from multi cloud. These are all very large markets in which they don't participate. Data protection. They'll partner with Lex, Kohi City and Rubric and Beam to to have data protection software running on their flash. A raise with very, very fast restores. That's something that's taking off. It's gonna be really interested in seeing as they say, they've got this subscription model that's coming in. They've got all this deferred revenue that in a way, it's going to slow him down a little bit just from an accounting standpoint, cause when you recognize deferred revenue, you recognize that, you know over 12 months over 36 months, so that's a little bit of a transition. The other thing that pure is facing in a tactical basis is Nande pricing. It's like this countervailing effects nan pricing is coming down, which means lower prices, lower costs but also lower revenue. But at the same time, it becomes more competitive with spinning disk. This is something else. We'll talk to Charlie Jean. Cholera right about it opens up new markets. So this tam expansion is critical for pure in terms of driving this modern data experience into these new workloads and fighting the competition, the competition is not sitting still. All those companies that I mentioned the H P ease, the the Delhi emcees, et cetera, are basically taking a page out of your swords narrative, talking about the cloud experience, talking about, you know, flexible pricing models, building cloud products on prime and hybrid cloud and multi cloud. So it's hard sometimes for customers to squint through that. And really, no, I guess the bottom line, the last thing I'll say is pure. Doesn't have as many feet on the street is these other guys. So it's gotta leverage the channel increasingly, and that's how it gets beyond two billion on its way to five billion. >> And that was one of the factors that they attributed the second quarter. 28% year on year growth is to not just innovation, but also to the channel. So they've done a good job of really pivoting. There's large enterprise deals to be covered, direct and then bringing in the channel for those smaller mid size business customers. Adding a lot of momentum in cute to you mentioned the nan pricing that in some of the political climate with the start of China, most of their businesses in the Americas so they're not facing as many of those challenges. So they did lower guidance for the rest of it is >> the second time they've >> lowered 20. However, they kind of attributed that thio the nan supply oversupply and they say happy Matt to flatten out quickly, say they're >> not worried about the macro. I mean, look, if if the economy is good and is booming and people are spending money on cap ex. That's good for even a high growth company. They're basically positioning to the street that if if the economy does turn down and there's a softness at the macro, they'll actually gain share more rapidly. Which, by the way, is probably true. But look at the rising tide lifts all boats. Nobody wants to see Ah recession. Having said that, well, it's interesting. When you saw Pure Lower, its guidance stock took a hit, and then net app, I'd be him. All these other company you have to see a deli emcee they announced in the market said, Wow, pure must be doing really well compared to these other guys. So it's come back in a big way. My opinion pure is going to in the e. T. Our data shows this from a spending intentions Pure is going to continue to gain share at a much, much more rapid pace of the other. The other guys, from a product standpoint, delicacies consolidating its product portfolio, trying to lower its cost. H. P E is really focused on limbo. IBM needs a mainframe product cycle to get back going, Ned APS facing its challenges and its kind of tweaking its go to market model. So all these other companies air dealing with sort of some structural changes. Where is pure is like put the put the foot on the gas and accelerate no pun intended. And so I think they're gonna continue to gain share for quite quite a number of quarters. >> I want to talk about sustainability before we break. And one of the things that Charlie talked about on his keynote is in terms of the modern data experience, he said. It was three things. It was simple, seamless and sustainable, an inch sustainable. You really started talking about the evergreen model that they launched a while ago that seems to be really sticky with organizations. He also talked about sustainability is a lot of other organization I need to adjust in terms of, you know, waste and carbon emissions and things like that. But I'm just curious, since Pierre is much smaller than the competitors that you mentioned and a lot more focus, obviously all in on flash. Where does the evergreen model, in your opinion, give them that tail winter? That advantage? >> Well, the Evergreen model was first of all brilliant marketing strategy and a business strategy Because if you think about the traditional storage vendors, they make so much money on maintenance, they would never have done this unless pure force them to do it. Because they're making so much cash on the maintenance. You know, it's it's you. You put the storage array in and we're just gonna charge you maintenance. And if you're not on the maintenance contract, sorry. You don't get all the software upgrades, everything else. So it's just this, you know, this lock in strategy, which is work brilliantly for two decades pure, comes along and says, Hey, where? Software driven. We're gonna allow you to get all the modern software. As long as you're got a subscription with us, we'll swap out your controller for free. You know, the competitors hate that. There's all kinds of nuances and stuff, but it worked, and customers love it. And so it's very strong, and it's a fundamental as they said, they got $600 million in deferred revenue, largely from that evergreen model. So they, you know, Charlie mentioned first for non disruptive upgrades. First for cloud management, first for a I ops first for always on que Os first with always on encryption, and if they're really the first, we're probably the first big company. They got a lot of attention there. Last thing, it's it's a four big announcements today. There's a I ready infrastructure, airy. They're doing some stuff they were first to announce with video. You know, a year or so ago, they got cloud offerings. Ah, block storage for AWS. And they've got clout Snap for Azure, which is actually pretty hot. It's backup on Azure, and they got product extensions. They got cheaper flash with a flash or a C for capacity. And then they have extended their all flashy raise their flash played etcetera with storage class, memory and and storage memory. And in this, this as a service model. Those are really the four big announcements that were gonna dig into all this week. >> We are, and we're gonna be talking with This is a great event. Two days. The cube is going to be here. We have seven pure customers to talk to you that I think kind of a record, at least in my cube experience of the last >> AWS always puts a lot of customers up too. You know. All >> right, well, there's no better validation than the success of a brand, whether we're talking about Evergreen or their first or the reaction of the market to bringing flash down to satya prices. So excited to dig into customer stories with you, Dave. Course we'll talk to some partners who got c'mon slung Cisco somebody else and probably forgetting. And, of course, some of the pure, exactly gonna be exciting two days with you and looking for two days >> looking forward to at least a great >> all right stick around. Dave and I will be right back with our first guest, Charlie Giancarlo, chairman and CEO of Pier Storage. Stick around, come back Mawston in just a minute.

Published Date : Sep 17 2019

SUMMARY :

Brought to you by This is the Cube. But the first thing to point out is, this is here is about to celebrate their the Cube. I felt like the keynotes least there were sort of, ah, speed date of what's coming. The competition, as the old guard is to use this term Dave and I saw an analysis that you did with some spends data, That's the cloud consumption mall is important because pure has about $600 million So they have to move to that as a service model. This numbers that the Amazon gentlemen, share that you mentioned Gardner were really interesting both My feeling is a big job of the CEO is to expand the Tamil. Adding a lot of momentum in cute to you mentioned the and they say happy Matt to flatten out quickly, say they're Where is pure is like put the put the foot on the gas and accelerate no You really started talking about the evergreen model that they launched a while ago that seems to be really sticky You put the storage array in and we're just gonna charge you maintenance. We have seven pure customers to talk to you that I think kind of a record, You know. of course, some of the pure, exactly gonna be exciting two days with you and looking for two days Dave and I will be right back with our first guest, Charlie Giancarlo,

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
DavePERSON

0.99+

Charlie GiancarloPERSON

0.99+

Lisa MartinPERSON

0.99+

TexasLOCATION

0.99+

CharliePERSON

0.99+

Andy JassyPERSON

0.99+

5QUANTITY

0.99+

$600 millionQUANTITY

0.99+

AmazonORGANIZATION

0.99+

1.7 billionQUANTITY

0.99+

five billionQUANTITY

0.99+

IBMORGANIZATION

0.99+

WhinerPERSON

0.99+

86%QUANTITY

0.99+

two billionQUANTITY

0.99+

10 yearsQUANTITY

0.99+

Two daysQUANTITY

0.99+

DavidPERSON

0.99+

AWSORGANIZATION

0.99+

28%QUANTITY

0.99+

69%QUANTITY

0.99+

Delta AirlinesORGANIZATION

0.99+

oneQUANTITY

0.99+

Austin, TexasLOCATION

0.99+

fourth yearQUANTITY

0.99+

10th yearQUANTITY

0.99+

12 monthQUANTITY

0.99+

last yearDATE

0.99+

NutanixORGANIZATION

0.99+

AmericasLOCATION

0.99+

two daysQUANTITY

0.99+

GMCORGANIZATION

0.99+

3000 folksQUANTITY

0.99+

sevenQUANTITY

0.99+

DMCORGANIZATION

0.99+

20QUANTITY

0.99+

2019DATE

0.99+

10 XQUANTITY

0.99+

CiscoORGANIZATION

0.99+

first guestQUANTITY

0.99+

0%QUANTITY

0.99+

firstQUANTITY

0.99+

Pier StorageORGANIZATION

0.99+

two decadesQUANTITY

0.99+

10th anniversaryQUANTITY

0.99+

both sidesQUANTITY

0.99+

three thingsQUANTITY

0.99+

todayDATE

0.99+

about $600 millionQUANTITY

0.98+

LexORGANIZATION

0.98+

second timeQUANTITY

0.98+

Symmetric SeORGANIZATION

0.98+

over 12 monthsQUANTITY

0.98+

EthanPERSON

0.98+

DeltaORGANIZATION

0.98+

FirstQUANTITY

0.98+

about 4.5 billionQUANTITY

0.98+

6%QUANTITY

0.98+

EPERSON

0.98+

PierrePERSON

0.98+

OneQUANTITY

0.97+

DantePERSON

0.97+

Caryn Woodruff, IBM & Ritesh Arora, HCL Technologies | IBM CDO Summit Spring 2018


 

>> Announcer: Live from downtown San Francisco, it's the Cube, covering IBM Chief Data Officer Strategy Summit 2018. Brought to you by IBM. >> Welcome back to San Francisco everybody. We're at the Parc 55 in Union Square and this is the Cube, the leader in live tech coverage and we're covering exclusive coverage of the IBM CDO strategy summit. IBM has these things, they book in on both coasts, one in San Francisco one in Boston, spring and fall. Great event, intimate event. 130, 150 chief data officers, learning, transferring knowledge, sharing ideas. Cayn Woodruff is here as the principle data scientist at IBM and she's joined by Ritesh Ororo, who is the director of digital analytics at HCL Technologies. Folks welcome to the Cube, thanks for coming on. >> Thank you >> Thanks for having us. >> You're welcome. So we're going to talk about data management, data engineering, we're going to talk about digital, as I said Ritesh because digital is in your title. It's a hot topic today. But Caryn let's start off with you. Principle Data Scientist, so you're the one that is in short supply. So a lot of demand, you're getting pulled in a lot of different directions. But talk about your role and how you manage all those demands on your time. >> Well, you know a lot of, a lot of our work is driven by business needs, so it's really understanding what is critical to the business, what's going to support our businesses strategy and you know, picking the projects that we work on based on those items. So it's you really do have to cultivate the things that you spend your time on and make sure you're spending your time on the things that matter and as Ritesh and I were talking about earlier, you know, a lot of that means building good relationships with the people who manage the systems and the people who manage the data so that you can get access to what you need to get the critical insights that the business needs, >> So Ritesh, data management I mean this means a lot of things to a lot of people. It's evolved over the years. Help us frame what data management is in this day and age. >> Sure, so there are two aspects of data in my opinion. One is the data management, another the data engineering, right? And over the period as the data has grown significantly. Whether it's unstructured data, whether it's structured data, or the transactional data. We need to have some kind of governance in the policies to secure data to make data as an asset for a company so the business can rely on your data. What you are delivering to them. Now, the another part comes is the data engineering. Data engineering is more about an IT function, which is data acquisition, data preparation and delivering the data to the end-user, right? It can be business, it can be third-party but it all comes under the governance, under the policies, which are designed to secure the data, how the data should be accessed to different parts of the company or the external parties. >> And how those two worlds come together? The business piece and the IT piece, is that where you come in? >> That is where data science definitely comes into the picture. So if you go online, you can find Venn diagrams that describe data science as a combination of computer science math and statistics and business acumen. And so where it comes in the middle is data science. So it's really being able to put those things together. But, you know, what's what's so critical is you know, Interpol, actually, shared at the beginning here and I think a few years ago here, talked about the five pillars to building a data strategy. And, you know, one of those things is use cases, like getting out, picking a need, solving it and then going from there and along the way you realize what systems are critical, what data you need, who the business users are. You know, what would it take to scale that? So these, like, Proof-point projects that, you know, eventually turn into these bigger things, and for them to turn into bigger things you've got to have that partnership. You've got to know where your trusted data is, you've got to know that, how it got there, who can touch it, how frequently it is updated. Just being able to really understand that and work with partners that manage the infrastructure so that you can leverage it and make it available to other people and transparent. >> I remember when I first interviewed Hilary Mason way back when and I was asking her about that Venn diagram and she threw in another one, which was data hacking. >> Caryn: Uh-huh, yeah. >> Well, talk about that. You've got to be curious about data. You need to, you know, take a bath in data. >> (laughs) Yes, yes. I mean yeah, you really.. Sometimes you have to be a detective and you have to really want to know more. And, I mean, understanding the data is like the majority of the battle. >> So Ritesh, we were talking off-camera about it's not how titles change, things evolve, data, digital. They're kind of interchangeable these days. I mean we always say the difference between a business and a digital business is how they have used data. And so digital being part of your role, everybody's trying to get digital transformation, right? As an SI, you guys are at the heart of it. Certainly, IBM as well. What kinds of questions are our clients asking you about digital? >> So I ultimately see data, whatever we drive from data, it is used by the business side. So we are trying to always solve a business problem, which is to optimize the issues the company is facing, or try to generate more revenues, right? Now, the digital as well as the data has been married together, right? Earlier there are, you can say we are trying to analyze the data to get more insights, what is happening in that company. And then we came up with a predictive modeling that based on the data that will statically collect, how can we predict different scenarios, right? Now digital, we, over the period of the last 10 20 years, as the data has grown, there are different sources of data has come in picture, we are talking about social media and so on, right? And nobody is looking for just reports out of the Excel, right? It is more about how you are presenting the data to the senior management, to the entire world and how easily they can understand it. That's where the digital from the data digitization, as well as the application digitization comes in picture. So the tools are developed over the period to have a better visualization, better understanding. How can we integrate annotation within the data? So these are all different aspects of digitization on the data and we try to integrate the digital concepts within our data and analytics, right? So I used to be more, I mean, I grew up as a data engineer, analytics engineer but now I'm looking more beyond just the data or the data preparation. It's more about presenting the data to the end-user and the business. How it is easy for them to understand it. >> Okay I got to ask you, so you guys are data wonks. I am too, kind of, but I'm not as skilled as you are, but, and I say that with all due respect. I mean you love data. >> Caryn: Yes. >> As data science becomes a more critical skill within organizations, we always talk about the amount of data, data growth, the stats are mind-boggling. But as a data scientist, do you feel like you have access to the right data and how much of a challenge is that with clients? >> So we do have access to the data but the challenge is, the company has so many systems, right? It's not just one or two applications. There are companies we have 50 or 60 or even hundreds of application built over last 20 years. And there are some applications, which are basically duplicate, which replicates the data. Now, the challenge is to integrate the data from different systems because they maintain different metadata. They have the quality of data is a concern. And sometimes with the international companies, the rules, for example, might be in US or India or China, the data acquisitions are different, right? And you are, as you become more global, you try to integrate the data beyond boundaries, which becomes a more compliance issue sometimes, also, beyond the technical issues of data integration. >> Any thoughts on that? >> Yeah, I think, you know one of the other issues too, you have, as you've heard of shadow IT, where people have, like, servers squirreled away under their desks. There's your shadow data, where people have spreadsheets and databases that, you know, they're storing on, like a small server or that they share within their department. And so you know, you were discussing, we were talking earlier about the different systems. And you might have a name in one system that's one way and a name in another system that's slightly different, and then a third system, where it's it's different and there's extra granularity to it or some extra twist. And so you really have to work with all of the people that own these processes and figure out what's the trusted source? What can we all agree on? So there's a lot of... It's funny, a lot of the data problems are people problems. So it's getting people to talk and getting people to agree on, well this is why I need it this way, and this is why I need it this way, and figuring out how you come to a common solution so you can even create those single trusted sources that then everybody can go to and everybody knows that they're working with the the right thing and the same thing that they all agree on. >> The politics of it and, I mean, politics is kind of a pejorative word but let's say dissonance, where you have maybe of a back-end syst6em, financial system and the CFO, he or she is looking at the data saying oh, this is what the data says and then... I remember I was talking to a, recently, a chef in a restaurant said that the CFO saw this but I know that's not the case, I don't have the data to prove it. So I'm going to go get the data. And so, and then as they collect that data they bring together. So I guess in some ways you guys are mediators. >> [Caryn And Ritesh] Yes, yes. Absolutely. >> 'Cause the data doesn't lie you just got to understand it. >> You have to ask the right question. Yes. And yeah. >> And sometimes when you see the data, you start, that you don't even know what questions you want to ask until you see the data. Is that is that a challenge for your clients? >> Caryn: Yes, all the time. Yeah >> So okay, what else do we want to we want to talk about? The state of collaboration, let's say, between the data scientists, the data engineer, the quality engineer, maybe even the application developers. Somebody, John Fourier often says, my co-host and business partner, data is the new development kit. Give me the data and I'll, you know, write some code and create an application. So how about collaboration amongst those roles, is that something... I know IBM's gone on about some products there but your point Caryn, it's a lot of times it's the people. >> It is. >> And the culture. What are you seeing in terms of evolution and maturity of that challenge? >> You know I have a very good friend who likes to say that data science is a team sport and so, you know, these should not be, like, solo projects where just one person is wading up to their elbows in data. This should be something where you've got engineers and scientists and business, people coming together to really work through it as a team because everybody brings really different strengths to the table and it takes a lot of smart brains to figure out some of these really complicated things. >> I completely agree. Because we see the challenges, we always are trying to solve a business problem. It's important to marry IT as well as the business side. We have the technical expert but we don't have domain experts, subject matter experts who knows the business in IT, right? So it's very very important to collaborate closely with the business, right? And data scientist a intermediate layer between the IT as well as business I will say, right? Because a data scientist as they, over the years, as they try to analyze the information, they understand business better, right? And they need to collaborate with IT to either improve the quality, right? That kind of challenges they are facing and I need you to, the data engineer has to work very hard to make sure the data delivered to the data scientist or the business is accurate as much as possible because wrong data will lead to wrong predictions, right? And ultimately we need to make sure that we integrate the data in the right way. >> What's a different cultural dynamic that was, say ten years ago, where you'd go to a statistician, she'd fire up the SPSS.. >> Caryn: We still use that. >> I'm sure you still do but run some kind of squares give me some, you know, probabilities and you know maybe run some Monte Carlo simulation. But one person kind of doing all that it's your point, Caryn. >> Well you know, it's it's interesting. There are there are some students I mentor at a local university and you know we've been talking about the projects that they get and that you know, more often than not they get a nice clean dataset to go practice learning their modeling on, you know? And they don't have to get in there and clean it all up and normalize the fields and look for some crazy skew or no values or, you know, where you've just got so much noise that needs to be reduced into something more manageable. And so it's, you know, you made the point earlier about understanding the data. It's just, it really is important to be very curious and ask those tough questions and understand what you're dealing with. Before you really start jumping in and building a bunch of models. >> Let me add another point. That the way we have changed over the last ten years, especially from the technical point of view. Ten years back nobody talks about the real-time data analysis. There was no streaming application as such. Now nobody talks about the batch analysis, right? Everybody wants data on real-time basis. But not if not real-time might be near real-time basis. That has become a challenge. And it's not just that prediction, which are happening in their ERP environment or on the cloud, they want the real-time integration with the social media for the marketing and the sales and how they can immediately do the campaign, right? So, for example, if I go to Google and I search for for any product, right, for example, a pressure cooker, right? And I go to Facebook, immediately I see the ad within two minutes. >> Yeah, they're retargeting. >> So that's a real-time analytics is happening under different application, including the third-party data, which is coming from social media. So that has become a good source of data but it has become a challenge for the data analyst and the data scientist. How quickly we can turn around is called data analysis. >> Because it used to be you would get ads for a pressure cooker for months, even after you bought the pressure cooker and now it's only a few days, right? >> Ritesh: It's a minute. You close this application, you log into Facebook... >> Oh, no doubt. >> Ritesh: An ad is there. >> Caryn: There it is. >> Ritesh: Because everything is linked either your phone number or email ID you're done. >> It's interesting. We talked about disruption a lot. I wonder if that whole model is going to get disrupted in a new way because everybody started using the same ad. >> So that's a big change of our last 10 years. >> Do you think..oh go ahead. >> oh no, I was just going to say, you know, another thing is just there's so much that is available to everybody now, you know. There's not this small little set of tools that's restricted to people that are in these very specific jobs. But with open source and with so many software-as-a-service products that are out there, anybody can go out and get an account and just start, you know, practicing or playing or joining a cackle competition or, you know, start getting their hands on.. There's data sets that are out there that you can just download to practice and learn on and use. So, you know, it's much more open, I think, than it used to be. >> Yeah, community additions of software, open data. The number of open day sources just keeps growing. Do you think that machine intelligence can, or how can machine intelligence help with this data quality challenge? >> I think that it's it's always going to require people, you know? There's always going to be a need for people to train the machines on how to interpret the data. How to classify it, how to tag it. There's actually a really good article in Popular Science this month about a woman who was training a machine on fake news and, you know, it did a really nice job of finding some of the the same claims that she did. But she found a few more. So, you know, I think it's, on one hand we have machines that we can augment with data and they can help us make better decisions or sift through large volumes of data but then when we're teaching the machines to classify the data or to help us with metadata classification, for example, or, you know, to help us clean it. I think that it's going to be a while before we get to the point where that's the inverse. >> Right, so in that example you gave, the human actually did a better job from the machine. Now, this amazing to me how.. What, what machines couldn't do that humans could, you know last year and all of a sudden, you know, they can. It wasn't long ago that robots couldn't climb stairs. >> And now they can. >> And now they can. >> It's really creepy. >> I think the difference now is, earlier you know, you knew that there is an issue in the data. But you don't know that how much data is corrupt or wrong, right? Now, there are tools available and they're very sophisticated tools. They can pinpoint and provide you the percentage of accuracy, right? On different categories of data that that you come across, right? Even forget about the structure data. Even when you talk about unstructured data, the data which comes from social media or the comments and the remarks that you log or are logged by the customer service representative, there are very sophisticated text analytics tools available, which can talk very accurately about the data as well as the personality of the person who is who's giving that information. >> Tough problems but it seems like we're making progress. All you got to do is look at fraud detection as an example. Folks, thanks very much.. >> Thank you. >> Thank you very much. >> ...for sharing your insight. You're very welcome. Alright, keep it right there everybody. We're live from the IBM CTO conference in San Francisco. Be right back, you're watching the Cube. (electronic music)

Published Date : May 2 2018

SUMMARY :

Brought to you by IBM. of the IBM CDO strategy summit. and how you manage all those demands on your time. and you know, picking the projects that we work on I mean this means a lot of things to a lot of people. and delivering the data to the end-user, right? so that you can leverage it and make it available about that Venn diagram and she threw in another one, You need to, you know, take a bath in data. and you have to really want to know more. As an SI, you guys are at the heart of it. the data to get more insights, I mean you love data. and how much of a challenge is that with clients? Now, the challenge is to integrate the data And so you know, you were discussing, I don't have the data to prove it. [Caryn And Ritesh] Yes, yes. You have to ask the right question. And sometimes when you see the data, Caryn: Yes, all the time. Give me the data and I'll, you know, And the culture. and so, you know, these should not be, like, and I need you to, the data engineer that was, say ten years ago, and you know maybe run some Monte Carlo simulation. and that you know, more often than not And I go to Facebook, immediately I see the ad and the data scientist. You close this application, you log into Facebook... Ritesh: Because everything is linked I wonder if that whole model is going to get disrupted that is available to everybody now, you know. Do you think that machine intelligence going to require people, you know? Right, so in that example you gave, and the remarks that you log All you got to do is look at fraud detection as an example. We're live from the IBM CTO conference

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Ritesh OroroPERSON

0.99+

CarynPERSON

0.99+

John FourierPERSON

0.99+

RiteshPERSON

0.99+

IBMORGANIZATION

0.99+

USLOCATION

0.99+

50QUANTITY

0.99+

Cayn WoodruffPERSON

0.99+

BostonLOCATION

0.99+

San FranciscoLOCATION

0.99+

ChinaLOCATION

0.99+

IndiaLOCATION

0.99+

last yearDATE

0.99+

ExcelTITLE

0.99+

oneQUANTITY

0.99+

Caryn WoodruffPERSON

0.99+

Ritesh AroraPERSON

0.99+

Hilary MasonPERSON

0.99+

60QUANTITY

0.99+

130QUANTITY

0.99+

OneQUANTITY

0.99+

Monte CarloTITLE

0.99+

HCL TechnologiesORGANIZATION

0.99+

FacebookORGANIZATION

0.99+

third systemQUANTITY

0.98+

todayDATE

0.98+

InterpolORGANIZATION

0.98+

ten years agoDATE

0.98+

two applicationsQUANTITY

0.98+

firstQUANTITY

0.98+

Parc 55LOCATION

0.98+

five pillarsQUANTITY

0.98+

one systemQUANTITY

0.98+

GoogleORGANIZATION

0.97+

two aspectsQUANTITY

0.97+

both coastsQUANTITY

0.97+

one personQUANTITY

0.96+

Ten years backDATE

0.96+

two minutesQUANTITY

0.95+

this monthDATE

0.95+

Union SquareLOCATION

0.95+

two worldsQUANTITY

0.94+

Spring 2018DATE

0.94+

Popular ScienceTITLE

0.9+

CTOEVENT

0.88+

daysQUANTITY

0.88+

one wayQUANTITY

0.87+

SPSSTITLE

0.86+

single trusted sourcesQUANTITY

0.85+

VennORGANIZATION

0.84+

few years agoDATE

0.84+

150 chief data officersQUANTITY

0.83+

last 10 20 yearsDATE

0.83+

Officer Strategy Summit 2018EVENT

0.82+

hundreds of applicationQUANTITY

0.8+

last 10 yearsDATE

0.8+

CubeCOMMERCIAL_ITEM

0.79+

IBM ChiefEVENT

0.79+

IBM CDO strategy summitEVENT

0.72+

last ten yearsDATE

0.7+

IBM CDO SummitEVENT

0.7+

fallDATE

0.68+

CubeTITLE

0.66+

springDATE

0.65+

last 20 yearsDATE

0.63+

minuteQUANTITY

0.49+

Becky Wanta, RSW1C Consulting - CloudNOW Awards 2017


 

(click) >> Hey, Lisa Martin on the ground with theCUBE at Google for the Sixth Annual CloudNOW Top Women in Cloud Awards Event, our second year covering this, very excited to be joined by tonight's emcee, Becky Wanta, the founder of RSW1C. Welcome to theCUBE. >> Thank you. >> It's great to have you here. So tell us a little bit about what you do and your background as a technology leader. >> So, I've been in technology for close to 40 years. I started out as a software. >> Sorry, I don't even, what? (laughing) >> Ha, ha, ha, it's a long time ago, yeah. So I started out as a developer back in the Department of Defense. So it wasn't rocket science in the early days when I began because it was back when computers took up whole rooms and I realized I had an affinity for that. So, I leveraged that, but then I got into, at that time, and I'm from northern California, if you remember right, the Department of Defense was drawing down. And so I decided I was going to leverage my experience in IT to get into either integrative financial services or healthcare, right. So I took over running all of tech for the Money Store at the time which you would have no idea who that is. And then that got acquired by Wells Fargo First Union, so I took over as their Global CTO for Wells Fargo. And what you'll see is, so let me just tell you about RSW1C because what it is is it's a technology consulting firm that's me. And the reason I have it is because tech changes so much that it's easy to stay current. And when I get brought into companies, and you'll look at me, so I've been the executive officer for tiny little companies like PepsiCo, Wells Fargo, Southwest Airlines. >> The small ones. >> Yeah, tiny, not really, MGM Resorts International, the largest worker's comp company in California, a company that, unborn midsize SMB in southern California that just wrapped up last year. And when I get brought into these companies, I get brought in to transform them. It's at a time in the maturation of these companies, these tiny little brands we've mentioned, where they're ready to jettison IT. So I take that very seriously because I know technology is that gateway to keep that competitive advantage. And the beauty is of that the companies I've mentioned, they're all number one in their markets. And when you're number one, there's only one direction to go, so they take that very seriously. >> How do you come in there and help an MGM Grand Resorts transform? >> So what happened in MGM's case and probably in the last five CIO positions that I've taken, they've met me as a consultant, again, from RSW1C. And then when I look into what needs to happen and I have the conversation, because everybody thinks they want to do digital transformation, and it's not an easy journey and if you don't have the executive sponsorship, don't even try it at home, right? And so, in MGM's case, they had been talking. MGM's the largest taxpayer in Nevada. People think about it as MGM Grand. It's 19 brands on The Strip. >> Is that right? >> It's Bellagio, MGM, so it's the largest taxpayer in Nevada. So it owns 44,860 rooms on The Strip. So if I just counted now, you have Circa Circa, Slots of Fun, Mirage, Bellagio, Monte Carlo, New York, New York, um, MGM Grand Las Vegas, MGM Grand Detroit. They're in the countries and so forth. So it's huge. And that includes Mandalay, ARIA, and all those, so it's huge, right? And so in MGM's case, they knew they wanted to do M life, so M life game changes their industry. And I put that in. This will be our nine year anniversary coming up on Valentine's Day. Thirty years they talked about it, and I put in with a great team And that was part of the transformation into a new way of running their business. >> Wow, we have a couple of minutes left. I'd love to get your perspective on being a female leader in tech. Who were your mentors back in the day? And who are your mentors now? >> So, I don't have any mentors. I never did. Because when I started in the industry, there wasn't a lot of women. And obviously, technology was fairly new which is why one of my passions is around helping the next generation be hugely successful. And one of the things that's important is in the space of tech, I like this mantra, this mantra that says, "How about brains "and beauty that gets you in the door? "How about having the confidence in yourself?" So I want to help a lot of the next generation be hugely successful. And that's what Jocelyn has built with CloudNow, her and Susan. And I'm a big proponent of this because I think it's a chance for us to give back and help the next generation of leaders in a non-traditional way be hugely successful in brands, in companies that are going to unleash their passion and show them how to do that. Because, the good news is that I'm a total bum, Lisa. I've never had a job. I love what I do, and I do it around the clock, so. >> Oh, if only more people could say that. That's so cool. But what we've seen with CloudNow, this is our second year covering it, I love talking to the winners and even the folks that are keynoting or helping to sponsor scholarships. There's so much opportunity. >> There really is. >> And it's so exciting when you can see someone whose life is changing as a result of finding a mentor or having enough conviction to say, "You know what? "I am interested in a STEM field. "I'm going to pursue that." >> Right. >> So, we thank you so much Becky for stopping by theCUBE. And your career is amazing. >> Thanks. >> And I'm sure you probably are mentors to countless, countless men and women out there. >> Absolutely. >> Well, thanks again for stopping by. >> Thank you, Lisa. >> Thank you for watching theCUBE. I'm Lisa Martin on the ground at Google with the CloudNow Sixth Annual Top Women in Cloud Awards Event. Stick around, we'll be right back.

Published Date : Dec 8 2017

SUMMARY :

Hey, Lisa Martin on the ground with theCUBE It's great to have you here. So, I've been in technology for close to 40 years. And the reason I have it is because tech changes so much And the beauty is of that the companies I've mentioned, And then when I look into what needs to happen And I put that in. And who are your mentors now? And one of the things that's important is and even the folks that are keynoting And it's so exciting when you can see someone And your career is amazing. And I'm sure you probably are mentors for stopping by. I'm Lisa Martin on the ground at Google

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
PepsiCoORGANIZATION

0.99+

MGMORGANIZATION

0.99+

CaliforniaLOCATION

0.99+

NevadaLOCATION

0.99+

Southwest AirlinesORGANIZATION

0.99+

Becky WantaPERSON

0.99+

JocelynPERSON

0.99+

Wells FargoORGANIZATION

0.99+

Lisa MartinPERSON

0.99+

MGM Resorts InternationalORGANIZATION

0.99+

LisaPERSON

0.99+

BeckyPERSON

0.99+

MGM GrandORGANIZATION

0.99+

SusanPERSON

0.99+

19 brandsQUANTITY

0.99+

44,860 roomsQUANTITY

0.99+

RSW1CORGANIZATION

0.99+

BellagioORGANIZATION

0.99+

last yearDATE

0.99+

Department of DefenseORGANIZATION

0.99+

northern CaliforniaLOCATION

0.99+

Thirty yearsQUANTITY

0.99+

second yearQUANTITY

0.99+

New YorkLOCATION

0.98+

southern CaliforniaLOCATION

0.98+

Valentine's DayEVENT

0.98+

Wells Fargo First UnionORGANIZATION

0.97+

theCUBEORGANIZATION

0.97+

Sixth Annual CloudNOW Top Women in Cloud Awards EventEVENT

0.97+

MGM Grand DetroitORGANIZATION

0.97+

oneQUANTITY

0.97+

GoogleORGANIZATION

0.97+

CloudNow Sixth Annual Top Women in Cloud Awards EventEVENT

0.95+

tonightDATE

0.95+

one directionQUANTITY

0.94+

ARIAORGANIZATION

0.93+

CloudNOW Awards 2017EVENT

0.88+

MGM Grand ResortsORGANIZATION

0.88+

nine year anniversaryQUANTITY

0.87+

closeQUANTITY

0.83+

Slots of FunORGANIZATION

0.83+

CloudNowORGANIZATION

0.81+

Circa CircaORGANIZATION

0.8+

40 yearsQUANTITY

0.78+

MandalayORGANIZATION

0.78+

MGM Grand Las VegasORGANIZATION

0.78+

Money StoreORGANIZATION

0.77+

MonteORGANIZATION

0.77+

MirageORGANIZATION

0.75+

five CIO positionsQUANTITY

0.75+

CarloLOCATION

0.74+

SMBORGANIZATION

0.71+

The StripLOCATION

0.71+

The StripORGANIZATION

0.66+

CloudORGANIZATION

0.63+

emceePERSON

0.63+

RSW1CEVENT

0.6+

Sharad Singhal, The Machine & Matthias Becker, University of Bonn | HPE Discover Madrid 2017


 

>> Announcer: Live from Madrid, Spain, it's theCUBE, covering HPE Discover Madrid 2017, brought to you by Hewlett Packard Enterprise. >> Welcome back to Madrid, everybody, this is theCUBE, the leader in live tech coverage and my name is Dave Vellante, and I'm here with Peter Burris, this is day two of HPE Hewlett Packard Enterprise Discover in Madrid, this is their European version of a show that we also cover in Las Vegas, kind of six month cadence of innovation and organizational evolution of HPE that we've been tracking now for several years. Sharad Singal is here, he covers software architecture for the machine at Hewlett Packard Enterprise, and Matthias Becker, who's a postdoctoral researcher at the University of Bonn. Gentlemen, thanks so much for coming in theCUBE. >> Thank you. >> No problem. >> You know, we talk a lot on theCUBE about how technology helps people make money or save money, but now we're talking about, you know, something just more important, right? We're talking about lives and the human condition and >> Peter: Hard problems to solve. >> Specifically, yeah, hard problems like Alzheimer's. So Sharad, why don't we start with you, maybe talk a little bit about what this initiative is all about, what the partnership is all about, what you guys are doing. >> So we started on a project called the Machine Project about three, three and a half years ago and frankly at that time, the response we got from a lot of my colleagues in the IT industry was "You guys are crazy", (Dave laughs) right. We said we are looking at an enormous amount of data coming at us, we are looking at real time requirements on larger and larger processing coming up in front of us, and there is no way that the current architectures of the computing environments we create today are going to keep up with this huge flood of data, and we have to rethink how we do computing, and the real question for those of us who are in research in Hewlett Packard Labs, was if we were to design a computer today, knowing what we do today, as opposed to what we knew 50 years ago, how would we design the computer? And this computer should not be something which solves problems for the past, this should be a computer which deals with problems in the future. So we are looking for something which would take us for the next 50 years, in terms of computing architectures and what we will do there. In the last three years we have gone from ideas and paper study, paper designs, and things which were made out of plastic, to a real working system. We have around Las Vegas time, we'd basically announced that we had the entire system working with actual applications running on it, 160 terabytes of memory all addressable from any processing core in 40 computing nodes around it. And the reason is, although we call it memory-driven computing, it's really thinking in terms of data-driven computing. The reason is that the data is now at the center of this computing architecture, as opposed to the processor, and any processor can return to any part of the data directly as if it was doing, addressing in local memory. This provides us with a degree of flexibility and freedom in compute that we never had before, and as a software person, I work in software, as a software person, when we started looking at this architecture, our answer was, well, we didn't know we could do this. Now if, given now that I can do this and I assume that I can do this, all of us in the programmers started thinking differently, writing code differently, and we suddenly had essentially a toy to play with, if you will, as programmers, where we said, you know, this algorithm I had written off decades ago because it didn't work, but now I have enough memory that if I were to think about this algorithm today, I would do it differently. And all of a sudden, a new set of algorithms, a new set of programming possibilities opened up. We worked with a number of applications, ranging from just Spark on this kind of an environment, to how do you do large scale simulations, Monte Carlo simulations. And people talk about improvements in performance from something in the order of, oh I can get you a 30% improvement. We are saying in the example applications we saw anywhere from five, 10, 15 times better to something which where we are looking at financial analysis, risk management problems, which we can do 10,000 times faster. >> So many orders of magnitude. >> Many, many orders >> When you don't have to wait for the horrible storage stack. (laughs) >> That's right, right. And these kinds of results gave us the hope that as we look forward, all of us in these new computing architectures that we are thinking through right now, will take us through this data mountain, data tsunami that we are all facing, in terms of bringing all of the data back and essentially doing real-time work on those. >> Matthias, maybe you could describe the work that you're doing at the University of Bonn, specifically as it relates to Alzheimer's and how this technology gives you possible hope to solve some problems. >> So at the University of Bonn, we work very closely with the German Center for Neurodegenerative Diseases, and in their mission they are facing all diseases like Alzheimer's, Parkinson's, Multiple Sclerosis, and so on. And in particular Alzheimer's is a really serious disease and for many diseases like cancer, for example, the mortality rates improve, but for Alzheimer's, there's no improvement in sight. So there's a large population that is affected by it. There is really not much we currently can do, so the DZNE is focusing on their research efforts together with the German government in this direction, and one thing about Alzheimer's is that if you show the first symptoms, the disease has already been present for at least a decade. So if you really want to identify sources or biomarkers that will point you in this direction, once you see the first symptoms, it's already too late. So at the DZNE they have started on a cohort study. In the area around Bonn, they are now collecting the data from 30,000 volunteers. They are planning to follow them for 30 years, and in this process we generate a lot of data, so of course we do the usual surveys to learn a bit about them, we learn about their environments. But we also do very more detailed analysis, so we take blood samples and we analyze the complete genome, and also we acquire imaging data from the brain, so we do an MRI at an extremely high resolution with some very advanced machines we have, and all this data is accumulated because we do not only have to do this once, but we try to do that repeatedly for every one of the participants in the study, so that we can later analyze the time series when in 10 years someone develops Alzheimer's we can go back through the data and see, maybe there's something interesting in there, maybe there was one biomarker that we are looking for so that we can predict the disease better in advance. And with this pile of data that we are collecting, basically we need something new to analyze this data, and to deal with this, and when we heard about the machine, we though immediately this is a system that we would need. >> Let me see if I can put this in a little bit of context. So Dave lives in Massachusetts, I used to live there, in Framingham, Massachusetts, >> Dave: I was actually born in Framingham. >> You were born in Framingham. And one of the more famous studies is the Framingham Heart Study, which tracked people over many years and discovered things about heart disease and relationship between smoking and cancer, and other really interesting problems. But they used a paper-based study with an interview base, so for each of those kind of people, they might have collected, you know, maybe a megabyte, maybe a megabyte and a half of data. You just described a couple of gigabytes of data per person, 30,000, multiple years. So we're talking about being able to find patterns in data about individuals that would number in the petabytes over a period of time. Very rich detail that's possible, but if you don't have something that can help you do it, you've just collected a bunch of data that's just sitting there. So is that basically what you're trying to do with the machine is the ability to capture all this data, to then do something with it, so you can generate those important inferences. >> Exactly, so with all these large amounts of data we do not only compare the data sets for a single person, but once we find something interesting, we have also to compare the whole population that we have captured with each other. So there's really a lot of things we have to parse and compare. >> This brings together the idea that it's not just the volume of data. I also have to do analytics and cross all of that data together, right, so every time a scientist, one of the people who is doing biology studies or informatic studies asks a question, and they say, I have a hypothesis which this might be a reason for this particular evolution of the disease or occurrence of the disease, they then want to go through all of that data, and analyze it as as they are asking the question. Now if the amount of compute it takes to actually answer their questions takes me three days, I have lost my train of thought. But if I can get that answer in real time, then I get into this flow where I'm asking a question, seeing the answer, making a different hypothesis, seeing a different answer, and this is what my colleagues here were looking for. >> But if I think about, again, going back to the Framingham Heart Study, you know, I might do a query on a couple of related questions, and use a small amount of data. The technology to do that's been around, but when we start looking for patterns across brain scans with time series, we're not talking about a small problem, we're talking about an enormous sum of data that can be looked at in a lot of different ways. I got one other question for you related to this, because I gotta presume that there's the quid pro quo for getting people into the study, is that, you know, 30,000 people, is that you'll be able to help them and provide prescriptive advice about how to improve their health as you discover more about what's going on, have I got that right? >> So, we're trying to do that, but also there are limits to this, of course. >> Of course. >> For us it's basically collecting the data and people are really willing to donate everything they can from their health data to allow these large studies. >> To help future generations. >> So that's not necessarily quid pro quo. >> Okay, there isn't, okay. But still, the knowledge is enough for them. >> Yeah, their incentive is they're gonna help people who have this disease down the road. >> I mean if it is not me, if it helps society in general, people are willing to do a lot. >> Yeah of course. >> Oh sure. >> Now the machine is not a product yet that's shipping, right, so how do you get access to it, or is this sort of futures, or... >> When we started talking to one another about this, we actually did not have the prototype with us. But remember that when we started down this journey for the machine three years ago, we know back then that we would have hardware somewhere in the future, but as part of my responsibility, I had to deal with the fact that software has to be ready for this hardware. It does me no good to build hardware when there is no software to run on it. So we have actually been working at the software stack, how to think about applications on that software stack, using emulation and simulation environments, where we have some simulators with essentially instruction level simulator for what the machine does, or what that prototype would have done, and we were running code on top of those simulators. We also had performance simulators, where we'd say, if we write the application this way, this is how much we think we would gain in terms of performance, and all of those applications on all of that code we were writing was actually on our large memory machines, Superdome X to be precise. So by the time we started talking to them, we had these emulation environments available, we had experience using these emulation environments on our Superdome X platform. So when they came to us and started working with us, we took their software that they brought to us, and started working within those emulation environments to see how fast we could make those problems, even within those emulation environments. So that's how we started down this track, and most of the results we have shown in the study are all measured results that we are quoting inside this forum on the Superdome X platform. So even in that emulated environment, which is emulating the machine now, on course in the emulation Superdome X, for example, I can only hold 24 terabytes of data in memory. I say only 24 terabytes >> Only! because I'm looking at much larger systems, but an enormously large number of workloads fit very comfortably inside the 24 terabytes. And for those particular workloads, the programming techniques we are developing work at that scale, right, they won't scale beyond the 24 terabytes, but they'll certainly work at that scale. So between us we then started looking for problems, and I'll let Matthias comment on the problems that they brought to us, and then we can talk about how we actually solved those problems. >> So we work a lot with genomics data, and usually what we do is we have a pipeline so we connect multiple tools, and we thought, okay, this architecture sounds really interesting to us, but if we want to get started with this, we should pose them a challenge. So if they can convince us, we went through the literature, we took a tool that was advertised as the new optimal solution. So prior work was taking up to six days for processing, they were able to cut it to 22 minutes, and we thought, okay, this is a perfect challenge for our collaboration, and we went ahead and we took this tool, we put it on the Superdome X that was already running and stepped five minutes instead of just 22, and then we started modifying the code and in the end we were able to shrink the time down to just 30 seconds, so that's two magnitudes faster. >> We took something which was... They were able to run in 22 minutes, and that was already had been optimized by people in the field to say "I want this answer fast", and then when we moved it to our Superdome X platform, the platform is extremely capable. Hardware-wise it compares really well to other platforms which are out there. That time came down to five minutes, but that was just the beginning. And then as we modified the software based on the emulation results we were seeing underneath, we brought that time down to 13 seconds, which is a hundred times faster. We started this work with them in December of last year. It takes time to set up all of this environment, so the serious coding was starting in around March. By June we had 9X improvement, which is already a factor of 10, and since June up to now, we have gotten another factor of 10 on that application. So I'm now at a 100X faster than what the application was able to do before. >> Dave: Two orders of magnitude in a year? >> Sharad: In a year. >> Okay, we're out of time, but where do you see this going? What is the ultimate outcome that you're hoping for? >> For us, we're really aiming to analyze our data in real time. Oftentimes when we have biological questions that we address, we analyze our data set, and then in a discussion a new question comes up, and we have to say, "Sorry, we have to process the data, "come back in a week", and our idea is to be able to generate these answers instantaneously from our data. >> And those answers will lead to what? Just better care for individuals with Alzheimer's, or potentially, as you said, making Alzheimer's a memory. >> So the idea is to identify Alzheimer long before the first symptoms are shown, because then you can start an effective treatment and you can have the biggest impact. Once the first symptoms are present, it's not getting any better. >> Well thank you for your great work, gentlemen, and best of luck on behalf of society, >> Thank you very much >> really appreciate you coming on theCUBE and sharing your story. You're welcome. All right, keep it right there, buddy. Peter and I will be back with our next guest right after this short break. This is theCUBE, you're watching live from Madrid, HPE Discover 2017. We'll be right back.

Published Date : Nov 29 2017

SUMMARY :

brought to you by Hewlett Packard Enterprise. that we also cover in Las Vegas, So Sharad, why don't we start with you, and frankly at that time, the response we got When you don't have to computing architectures that we are thinking through and how this technology gives you possible hope and in this process we generate a lot of data, So Dave lives in Massachusetts, I used to live there, is the Framingham Heart Study, which tracked people that we have captured with each other. Now if the amount of compute it takes to actually the Framingham Heart Study, you know, there are limits to this, of course. and people are really willing to donate everything So that's not necessarily But still, the knowledge is enough for them. people who have this disease down the road. I mean if it is not me, if it helps society in general, Now the machine is not a product yet and most of the results we have shown in the study that they brought to us, and then we can talk about and in the end we were able to shrink the time based on the emulation results we were seeing underneath, and we have to say, "Sorry, we have to process the data, Just better care for individuals with Alzheimer's, So the idea is to identify Alzheimer Peter and I will be back with our next guest

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
NeilPERSON

0.99+

Dave VellantePERSON

0.99+

JonathanPERSON

0.99+

JohnPERSON

0.99+

Ajay PatelPERSON

0.99+

DavePERSON

0.99+

$3QUANTITY

0.99+

Peter BurrisPERSON

0.99+

Jonathan EbingerPERSON

0.99+

AnthonyPERSON

0.99+

Mark AndreesenPERSON

0.99+

Savannah PetersonPERSON

0.99+

EuropeLOCATION

0.99+

Lisa MartinPERSON

0.99+

IBMORGANIZATION

0.99+

YahooORGANIZATION

0.99+

AWSORGANIZATION

0.99+

Paul GillinPERSON

0.99+

Matthias BeckerPERSON

0.99+

Greg SandsPERSON

0.99+

AmazonORGANIZATION

0.99+

Jennifer MeyerPERSON

0.99+

Stu MinimanPERSON

0.99+

TargetORGANIZATION

0.99+

Blue Run VenturesORGANIZATION

0.99+

RobertPERSON

0.99+

Paul CormierPERSON

0.99+

PaulPERSON

0.99+

OVHORGANIZATION

0.99+

Keith TownsendPERSON

0.99+

PeterPERSON

0.99+

CaliforniaLOCATION

0.99+

MicrosoftORGANIZATION

0.99+

SonyORGANIZATION

0.99+

VMwareORGANIZATION

0.99+

Andy JassyPERSON

0.99+

RobinPERSON

0.99+

Red CrossORGANIZATION

0.99+

Tom AndersonPERSON

0.99+

Andy JazzyPERSON

0.99+

KoreaLOCATION

0.99+

HowardPERSON

0.99+

Sharad SingalPERSON

0.99+

DZNEORGANIZATION

0.99+

U.S.LOCATION

0.99+

five minutesQUANTITY

0.99+

$2.7 millionQUANTITY

0.99+

TomPERSON

0.99+

John FurrierPERSON

0.99+

MatthiasPERSON

0.99+

MattPERSON

0.99+

BostonLOCATION

0.99+

JessePERSON

0.99+

Red HatORGANIZATION

0.99+

Josh Epstein and Eyal David, Kaminario | VMworld 2017


 

>> Announcer: Live from Las Vegas, it's theCUBE! Covering VMworld 2017! Brought to you by VMware and its ecosystem partners. (futuristic music) >> Welcome back everyone, we are live, here, in Las Vegas for VMworld 2017, I'm John Furrier, my cohost Dave Vellante, eighth year with theCUBE, proud to have two great guests, Josh Epstein, CMO of Kaminario and Eyal David, CTO of Kaminario, great to see you guys again! >> Likewise, great to be here! >> You guys had a great event in Boston recently, what's going on with you guys? Give me an update on the company. >> Sure, I'll go first. Kaminario's been around for awhile, but we've been, first of all, moved the headquarters over to east coast US, outside of Boston, Massachusetts, opened up a great new office space there. Got a lot going on from a product perspective, a lot going on from a go-to-market perspective, you see a lot happening in the all-flash space and the storage space in general, and just, really excited to take it to the next step. We see a lot of things happening here. >> It's a pretty big week this week. We saw Scott Dietzen from Pure Storage become the Chairman and Jean Carlo, ex-CISCO MNA guy from Silver Lake come in to be CEO, so Dave and I were speculating, All flash, a lot of, what's going on! A lot of people saying, woah, is it growing? Still a need for flash. What's the big hubbub about? >> So, we definitely see a change in the market, and the emergence of two different models. The way people used to buy storage, and the way next-generation application, cloud-scale application, software-to-servers, e-commerce, online businesses, need to buy storage. And their need for simplicity, performance and cost-efficiency at scale is still driving the need for flash storage, and we'll talk about this yet to come some more. >> And you guys see those as really distinct opportunities, is that right? Can you add some color to that, Josh? >> Yeah, I think that we see the flash space made up of two different markets, one is just the massive stocking function of traditional enterprise data centers, making the move en masse to flash. And there you have, obviously, the incumbent vendors with their flash solutions, you know. That's a dogfight, there's a lot of competition in there. There's this other market which we see growing more healthily, more organically, which is the growth of these cloud-scale applications. As Eyal said, flash provider, or, software-to-server providers, e-commerce providers, fintech, healthtech, these large, highly-scalable, database-driven cloud-scale applications. That means a different type of of scale, so that's where we see less competition from the incumbents and more opportunity -- >> What's different about that market, what's the requirement, what are they looking for that makes this a good engine for them? >> So one of the key requirements is agility and flexibility. One of the current characteristics is they don't really know what is going to be the next workload, how their workload is going to change in scale over time. So they need an infrastructure that can change and adapt to their needs, still deliver the same level of performance, still deliver the same level of simplicity. But have that flexibility to address their changing needs in capacity and performance, to address growth in customers, changing in workload application, without too much pre-planning. >> So I'd ask the question to you guys, I get this all the time. So since you guys are the gurus in the area. I get this question a lot, what is a modern data center? With all the action on private cloud happenings, true private cloud, they truly point out, people are re-tooling their data centers to operate like cloud, it's still on-premise. That's kind of the gateway to hypercloud, very clear. Public cloud, workloads, all bursting, that stuff's great. What's a modern architecture, what's a modern data center? When I hear that term, what do you guys mean? >> That's a great question. So the modern data center, or even the next generation data center is exactly that, one that allows enterprises to achieve the same levels of scalability, efficiency, as the hypercloud, but on-premise, or in a hybrid fashion. But it allows them to have that level of control against operation simplicity that's hard to come by, but on their own terms, adapting to their own needs. >> So without the need to build out a massive engineering team to build this from the ground up. >> So are the buyers different, are those two worlds coming together? I wonder if you could address that. >> Yeah, I think the buyers are, in fact, different. I think, now, you see a convergence over time as the classic enterprise data centers start to look more like a private cloud. But we see this growth in large-managed private cloud providers really exciting, and they come in different forms. You have the Telcos getting into the business, you have the outsourcers getting into the business, you have the traditional channel getting into the business. We have a great partnership with Vion, a big federal reseller, and using Kaminario as a flash service offering. And they start looking like a cloud provider, and they're thinking like a cloud provider. >> And what's the benefits then? Cause I was just looking at the gov cloud impact, I was just at the Amazon Public Sector Summit. Huge traction right now because it's so fast, you can get into the government cloud quickly. Why is that unique, why, as a service, and why are you guys really driving that? >> One, it fits with our architecture perfectly. But I think from a customer standpoint, the ability to procure, like, procuring from the cloud, but also to get the kind of services, you know, as people start re-engineering applications thinking about dev-ops, cloud-data-type applications, leveraging the same kind of utilities that they might get from an Amazon or an Ajer, from a managed private cloud provider, it becomes really important. >> And Al-fed ramp is there, you get all the federal information stuff going on around it. >> So I wonder how you deal with this problem, it's a relatively small company, you're up against the big guys, you say, it's like a rock fight. But you have an affinity to, let's say, SAS players. They like your product and it fits better with their vision. But then you have this big whale, saying, okay, I'm going to buy my HR software from, you know, some SAS provider, I'm going to do some, whatever, 70,000-person deployment, but, as a quid pro quo, you've got to buy my all-flash array. So you must see that all the time. When you peel back the covers, underneath that SAS provider, what do you really see? Like, they fence off, sort of, legacy-vendors' stuff, and they really drive in their core business with your modern platform? Or is it sort of just a mishmash? >> No, I think we're seeing a shift. I think what we're seeing is, some of the legacy architectures are running up against boundaries. Boundaries in terms of complexity, boundaries in terms of agility. Kaminario was built to scale from the get-go. It was built for performance and it was built for scale. And I think what we're seeing is, the main value of these SAS providers, as they're reaching scale, is the ability to deliver consistent performance, consistent cost-efficiency, and really, our predictability. The ability to sort of forecast in the future what cost structure's going to look like in order to continue to deliver high-performance to their own users. >> So the hypothetical example I gave, I'm sure you see it, but are you, you know, winning head-to-head in those environments, and your piece is growing, and that's sort of just a static one-time deal? >> That's exactly what we're seeing, so our main growth, our main focus is on these software-to-service companies or software-to-service departments within existing companies building these types of offerings to deliver this as a service consumption model. And you were asking about the back-end, in the back-end, these are often large-scale databases operating mixed types of workloads, for example, transaction processing, analytics, all at the same time. And the need to support these types of workloads requires an infrastructure that can deliver at-scale, consistent performance. And when we face off the legacy vendors in those environments, we win out. >> You have to be substantially better as a small company. You are, otherwise, you're out of business. >> Absolutely. >> And so, interesting thing about the flash market it, a lot of the big guys realized right away, wow, I'm way behind, so they went out and they bought a lot of startups. What happened, did they sort of pollute them, through the integration, or ... (laughing) >> I think the marketshare statistics are a little bit confusing, but what we see is, you know, the bulk of the legacy vendors, you know, push in what we call retro-fit flash, basically taking their old legacy architectures, their scale-up or scale-out architectures, and cramming flash into it, and basically, then, they don't bring the same kind of simplicity, same kind of agility, same kind of scalability as a built-for-flash-offering like Kaminario. >> Right, what about, you guys have some announcements this week? >> Yup, take that? >> Yeah, two weeks ago we announced our next-generation platform, K2.n, which is based on a fully-converged, NVIO mean over fabric back end. This is basically taking our core operating system, Vision OS, which is a mature and robust storage software stack with all the data services and enterprise features that enterprises need. And deliver it on an NVIO fabric backend which leverages the existing capability to aggregate capacity and compute, and take it to the next level, delivering a very scalable and agile storage cluster that allows you to mix and match different types of resources, to add and remove resources very dynamically, and make your data center responsive in minutes and not hours or days or even months. >> You guys are familiar with our service and research, and we're very excited about NVIO over fabric, because we've been talking about it since probably, maybe 2008, 2009, some type of ability to scale and to communicate, and that's here today, finally. How close are we to actually having a product in the field that I can actually deploy? >> We will actually be shipping this in Q1, the K2.N They added another layer on top of that, We also announced a new software platform called Kaminario Flex, which is a orchestration platform which rides on top of K2.N, and allows you to dynamically compose virtual arrays out of these NVME-connected resources. So I really take that, looking ahead, that the classic notion of a monolithic shared-storage array, is going to die over time. >> Well, here's the numbers. I mean, it's automatic, go ahead. >> Well no, this is the whole debate that we've been clearing up with the true, private cloud report. I mean, guys, no-brainer, check, as a service, as the future, so you're good there. (laughs) The true pilot board, too bad it shows the on-prem stuff is declining in general, that's settlement for buying boxes, and the old way of doing things. Labor's being automated away and shifted, that's pretty obvious. Enter your business model, right? I mean, this is perfect for any cloud deal. >> Right. >> The question is, track record, bulletproof, reliability, security, the table stays all shift, data protection, all these details, that's what they care. You guys check that box ... (laughing) >> So the disability takes vision away, so I'm going to take it to the next generation. Technology is what actually allows us to do that. Whether it's in a hypercloud or we're going into a managed cloud provider, that is becoming a very desired consumption model for a lot of the ads of service members, allows them to build such a flexible architecture, based on a mature software step. >> So you guys, really, from what I see is your strategy is, get this out there quickly from a tech standpoint, software, flex, and integration with cloud is critical. Because you can offload a lot of that heavy lifting on those unique requirements to the cloud guys, where the pre-existing tech exists. Did I get that right? >> Yeah and I think what we see is these managed cloud providers are going to want to have a say in it, they want to actually be part of the evolution of the platform, right? >> Yeah, go ahead, fine, it's your stop! You can always buy the servers more flash! (laughing) >> So talk about your channel, and you go to market, help us understand that a little bit better. >> Yeah, I think it's all about focus for Kaminario. I mean, let's face it, the flash space is competitive, right, if we're going to go head-to-head with everyone, kind of, pull one of these growth-at-all-cost models. And you see what the market values those types of companies. So we've been really focused in two ways. One, SAS providers, next-generation business. I mean, if we opportunistically find a VDI deal, okay, that's great, we have a great solution for VDI, but it's not something that we're going to go out and hunting day to day. The second is really to focus on channel partners. We've got a channel first model, really, effectively 100% of our new business in 2017 will come through a channel partner. Most of those channel partners are looking at developing some type of managed services offering as well, so you know, it's not just about the margin on the deal, it's about the longterm -- >> Cause they're trying to respond to the market transit and value. >> Exactly, so it's about focus on a relatively small number of channel partners that get it, that like our model, and again, it's just -- >> Hey, you'll make money from it, cause that's all, at the end of the day, you've got to get that leverage, because that's your David and Goliath story. >> Exactly, yeah. >> And, global footprint? Is it primarily US and Europe or -- >> Yeah, so it's been, we started in Israel, US has been a good focus, last year we opened up the UK and France, end of the great we opened up Korea, we're now in Singapore, we're moving into China through partners, and so yeah, this is a global story. Clearly, US is the, in terms of adoption of these server infrastructures, US is really the furthest ahead, but it's a global phenomenon. >> What do you make of the VMwear momentum? Because two years ago, VMwear was, the stock was sort of in the tank and there was no growth, and now it's on fire, the data center's on fire, you can't get data center space! (laughing) >> From my perspective, the fast adoption that VMwear had for new technologies, for adopting containers, for adopting cloud paradigms, for adopting this new delivery model, and enabling a fuller stack aligns very well with the kind of demands of the next-generation data system we talked about, where the management plane, the orchestration plane, is becoming more and more important in optimizing the way in this infrastructure gets delivered. So that's, I believe, what is driving that forward. >> Josh and Elay, thanks so much for coming out, coming our way, you guys, company watch, love the business model. The tech comes home, you get it with that integration, man there's not a leverage there, congratulations on your success! (laughing) Great business. TheCUBE bringing you the CUBE as a service, all flash content here! Back with more VMworld coverage after this short break. (futuristic music)

Published Date : Aug 29 2017

SUMMARY :

Brought to you by VMware and its ecosystem partners. what's going on with you guys? first of all, moved the headquarters over to east coast US, come in to be CEO, so Dave and I were speculating, and the emergence of two different models. making the move en masse to flash. One of the current characteristics is they don't really know So I'd ask the question to you guys, So the modern data center, or even the next generation team to build this from the ground up. So are the buyers different, are those two worlds as the classic enterprise data centers start to look and why are you guys really driving that? But I think from a customer standpoint, the ability to you get all the federal information stuff going on I'm going to buy my HR software from, you know, is the ability to deliver consistent performance, And the need to support these types of workloads You have to be substantially better as a small company. a lot of the big guys realized right away, wow, the bulk of the legacy vendors, you know, leverages the existing capability to aggregate and to communicate, and that's here today, finally. and allows you to dynamically compose virtual arrays Well, here's the numbers. and the old way of doing things. the table stays all shift, data protection, So the disability takes vision away, So you guys, really, So talk about your channel, and you go to market, I mean, let's face it, the flash space is competitive, to respond to the market transit and value. from it, cause that's all, at the end of the day, end of the great we opened up Korea, we're now in Singapore, of the next-generation data system we talked about, TheCUBE bringing you the CUBE as a service,

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Dave VellantePERSON

0.99+

Josh EpsteinPERSON

0.99+

TelcosORGANIZATION

0.99+

ElayPERSON

0.99+

DavePERSON

0.99+

JoshPERSON

0.99+

2017DATE

0.99+

AmazonORGANIZATION

0.99+

BostonLOCATION

0.99+

Scott DietzenPERSON

0.99+

IsraelLOCATION

0.99+

Jean CarloPERSON

0.99+

John FurrierPERSON

0.99+

100%QUANTITY

0.99+

Eyal DavidPERSON

0.99+

VMwareORGANIZATION

0.99+

Las VegasLOCATION

0.99+

SingaporeLOCATION

0.99+

VionORGANIZATION

0.99+

last yearDATE

0.99+

CISCOORGANIZATION

0.99+

DavidPERSON

0.99+

2008DATE

0.99+

KaminarioORGANIZATION

0.99+

2009DATE

0.99+

Pure StorageORGANIZATION

0.99+

ChinaLOCATION

0.99+

SASORGANIZATION

0.99+

eighth yearQUANTITY

0.99+

two weeks agoDATE

0.98+

EuropeLOCATION

0.98+

Silver LakeLOCATION

0.98+

secondQUANTITY

0.98+

VMworld 2017EVENT

0.98+

this weekDATE

0.98+

GoliathPERSON

0.98+

oneQUANTITY

0.98+

VMwearORGANIZATION

0.97+

OneQUANTITY

0.97+

first modelQUANTITY

0.97+

USLOCATION

0.97+

two different modelsQUANTITY

0.97+

two years agoDATE

0.97+

70,000-personQUANTITY

0.96+

K2.NTITLE

0.96+

KoreaLOCATION

0.96+

Boston, MassachusettsLOCATION

0.95+

two waysQUANTITY

0.95+

FranceLOCATION

0.95+

two great guestsQUANTITY

0.94+

Q1DATE

0.94+

Amazon Public Sector SummitEVENT

0.94+

todayDATE

0.93+

firstQUANTITY

0.93+

UKLOCATION

0.92+

theCUBEORGANIZATION

0.91+

two different marketsQUANTITY

0.91+

two worldsQUANTITY

0.9+

VMworldORGANIZATION

0.88+

K2.nTITLE

0.87+

one-timeQUANTITY

0.86+

TheCUBEORGANIZATION

0.84+

NVIOTITLE

0.83+

Vision OSTITLE

0.82+

CTOPERSON

0.81+

KaminarioPERSON

0.79+

Kaminario FlexTITLE

0.77+

CUBETITLE

0.74+

MNAORGANIZATION

0.72+

EyalPERSON

0.67+

VMworldEVENT

0.61+