INSURANCE V1 | CLOUDERA

>>Good morning or good afternoon or good evening, depending on where you are and welcome to this session, reduce claims, fraud, we're data, very excited to have you all here. My name is Winnie castling and I'm Cloudera as managing director for the insurance vertical. First and foremost, we want to let you know that we know insurance. We have done it for a long time. Collectively, personally, I've done it for over 30 years. And, you know, as a proof of that, we want to let you know that we insure, we insure as well as we do data management work for the top global companies in the world, in north America, over property casualty, general insurance health, and, um, life and annuities. But besides that, we also take care of the data needs for some smaller insurance companies and specialty companies. So if you're not one of the huge Glomar conglomerates in the world, you are still perfectly fine with us. >>So >>Why are we having this topic today? Really digital claims and digital claims management is accelerating. And that's based on a couple of things. First and foremost, customers are asking for it. Customers are used to doing their work more digitally over the last descending year or two. And secondly, with the last year or almost two, by now with the changes that we made in our work processes and in society at large around cuvettes, uh, both regulators, as well as companies have enabled digital processing and the digital journey to a degree that they've never done before. Now that had some really good impacts for claims handling. It did meant that customers were more satisfied. They felt they have more control over their processes in the cloud and the claims experience. It also reduced in a lot of cases, both in commercial lines, as well as in personal lines, the, um, the, the time periods that it took to settle on a claim. However, um, the more digital you go, it, it opened up more access points for fraud, illicit activities. So unfortunately we saw indicators of fraud and fraud attempts, you know, creeping up over the last time period. So we thought it was a good moment to look at, you know, some use cases and some approaches insurers can take to manage that even better than they already >>Are. >>And this is how we plan to do that. And this is how we see this in action. On the left side, you see progress of data analytics and data utilization, um, around, in this case, we're talking about claims fraud, but it's a generic picture. And really what it means is most companies that start with data affords pretty much start around data warehousing and we eliminate analytics and all around BI and reporting, which pretty much is understanding what we know, right? The data that we already have utilizing data to understand better what we know already. Now, when we move to the middle blue collar, we get into different types of analytics. We get into exploratory data science, we get to predictions and we start getting in the space of describing what we can learn from what we know, but also start moving slowly into predicting. So first of all, learn and gather insights of what we already know, and then start augmenting with that with other data sets and other findings, so that we can start predicting for the future, what might happen. >>And that's the point where we get to AI, artificial intelligence and machine learning, which will help us predict which of our situations and claims are most likely to have a potential fraud or abuse scenario attached to it. So that's the path that insurers and other companies take in their data management and analytics environments. Now, if you look at the right side of this light, you see data complexity per use cases in this case in fraud. So the bubbles represent the types of data that are being used, or the specific faces that we discussed on the left side. So for reporting, we used a TPA data, policy verification, um, claims file staff data, that it tends to be heavily structured and already within the company itself. And when you go to the middle to the more descriptive basis, you start getting into unstructured data, you see a lot of instructor texts there, and we do a use case around that later. >>And this really enables us to better understand what the scenarios are that we're looking at and where the risks are around. In our example today, fraud, abuse and issues of resources. And then the more you go to the upper right corner, you see the outside of the baseball field, people refer to it, you see new unstructured data sources that are being used. You tend to see the more complex use cases. And we're looking at picture analysis, we're looking at voice analysis there. We're looking at geolocation. That's quite often the first one we look at. So this slide actually shows you the progress and the path in complexity and in utilization of data and analytical tool sets to manage data fraud, fraud, use cases, optimally. >>Now how we do that and how we look at at a Cloudera is actually not as complicated as, as this slight might want to, um, to, to give you an impression. So let's start at the left side at the left side, you see the enterprise data, which is data that you as an organization have, or that you have access to. It doesn't have to be internal data, but quite often it is now that data goes into a data journey, right? It gets collected first. It gets manipulated and engineered so that people can do something with it. It gets stored something, you know, people need to have access to it. And then they get into analytical capabilities who are inside gathering and utilization. Now, especially for insurance companies that all needs to be underpinned by a very, very strong security and governance, uh, environment. Because if not the most regulated industry in the world, insurance is awfully close. >>And if it's not the most regulated one, it's a close second. So it's critically important that insurers know, um, where the data is, who has access to it for Rodriguez, uh, what is being used for so terms like lineage, transparency are crucial, crucially important for insurance. And we manage that in the shared data experience. So it goes over the whole Cloudera platform and every application or tool or experience you use would include Dao. And on the right side, you see the use cases that tend to be deployed around claims and claims fraud, claims, fraud management. So over the last year or so, we've seen a lot of use cases around upcoding people get one treatment or one fix on a car, but it gets coded as a more expensive one. That's a fraud scenario, right? We see also the more classical fraud things and we see anti money laundering. So those are the types of use cases on the right side that we are supporting, um, on the platform, uh, around, um, claims fraud. >>And this is an example of how that actually looks like now, this is a one that it's actually a live one of, uh, a company that had, um, claims that dealt with health situations and being killers. So that obviously is relevant for health insurers, but you also see it in, um, in auto claims and counterclaims, right, you know, accidents. There are a lot of different claims scenarios that have health risks associated with it. And what we did in this one is we joined tables in a complex schema. So we have to look at the claimant, the physician, the hospital, all the providers that are involved procedures that are being deployed. Medically medicines has been utilized to uncover the full picture. Now that is a hard effort in itself, just for one claim and one scenario. But if you want to see if people are abusing, for example, painkillers in this scenario, you need to do that over every instant that is member. >>This claimant has, you know, with different doctors, with different hospitals, with different pharmacies or whatever that classically it's a very complicated and complex, um, the and costly data operation. So nowadays that tends to be done by graph databases, right? So you put fraud rings within a graph database and walk the graph. And if you look at it here in batch, you can see that in this case, that is a member that was shopping around for being killers and went through different systems and different providers to get, um, multiple of the same big LR stat. You know, obviously we don't know what he or she did with it, but that's not the intent of the system. And that was actually a fraud and abuse case. >>So I want to share some customer success stories and recent, uh, AML and fraud use cases. And we have a couple of them and I'm not going to go in an awful lot of detail, um, about them because we have some time to spend on one of them immediately after this. But one of them for example, is voice analytics, which is a really interesting one. And on the baseball slide that I showed you earlier, that would be a right upper corner one. And what happened there is that an insurance company utilized the, uh, the voice records they got from the customer service people to try to predict which one were potentially fraud list. And they did it in two ways. They look at actually the contents of what was being said. So they looked at certain words that were being used certain trigger words, but they also were looking at tone of voice pitch of voice, uh, speed of talking. >>So they try to see trends there and hear trends that would, um, that would bring them for a potential bad situation. Now good and bad news of this proof of concept was it's. We learned that it's very difficult just because every human is different to get an indicator for bad behavior out of the pitch or the tone or the voice, you know, or those types of nonverbal communication in voice. But we did learn that it was easier to, to predict if a specific conversation needed to be transferred to somebody else based on emotion. You know, obviously as we all understand life and health situations tend to come with emotions, or so people either got very sad or they got very angry or so the proof of concept didn't really get us to a firm understanding of potential driverless situation, but it did get us to a much better understanding of workflow around, um, claims escalation, um, in customer service to route people, to the right person, depending on what they need. >>And that specific time, another really interesting one was around social media, geo open source, all sorts of data that we put together. And we linked to the second one that I listed on slide here that was an on-prem deployment. And that was actually an analysis that regulators were asking for in a couple of countries, uh, for anti money laundering scams, because there were some plots out there that networks of criminals would all buy the low value policies, surrendered them a couple of years later. And in that way, God criminal money into the regular amount of monetary system whitewashed the money and this needed some very specific and very, very complex link analysis because there were fairly large networks of criminals that all needed to be tied together, um, with the actions, with the policies to figure out where potential pain points were. And that also obviously included ecosystems, such as lawyers, administrative offices, all the other things, no, but most, you know, exciting. >>I think that we see happening at the moment and we, we, you know, our partner, if analytics just went live with this with a large insurer, is that by looking at different types that insurers already have, um, unstructured data, um, um, their claims nodes, um, repour its claims, filings, um, statements, voice records, augmented with information that they have access to, but that's not their ours such as geo information obituary, social media Boyd on the cloud. And we can analyze claims much more effectively and efficiently for fraud and litigation and alpha before. And the first results over the last year or two showcasing a significant degree is significant degrees in claims expenses and, um, and an increase at the right moment of what a right amount in claims payments, which is obviously a good thing for insurers. Right? So having said all of that, I really would like to give Sri Ramaswami, the CEO of infinite Lytics, the opportunity to walk you through this use case and actually show you how this looks like in real life. So Sheree, here >>You go. So >>Insurers often ask us this question, can AI help insurance companies, lower loss expenses, litigation, and help manage reserves better? We all know that insurance industry is majority. Majority of it is unstructured data. Can AI analyze all of this historically and look for patterns and trends to help workflows and improve process efficiencies. This is exactly why we brought together industry experts at infill lyrics to create the industries where very first pre-trained and prebuilt insights engine called Charlie, Charlie basically summarizes all of the data structured and unstructured. And when I say unstructured, I go back to what money basically traded. You know, it is including documents, reports, third-party, um, it reports and investigation, uh, interviews, statements, claim notes included as well at any third party enrichment that we can legally get our hands on anything that helps the adjudicate, the claims better. That is all something that we can include as part of the analysis. And what Charlie does is takes all of this data and very neatly summarizes all of this. After the analysis into insights within our dashboard, our proprietary naturally language processing semantic models adds the explanation to our predictions and insights, which is the key element that makes all of our insights >>Actually. So >>Let's just get into, um, standing what these steps are and how Charlie can help, um, you know, with the insights from the historical patterns in this case. So when the claim comes in, it comes with a lot of unstructured data and documents that the, uh, the claims operations team have to utilize to adjudicate, to understand and adjudicate the claim in an efficient manner. You are looking at a lot of documents, correspondences reports, third party reports, and also statements that are recorded within the claim notes. What Charlie basically does is crunches all, all of this data removes the noise from that and brings together five key elements, locations, texts, sentiments, entities, and timelines in the next step. >>In the next step, we are basically utilizing Charlie's built-in proprietary, natural language processing models to semantically understand and interpret all of that information and bring together those key elements into curated insights. And the way we do that is by building knowledge, graphs, and ontologies and dictionaries that can help understand the domain language and convert them into insights and predictions that we can display on the dash. Cool. And if you look at what has been presented in the dashboard, these are KPIs and metrics that are very interesting for a management staff or even the operations. So the management team can basically look at the dashboard and start with the summarized data and start to then dig deeper into each of the problematic areas and look at patterns at that point. And these patterns that we learn from not only from what the system can provide, but also from the historic data can help understand and uncover some of these patterns in the newer claims that are coming in so important to learn from the historic learnings and apply those learnings in the new claims that are coming in. >>Let's just take a very quick example of what this is going to look like a claims manager. So here the claims manager discovers from the summarized information that there are some problems in the claims that basically have an attorney involved. They have not even gone into litigation and they still are, you know, I'm experiencing a very large, um, average amount of claim loss when they compare to the benchmark. So this is where the manager wants to dig deeper and understand the patterns behind it from the historic data. And this has to look at the wealth of information that is sitting in the unstructured data. So Charlie basically pulls together all these topics and summarizes these topics that are very specific to certain losses combined with entities and timelines and sentiments, and very quickly be able to show to the manager where the problematic areas are and what are those patterns leading to high, severe claims, whether it's litigation or whether it's just high, severe indemnity payments. >>And this is where the managers can adjust their workflows based on what we can predict using those patterns that we have learned and predict the new claims, the operations team can also leverage Charlie's deep level insights, claim level insights, uh, in the form of red flags, alerts and recommendations. They can also be trained using these recommendations and the operations team can mitigate the claims much more effectively and proactively using these kind of deep level insights that need to look at unstructured data. So at the, at the end, I would like to say that it is possible for us to achieve financial benefits, leveraging artificial intelligence platforms like Charlie and help the insurers learn from their historic data and being able to apply that to the new claims, to work, to adjust their workflows efficiently. >>Thank you very much for you. That was very enlightening as always. And it's great to see that actually, some of the technology that we all work so hard on together, uh, comes to fruition in, in cost savings and efficiencies and, and help insurers manage potential bad situations, such as claims fraud batter, right? So to close this session out as a next step, we would really urge you to a Sasha available data sources and advanced or predictive fraud prevention capabilities aligned with your digital initiatives to digital initiatives that we all embarked on over the last year are creating a lot of new data that we can use to learn more. So that's a great thing. If you need to learn more at one to learn more about Cloudera and our insurance work and our insurance efforts, um, you to call me, uh, I'm very excited to talk about this forever. So if you want to give me a call or find a place to meet when that's possible again, and schedule a meeting with us, and again, we love insurance. We'll gladly talk to anyone until they say in parts of the United States, the cows come home about it. And we're dad. I want to thank you all for attending this session and hanging in there with us for about half an hour. And I hope you have a wonderful rest of the day. >>Good afternoon, I'm wanting or evening depending on where you are and welcome to this breakout session around insurance, improve underwriting with better insights. >>So first and >>Foremost, let's summarize very quickly, um, who we're with and what we're talking about today. My name is goonie castling, and I'm the managing director at Cloudera for the insurance vertical. And we have a sizeable presence in insurance. We have been working with insurance companies for a long time now, over 10 years, which in terms of insurance, it's maybe not that long, but for technology, it really is. And we're working with, as you can see some of the largest companies in the world and in the continents of the world. However, we also do a significant amount of work with smaller insurance companies, especially around specialty exposures and the regionals, the mutuals in property, casualty, general insurance, life, annuity, and health. So we have a vast experience of working with insurers. And, um, we'd like to talk a little bit today about what we're seeing recently in the underwriting space and what we can do to support the insurance industry in there. >>So >>Recently what we have been seeing, and it's actually accelerated as a result of the recent pandemic that we all have been going through. We see that insurers are putting even more emphasis on accounting for every individual customers with lotta be a commercial clients or a personal person, personal insurance risk in a dynamic and a B spoke way. And what I mean with that is in a dynamic, it means that risks and risk assessments change very regularly, right? Companies go into different business situations. People behave differently. Risks are changing all the time and the changing per person they're not changing the narrow generically my risk at a certain point of time in travel, for example, it might be very different than any of your risks, right? So what technology has started to enable is underwrite and assess those risks at those very specific individual levels. And you can see that insurers are investing in that capability. The value of, um, artificial intelligence and underwriting is growing dramatically. As you see from some of those quotes here and also risks that were historically very difficult to assess such as networks, uh, vendors, global supply chains, um, works workers' compensation that has a lot of moving parts to it all the time and anything that deals with rapidly changing risks, exposures and people, and businesses have been supported more and more by technology such as ours to help, uh, gone for that. >>And this is a bit of a difficult slide. So bear with me for a second here. What this slide shows specifically for underwriting is how data-driven insights help manage underwriting. And what you see on the left side of this slide is the progress in make in analytical capabilities. And quite often the first steps are around reporting and that tends to be run from a data warehouse, operational data store, Starsky, Matt, um, data, uh, models and reporting really is, uh, quite often as a BI function, of course, a business intelligence function. And it really, you know, at a regular basis informs the company of what has been taken place now in the second phase, the middle dark, the middle color blue. The next step that is shore stage is to get into descriptive analytics. And what descriptive analytics really do is they try to describe what we're learning in reporting. >>So we're seeing sorts and events and sorts and findings and sorts of numbers and certain trends happening in reporting. And in the descriptive phase, we describe what this means and you know why this is happening. And then ultimately, and this is the holy grill, the end goal we like to get through predictive analytics. So we like to try to predict what is going to happen, uh, which risk is a good one to underwrite, you know, watch next policy, a customer might need or wants water claims as we discuss it. And not a session today, uh, might become fraud or lists or a which one we can move straight through because they're not supposed to be any issues with it, both on the underwriting and the claims side. So that's where every insurer is shooting for right now. But most of them are not there yet. >>Totally. Right. So on the right side of this slide specifically for underwriting, we would, we like to show what types of data generally are being used in use cases around underwriting, in the different faces of maturity and analytics that I just described. So you will see that on the reporting side, in the beginning, we start with rates, information, quotes, information, submission information, bounding information. Um, then if you go to the descriptive phase, we start to add risk engineering information, risk reports, um, schedules of assets on the commercial side, because some are profiles, uh, as a descriptions, move into some sort of an unstructured data environment, um, notes, diaries, claims notes, underwriting notes, risk engineering notes, transcripts of customer service calls, and then totally to the other side of this baseball field looking slide, right? You will see the relatively new data sources that can add tremendous value. >>Um, but I'm not Whitely integrated yet. So I will walk through some use cases around these specifically. So think about sensors, wearables, you know, sensors on people's bodies, sensors, moving assets for transportation, drone images for underwriting. It's not necessary anymore to send, uh, an inspection person and inspector or risk, risk inspector or engineer to every building, you know, be insurers now, fly drones over it, to look at the roofs, et cetera, photos. You know, we see it a lot in claims first notice of loss, but we also see it for underwriting purposes that policies out there. Now that pretty much say sent me pictures of your five most valuable assets in your home and we'll price your home and all its contents for you. So we start seeing more and more movements towards those, as I mentioned earlier, dynamic and bespoke types of underwriting. >>So this is how Cloudera supports those initiatives. So on the left side, you see data coming into your insurance company. There are all sorts of different data. There are, some of them are managed and controlled by you. Some orders you get from third parties, and we'll talk about Della medics in a little bit. It's one of the use cases. They move into the data life cycle, the data journey. So the data is coming into your organization. You collected, you store it, you make it ready for utilization. You plop it either in an operational environment for processing or in an analytical environment for analysis. And then you close on the loop and adjusted from the beginning if necessary, no specifically for insurance, which is if not the most regulated industry in the world it's coming awfully close, and it will come in as a, a very admirable second or third. >>Um, it's critically important that that data is controlled and managed in the correct way on the old, the different regulations that, that we are subject to. So we do that in the cloud era Sharon's data experiment experience, which is where we make sure that the data is accessed by the right people. And that we always can track who did watch to any point in time to that data. Um, and that's all part of the Cloudera data platform. Now that whole environment that we run on premise as well as in the cloud or in multiple clouds or in hybrids, most insurers run hybrid models, which are part of the data on premise and part of the data and use cases and workloads in the clouds. We support enterprise use cases around on the writing in risk selection, individualized pricing, digital submissions, quote processing, the whole quote, quote bound process, digitally fraud and compliance evaluations and network analysis around, um, service providers. So I want to walk you to some of the use cases that we've seen in action recently that showcases how this work in real life. >>First one >>Is to seize that group plus Cloudera, um, uh, full disclosure. This is obviously for the people that know a Dutch health insurer. I did not pick the one because I happen to be dodged is just happens to be a fantastic use case and what they were struggling with as many, many insurance companies is that they had a legacy infrastructure that made it very difficult to combine data sets and get a full view of the customer and its needs. Um, as any insurer, customer demands and needs are rapidly changing competition is changing. So C-SAT decided that they needed to do something about it. And they built a data platform on Cloudera that helps them do a couple of things. It helps them support customers better or proactively. So they got really good in pinging customers on what potential steps they need to take to improve on their health in a preventative way. >>But also they sped up rapidly their, uh, approvals of medical procedures, et cetera. And so that was the original intent, right? It's like serve the customers better or retain the customers, make sure what they have the right access to the right services when they need it in a proactive way. As a side effect of this, um, data platform. They also got much better in, um, preventing and predicting fraud and abuse, which is, um, the topic of the other session we're running today. So it really was a good success and they're very happy with it. And they're actually starting to see a significant uptick in their customer service, KPIs and results. The other one that I wanted to quickly mention is Octo. As most of you know, Optune is a very, very large telemedics provider, telematics data provider globally. It's been with Cloudera for quite some time. >>This one I want to showcase because it showcases what we can do with data in mass amounts. So for Octo, we, um, analyze on Cloudera 5 million connected cars, ongoing with 11 billion data points. And really what they're doing is the creating the algorithms and the models and insurers use to, um, to, um, run, um, tell them insurance, telematics programs made to pay as you drive pay when you drive, pay, how you drive. And this whole telemedics part of insurance is actually growing very fast too, in, in, still in sort of a proof of concept mini projects, kind of initiatives. But, um, what we're succeeding is that companies are starting to offer more and more services around it. So they become preventative and predictive too. So now you got to the program staff being me as a driver saying, Monique, you're hopping in the car for two hours. >>Now, maybe it's time you take a break. Um, we see that there's a Starbucks coming up on the ride or any coffee shop. That's part of a bigger chain. Uh, we know because you have that app on your phone, that you are a Starbucks user. So if you stop there, we'll give you a 50 cents discount on your regular coffee. So we start seeing these types of programs coming through to, again, keep people safe and keep cars safe, but primarily of course the people in it, and those are the types of use cases that we start seeing in that telematic space. >>This looks more complicated than it is. So bear with me for a second. This is a commercial example because we see a data work. A lot of data were going on in commercial insurance. It's not Leah personal insurance thing. Commercial is near and dear to my heart. That's where I started. I actually, for a long time, worked in global energy insurance. So what this one wheelie explains is how we can use sensors on people's outfits and people's clothes to manage risks and underwrite risks better. So there are programs now for manufacturing companies and for oil and gas, where the people that work in those places are having sensors as part of their work outfits. And it does a couple of things. It helps in workers' comp underwriting and claims because you can actually see where people are moving, what they are doing, how long they're working. >>Some of them even tracks some very basic health-related information like blood pressure and heartbeat and stuff like that, temperature. Um, so those are all good things. The other thing that had to us, it helps, um, it helps collect data on the specific risks and exposures. Again, we're getting more and more to individual underwriting or individual risk underwriting, who insurance companies that, that ensure these, these, um, commercial, commercial, um, enterprises. So they started giving discounts if the workers were sensors and ultimately if there is an unfortunate event and it like a big accident or big loss, it helps, uh, first responders very quickly identify where those workers are. And, and, and if, and how they're moving, which is all very important to figure out who to help first in case something bad happens. Right? So these are the type of data that quite often got implements in one specific use case, and then get broadly moved to other use cases or deployed into other use cases to help price risks, betters better, and keep, you know, risks, better control, manage, and provide preventative care. Right? >>So these were some of the use cases that we run in the underwriting space that are very excited to talk about. So as a next step, what we would like you to do is considered opportunities in your own companies to advance risk assessment specific to your individual customer's need. And again, customers can be people they can be enterprises to can be other any, any insurable entity, right? The please physical dera.com solutions insurance, where you will find all our documentation assets and thought leadership around the topic. And if you ever want to chat about this, please give me a call or schedule a meeting with us. I get very passionate about this topic. I'll gladly talk to you forever. If you happen to be based in the us and you ever need somebody to filibuster on insurance, please give me a call. I'll easily fit 24 hours on this one. Um, so please schedule a call with me. I promise to keep it short. So thank you very much for joining this session. And as a last thing, I would like to remind all of you read our blogs, read our tweets. We'd our thought leadership around insurance. And as we all know, insurance is sexy.

Published Date : Aug 4 2021

SUMMARY :

of the huge Glomar conglomerates in the world, you are still perfectly fine with us. So we thought it was a good moment to look at, you know, some use cases and some approaches The data that we already have utilizing data to understand better what we know already. And when you go to the middle to the more descriptive basis, So this slide actually shows you the progress So let's start at the left side at the left side, And on the right side, you see the use cases that tend So we have to look at the claimant, the physician, the hospital, So nowadays that tends to be done by graph databases, right? And on the baseball slide that I showed you earlier, or the tone or the voice, you know, or those types of nonverbal communication fairly large networks of criminals that all needed to be tied together, the opportunity to walk you through this use case and actually show you how this looks So That is all something that we can include as part of the analysis. So um, you know, with the insights from the historical patterns in this case. And the way we do that is by building knowledge, graphs, and ontologies and dictionaries So here the claims manager discovers from Charlie and help the insurers learn from their historic data So if you want to give me a call or find a place to meet Good afternoon, I'm wanting or evening depending on where you are and welcome to this breakout session And we're working with, as you can see some of the largest companies in the world of the recent pandemic that we all have been going through. And quite often the first steps are around reporting and that tends to be run from a data warehouse, And in the descriptive phase, we describe what this means So on the right side of this slide specifically for underwriting, So think about sensors, wearables, you know, sensors on people's bodies, sensors, And then you close on the loop and adjusted from the beginning if necessary, So I want to walk you to some of the use cases that we've seen in action recently So C-SAT decided that they needed to do something about it. It's like serve the customers better or retain the customers, make sure what they have the right access to So now you got to the program staff and keep cars safe, but primarily of course the people in it, and those are the types of use cases that we start So what this one you know, risks, better control, manage, and provide preventative care. So as a next step, what we would like you to do is considered opportunities

ENTITIES

Entity	Category	Confidence
Starbucks	ORGANIZATION	0.99+
Charlie	PERSON	0.99+
two hours	QUANTITY	0.99+
Winnie castling	PERSON	0.99+
C-SAT	ORGANIZATION	0.99+
goonie castling	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
50 cents	QUANTITY	0.99+
Monique	PERSON	0.99+
24 hours	QUANTITY	0.99+
United States	LOCATION	0.99+
north America	LOCATION	0.99+
today	DATE	0.99+
Sharon	PERSON	0.99+
infinite Lytics	ORGANIZATION	0.99+
first steps	QUANTITY	0.99+
Sheree	PERSON	0.99+
First	QUANTITY	0.99+
second one	QUANTITY	0.99+
two ways	QUANTITY	0.99+
one	QUANTITY	0.99+
both	QUANTITY	0.99+
second phase	QUANTITY	0.99+
Optune	ORGANIZATION	0.99+
two	QUANTITY	0.99+
over 10 years	QUANTITY	0.99+
Sri Ramaswami	PERSON	0.99+
one scenario	QUANTITY	0.99+
last year	DATE	0.99+
Starsky	ORGANIZATION	0.99+
third	QUANTITY	0.99+
over 30 years	QUANTITY	0.98+
one treatment	QUANTITY	0.98+
one fix	QUANTITY	0.98+
one claim	QUANTITY	0.98+
11 billion data points	QUANTITY	0.97+
first	QUANTITY	0.97+
First one	QUANTITY	0.97+
first one	QUANTITY	0.97+
secondly	QUANTITY	0.96+
Glomar	ORGANIZATION	0.96+
second	QUANTITY	0.96+
5 million connected cars	QUANTITY	0.94+
Cloudera	TITLE	0.93+
about half an hour	QUANTITY	0.93+
Leah	ORGANIZATION	0.93+
five most valuable assets	QUANTITY	0.92+
a couple of years later	DATE	0.89+
each	QUANTITY	0.89+
first results	QUANTITY	0.88+
first notice	QUANTITY	0.88+
one of them	QUANTITY	0.82+
pandemic	EVENT	0.81+
Octo	TITLE	0.78+
Sasha	PERSON	0.77+
five key elements	QUANTITY	0.76+
Cloudera	PERSON	0.75+
use case	QUANTITY	0.73+
Dao	TITLE	0.7+
Matt	ORGANIZATION	0.69+
first responders	QUANTITY	0.69+
couple	QUANTITY	0.68+
Octo	ORGANIZATION	0.67+
Charlie	TITLE	0.66+

MANUFACTURING V1b | CLOUDERA

>>Welcome to our industry. Drill-downs from manufacturing. I'm here with Michael Gerber, who is the managing director for automotive and manufacturing solutions at cloud era. And in this first session, we're going to discuss how to drive transportation efficiencies and improve sustainability with data connected trucks are fundamental to optimizing fleet performance costs and delivering new services to fleet operators. And what's going to happen here is Michael's going to present some data and information, and we're gonna come back and have a little conversation about what we just heard. Michael, great to see you over to you. >>Oh, thank you, Dave. And I appreciate having this conversation today. Hey, um, you know, this is actually an area connected trucks. You know, this is an area that we have seen a lot of action here at Cloudera. And I think the reason is kind of important, right? Because, you know, first of all, you can see that, you know, this change is happening very, very quickly, right? 150% growth is forecast by 2022. Um, and the reasons, and I think this is why we're seeing a lot of action and a lot of growth is that there are a lot of benefits, right? We're talking about a B2B type of situation here. So this is truck made truck makers providing benefits to fleet operators. And if you look at the F the top fleet operator, uh, the top benefits that fleet operators expect, you see this in the graph over here. >>Now almost 80% of them expect improved productivity, things like improved routing rates. So route efficiencies and improve customer service decrease in fuel consumption, but better technology. This isn't technology for technology sake, these connected trucks are coming onto the marketplace because Hey, it can provide for Mendez value to the business. And in this case, we're talking about fleet operators and fleet efficiencies. So, you know, one of the things that's really important to be able to enable this right, um, trucks are becoming connected because at the end of the day, um, we want to be able to provide fleet deficiencies through connected truck, um, analytics and machine learning. Let me explain to you a little bit about what we mean by that, because what, you know, how this happens is by creating a connected vehicle analytics machine learning life cycle, and to do that, you need to do a few different things, right? >>You start off of course, with connected trucks in the field. And, you know, you can have many of these trucks cause typically you're dealing at a truck level and at a fleet level, right? You want to be able to do analytics and machine learning to improve performance. So you start off with these trucks. And the first you need to be able to do is connect to those products, right? You have to have an intelligent edge where you can collect that information from the trucks. And by the way, once you conducted the, um, this information from the trucks, you want to be able to analyze that data in real-time and take real-time actions. Now what I'm going to show you the ability to take this real-time action is actually the result of your machine learning license. Let me explain to you what I mean by that. >>So we have this trucks, we start to collect data from it right at the end of the day. Well we'd like to be able to do is pull that data into either your data center or into the cloud where we can start to do more advanced analytics. And we start with being able to ingest that data into the cloud, into that enterprise data lake. We store that data. We want to enrich it with other data sources. So for example, if you're doing truck predictive maintenance, you want to take that sensor data that you've connected collected from those trucks. And you want to augment that with your dealership, say service information. Now you have, you know, you have sensor data and there was salting repair orders. You're now equipped to do things like predict one day maintenance will work correctly for all the data sets that you need to be able to do that. >>So what do you do here? Like I said, you adjusted your storage, you're enriching it with data, right? You're processing that data. You're aligning say the sensor data to that transactional system data from your, uh, from your, your pair maintenance systems, you know, you're bringing it together so that you can do two things you can do. First of all, you could do self-service BI on that date, right? You can do things like fleet analytics, but more importantly, what I was talking to you about before is you now have the data sets to be able to do create machine learning models. So if you have the sensor right values and the need, for example, for, for a dealership repair, or as you could start to correlate, which sensor values predicted the need for maintenance, and you could build out those machine learning models. And then as I mentioned to you, you could push those machine learning models back out to the edge, which is how you would then take those real-time action. >>I mentioned earlier as that data that then comes through in real-time, you're running it against that model, and you can take some real time actions. This is what we are, this, this, this, this analytics and machine learning model, um, machine learning life cycle is exactly what Cloudera enables this end-to-end ability to ingest, um, stroke, you know, store it, um, put a query, lay over it, um, machine learning models, and then run those machine learning models. Real-time now that's what we, that's what we do as a business. Now when such customer, and I just wanted to give you one example, um, a customer that we have worked with to provide these types of results is Navistar and Navistar was kind of an early, early adopter of connected truck analytics. And they provided these capabilities to their fleet operators, right? And they started off, uh, by, um, by, you know, connecting 475,000 trucks to up to well over a million now. >>And you know, the point here is with that, they were centralizing data from their telematics service providers, from their trucks, from telematics service providers. They're bringing in things like weather data and all those types of things. Um, and what they started to do was to build out machine learning models, aimed at predictive maintenance. And what's really interesting is that you see that Navistar, um, made tremendous strides in reducing the need or the expense associated with maintenance, right? So rather than waiting for a truck to break and then fixing it, they would predict when that truck needs service, condition-based monitoring and service it before it broke down so that you could do that in a much more cost-effective manner. And if you see the benefits, right, they, they reduced maintenance costs 3 cents a mile, um, from the, you know, down from the industry average of 15 cents a mile down to 12 cents cents a mile. >>So this was a tremendous success for Navistar. And we're seeing this across many of our, um, um, you know, um, uh, truck manufacturers. We were working with many of the truck OEMs and they are all working to achieve, um, you know, very, very similar types of, um, benefits to their customers. So just a little bit about Navistar. Um, now we're gonna turn to Q and a, Dave's got some questions for me in a second, but before we do that, if you want to learn more about our, how we work with connected vehicles and autonomous vehicles, please go to our lives or to our website, what you see up, uh, up on the screen, there's the URLs cloudera.com for slash solutions for slash manufacturing. And you'll see a whole slew of, um, um, lateral and information, uh, in much more detail in terms of how we connect, um, trucks to fleet operators who provide analytics, use cases that drive dramatically improved performance. So with that being said, I'm going to turn it over to Dave for questions. >>Thank you. Uh, Michael, that's a great example. You've got, I love the life cycle. You can visualize that very well. You've got an edge use case you do in both real time inference, really at the edge. And then you're blending that sensor data with other data sources to enrich your models. And you can push that back to the edge. That's that lifecycle. So really appreciate that, that info. Let me ask you, what are you seeing as the most common connected vehicle when you think about analytics and machine learning, the use cases that you see customers really leaning into. >>Yeah, that's really, that's a great question. They, you know, cause you know, everybody always thinks about machine learning. Like this is the first thing you go, well, actually it's not right for the first thing you really want to be able to go around. Many of our customers are doing slow. Let's simply connect our trucks or our vehicles or whatever our IOT asset is. And then you can do very simple things like just performance monitoring of the, of the piece of equipment in the truck industry, a lot of performance monitoring of the truck, but also performance monitoring of the driver. So how has the, how has the driver performing? Is there a lot of idle time spent, um, you know, what's, what's route efficiencies looking like, you know, by connecting the vehicles, right? You get insights, as I said into the truck and into the driver and that's not machine learning. >>Right. But that, that, that monitoring piece is really, really important. The first thing that we see is monitoring types of use cases. Then you start to see companies move towards more of the, uh, what I call the machine learning and AI models, where you're using inference on the edge. And then you start to see things like, uh, predictive maintenance happening, um, kind of route real-time, route optimization and things like that. And you start to see that evolution again, to those smarter, more intelligent dynamic types of decision-making, but let's not, let's not minimize the value of good old fashioned monitoring that site to give you that kind of visibility first, then moving to smarter use cases as you, as you go forward. >>You know, it's interesting. I'm, I'm envisioning when you talked about the monitoring, I'm envisioning a, you see the bumper sticker, you know, how am I driving this all the time? If somebody ever probably causes when they get cut off it's snow and you know, many people might think, oh, it's about big brother, but it's not. I mean, that's yeah. Okay, fine. But it's really about improvement and training and continuous improvement. And then of course the, the route optimization, I mean, that's, that's bottom line business value. So, so that's, I love those, uh, those examples. Um, I wonder, I mean, one of the big hurdles that people should think about when they want to jump into those use cases that you just talked about, what are they going to run into, uh, you know, the blind spots they're, they're going to, they're going to get hit with, >>There's a few different things, right? So first of all, a lot of times your it folks aren't familiar with the kind of the more operational IOT types of data. So just connecting to that type of data can be a new skill set, right? That's very specialized hardware in the car and things like that. And protocols that's number one, that that's the classic, it OT kind of conundrum that, um, you know, uh, many of our customers struggle with, but then more fundamentally is, you know, if you look at the way these types of connected truck or IOT solutions started, you know, oftentimes they were, the first generation were very custom built, right? So they were brittle, right? They were kind of hardwired. And as you move towards, um, more commercial solutions, you had what I call the silo, right? You had fragmentation in terms of this capability from this vendor, this capability from another vendor, you get the idea, you know, one of the things that we really think that we need with that, that needs to be brought to the table is first of all, having an end to end data management platform, that's kind of integrated, it's all tested together. >>You have the data lineage across the entire stack, but then also importantly, to be realistic, we have to be able to integrate to, um, industry kind of best practices as well in terms of, um, solution components in the car, how the hardware and all those types things. So I think there's, you know, it's just stepping back for a second. I think that there is, has been fragmentation and complexity in the past. We're moving towards more standards and more standard types of art, um, offerings. Um, our job as a software maker is to make that easier and connect those dots. So customers don't have to do it all on all on their own. >>And you mentioned specialized hardware. One of the things we heard earlier in the main stage was your partnership with Nvidia. We're talking about, you know, new types of hardware coming in, you guys are optimizing for that. We see the it and the OT worlds blending together, no question. And then that end to end management piece, you know, this is different from your right, from it, normally everything's controlled or the data center, and this is a metadata, you know, rethinking kind of how you manage metadata. Um, so in the spirit of, of what we talked about earlier today, uh, uh, other technology partners, are you working with other partners to sort of accelerate these solutions, move them forward faster? >>Yeah, I'm really glad you're asking that because we actually embarked on a product on a project called project fusion, which really was about integrating with, you know, when you look at that connected vehicle life cycle, there are some core vendors out there that are providing some very important capabilities. So what we did is we joined forces with them to build an end-to-end demonstration and reference architecture to enable the complete data management life cycle. Cloudera is Peter piece of this was ingesting data and all the things I talked about being storing and the machine learning, right? And so we provide that end to end. But what we wanted to do is we wanted to partner with some key partners and the partners that we did with, um, integrate with or NXP NXP provides the service oriented gateways in the car. So that's a hardware in the car when river provides an in-car operating system, that's Linux, right? >>That's hardened and tested. We then ran ours, our, uh, Apache magnify, which is part of flood era data flow in the vehicle, right on that operating system. On that hardware, we pump the data over into the cloud where we did them, all the data analytics and machine learning and, and builds out these very specialized models. And then we used a company called Arabic equity. Once we both those models to do, you know, they specialize in automotive over the air updates, right? So they can then take those models and update those models back to the vehicle very rapidly. So what we said is, look, there's, there's an established, um, you know, uh, ecosystem, if you will, of leaders in this space, what we wanted to do is make sure that our, there was part and parcel of this ecosystem. And by the way, you mentioned Nvidia as well. We're working closely with Nvidia now. So when we're doing the machine learning, we can leverage some of their hardware to get some further acceleration in the machine learning side of things. So, uh, yeah, you know, one of the things I always say about this types of use cases, it does take a village. And what we've really tried to do is build out that, that, uh, an ecosystem that provides that village so that we can speed that analytics and machine learning, um, lifecycle just as fast as it can be. This >>Is again another great example of, of data intensive workloads. It's not your, it's not your grandfather's ERP. That's running on, you know, traditional, you know, systems it's, these are really purpose-built, maybe they're customizable for certain edge use cases. They're low cost, low, low power. They can't be bloated, uh, ended you're right. It does take an ecosystem. You've got to have, you know, API APIs that connect and, and that's that, that takes a lot of work and a lot of thoughts. So that, that leads me to the technologies that are sort of underpinning this we've talked we've we talked a lot in the cube about semiconductor technology, and now that's changing and the advancements we're seeing there, what do you see as the, some of the key technical technology areas that are advancing this connected vehicle machine learning? >>You know, it's interesting, I'm seeing it in a few places, just a few notable ones. I think, first of all, you know, we see that the vehicle itself is getting smarter, right? So when you look at, we look at that NXP type of gateway that we talked about that used to be kind of a, a dumb gateway. That was really all it was doing was pushing data up and down and provided isolation, um, as a gateway down to the, uh, down from the lower level subsistence. So it was really security and just basic, um, you know, basic communication that gateway now is becoming what they call a service oriented gate. So it can run. It's not that it's bad desk. It's got memories that always, so now you could run serious compute in the car, right? So now all of these things like running machine learning, inference models, you have a lot more power in the corner at the same time. >>5g is making it so that you can push data fast enough, making low latency computing available, even on the cloud. So now you now you've got credible compute both at the edge in the vehicle and on the cloud. Right. And, um, you know, and then on the, you know, on the cloud, you've got partners like Nvidia who are accelerating, it's still further through better GPU based compute. So I mean the whole stack, if you look at it, that that machine learning life cycle we talked about, no, David seems like there's improvements and EV every step along the way, we're starting to see technology, um, optimum optimization, um, just pervasive throughout the cycle. >>And then real quick, it's not a quick topic, but you mentioned security. If it was seeing a whole new security model emerge, there is no perimeter anymore in this use case like this is there. >>No there isn't. And one of the things that we're, you know, remember where the data management platform platform and the thing we have to provide is provide end-to-end link, you know, end end-to-end lineage of where that data came from, who can see it, you know, how it changed, right? And that's something that we have integrated into from the beginning of when that data is ingested through, when it's stored through, when it's kind of processed and people are doing machine learning, we provide, we will provide that lineage so that, um, you know, that security and governance is a short throughout the, throughout the data learning life cycle, it >>Federated across in this example, across the fleet. So, all right, Michael, that's all the time we have right now. Thank you so much for that great information. Really appreciate it, >>Dave. Thank you. And thank you. Thanks for the audience for listening in today. Yes. Thank you for watching. >>Okay. We're here in the second manufacturing drill down session with Michael Gerber. He was the managing director for automotive and manufacturing solutions at Cloudera. And we're going to continue the discussion with a look at how to lower costs and drive quality in IOT analytics with better uptime. And look, when you do the math, that's really quite obvious when the system is down, productivity is lost and it hits revenue and the bottom line improve quality drives, better service levels and reduces loss opportunities. Michael. Great to see you >>Take it away. All right. Thank you so much. So I'd say we're going to talk a little bit about connected manufacturing, right. And how those IOT IOT around connected manufacturing can do as Dave talked about improved quality outcomes for manufacturing improve and improve your plant uptime. So just a little bit quick, quick, little indulgent, quick history lesson. I promise to be quick. We've all heard about industry 4.0, right? That is the fourth industrial revolution. And that's really what we're here to talk about today. First industrial revolution, real simple, right? You had steam power, right? You would reduce backbreaking work. Second industrial revolution, massive assembly line. Right. So think about Henry Ford and motorized conveyor belts, mass automation, third industrial revolution. Things got interesting, right? You started to see automation, but that automation was done, essentially programmed a robot to do something. It did the same thing over and over and over irrespective about it, of how your outside operations, your outside conditions change fourth industrial revolution, very different breakfast. >>Now we're connecting, um, equipment and processes and getting feedback from it. And through machine learning, we can make those, um, those processes adaptive right through machine learning. That's really what we're talking about in the fourth industrial revolution. And it is intrinsically connected to data and a data life cycle. And by the way, it's important, not just for a little bit of a slight issue. There we'll issue that, but it's important, not for technology sake, right? It's important because it actually drives and very important business outcomes. First of all, quality, right? If you look at the cost of quality, even despite decades of, of, of, of, uh, companies, um, and manufacturers moving to improve while its quality promise still accounted to 20% of sales, right? So every fifth of what you meant or manufactured from a revenue perspective, you've got quality issues that are costing you a lot. >>Plant downtime, cost companies, $50 billion a year. So when we're talking about using data and these industry 4.0 types of use cases, connected data types of use cases, we're not doing it just merely to implement technology. We're doing it to move these from drivers, improving quality, reducing downtime. So let's talk about how a connected manufacturing data life cycle, what like, right, because this is actually the business that cloud era is, is in. Let's talk a little bit about that. So we call this manufacturing edge to AI, this, this analytics life cycle, and it starts with having your plants, right? Those plants are increasingly connected. As I said, sensor prices have come down two thirds over the last decade, right? And those sensors have connected over the internet. So suddenly we can collect all this data from your, um, ma manufacturing plants. What do we want to be able to do? >>You know, we want to be able to collect it. We want to be able to analyze that data as it's coming across. Right? So, uh, in scream, right, we want to be able to analyze it and take intelligent real-time actions. Right? We might do some simple processing and filtering at the edge, but we really want to take real-time actions on that data. But, and this is the inference part of things, right? Taking the time. But this, the ability to take these real-time actions, um, is actually the result of a machine learning life cycle. I want to walk you through this, right? And it starts with, um, ingesting this data for the first time, putting it into our enterprise data lake, right in that data lake enterprise data lake can be either within your data center or it could be in the cloud. You've got, you're going to ingest that data. >>You're going to store it. You're going to enrich it with enterprise data sources. So now you'll have say sensor data and you'll have maintenance repair orders from your maintenance management systems. Right now you can start to think about do you're getting really nice data sets. You can start to say, Hey, which sensor values correlate to the need for machine maintenance, right? You start to see the data sets. They're becoming very compatible with machine learning, but so you, you bring these data sets together. You process that you align your time series data from your sensors to your timestamp data from your, um, you know, from your enterprise systems that your maintenance management system, as I mentioned, you know, once you've done that, we could put a query layer on top. So now we can start to do advanced analytics query across all these different types of data sets. >>But as I mentioned, you, and what's really important here is the fact that once you've stored long histories that say that you can build out those machine learning models I talked to you about earlier. So like I said, you can start to say, which sensor values drove the need, a correlated to the need for equipment maintenance for my maintenance management systems, right? And you can build out those models and say, Hey, here are the sensor values of the conditions that predict the need for Maples. Once you understand that you can actually then build out those models for deploy the models out the edge, where they will then work in that inference mode that we talked about, I will continuously sniff that data as it's coming and say, Hey, which are the, are we experiencing those conditions that PR that predicted the need for maintenance? If so, let's take real-time action, right? >>Let's schedule a work order or an equipment maintenance work order in the past, let's in the future, let's order the parts ahead of time before that piece of equipment fails and allows us to be very, very proactive. So, you know, we have, this is a, one of the Mo the most popular use cases we're seeing in terms of connecting connected manufacturing. And we're working with many different manufacturers around the world. I want to just highlight. One of them is I thought it's really interesting. This company is bought for Russia, for SIA, for ACA is the, um, is the, was, is the, um, the, uh, a supplier associated with Peugeot central line out of France. They are huge, right? This is a multi-national automotive parts and systems supplier. And as you can see, they operate in 300 sites in 35 countries. So very global, they connected 2000 machines, right. >>Um, and then once be able to take data from that. They started off with learning how to ingest the data. They started off very well with, um, you know, with, uh, manufacturing control towers, right? To be able to just monitor data firms coming in, you know, monitor the process. That was the first step, right. Uh, and, you know, 2000 machines, 300 different variables, things like, um, vibration pressure temperature, right? So first let's do performance monitoring. Then they said, okay, let's start doing machine learning on some of these things to start to build out things like equipment, um, predictive maintenance models or compute. And what they really focused on is computer vision while the inspection. So let's take pictures of, um, parts as they go through a process and then classify what that was this picture associated with the good or bad Bali outcome. Then you teach the machine to make that decision on its own. >>So now, now the machine, the camera is doing the inspections. And so they both had those machine learning models. They took that data, all this data was on-prem, but they pushed that data up to the cloud to do the machine learning models, develop those machine learning models. Then they push the machine learning models back into the plants where they, where they could take real-time actions through these computer vision, quality inspections. So great use case. Um, great example of how you can start with monitoring, moved to machine learning, but at the end of the day, or improving quality and improving, um, uh, equipment uptime. And that is the goal of most manufacturers. So with that being said, um, I would like to say, if you want to learn some more, um, we've got a wealth of information on our website. You see the URL in front of you, please go there and you'll learn. There's a lot of information there in terms of the use cases that we're seeing in manufacturing, a lot more detail, and a lot more talk about a lot more customers we'll work with. If you need that information, please do find it. Um, with that, I'm going to turn it over to Dave, to Steve. I think you had some questions you want to run by. >>I do, Michael, thank you very much for that. And before I get into the questions, I just wanted to sort of make some observations that was, you know, struck by what you're saying about the phases of industry. We talk about industry 4.0, and my observation is that, you know, traditionally, you know, machines have always replaced humans, but it's been around labor and, and the difference with 4.0, and what you talked about with connecting equipment is you're injecting machine intelligence. Now the camera inspection example, and then the machines are taking action, right? That's, that's different and, and is a really new kind of paradigm here. I think the, the second thing that struck me is, you know, the cost, you know, 20% of, of sales and plant downtime costing, you know, many tens of billions of dollars a year. Um, so that was huge. I mean, the business case for this is I'm going to reduce my expected loss quite dramatically. >>And then I think the third point, which we turned in the morning sessions, and the main stage is really this, the world is hybrid. Everybody's trying to figure out hybrid, get hybrid, right. And it certainly applies here. Uh, this is, this is a hybrid world you've got to accommodate, you know, regardless of, of where the data is. You've gotta be able to get to it, blend it, enrich it, and then act on it. So anyway, those are my big, big takeaways. Um, so first question. So in thinking about implementing connected manufacturing initiatives, what are people going to run into? What are the big challenges that they're going to, they're going to hit, >>You know, there's, there's, there, there's a few of the, but I think, you know, one of the ones, uh, w one of the key ones is bridging what we'll call the it and OT data divide, right. And what we mean by the it, you know, your, it systems are the ones, your ERP systems, your MES systems, right? Those are your transactional systems that run on relational databases and your it departments are brilliant, are running on that, right? The difficulty becomes an implementing these use cases that you also have to deal with operational technology, right? And those are, um, all of the, that's all the equipment in your manufacturing plant that runs on its proprietary network with proprietorial pro protocols. That information can be very, very difficult to get to. Right. So, and it's, it's a much more unstructured than from your OT. So th the key challenge is being able to bring these data sets together in a single place where you can start to do advanced analytics and leverage that diverse data to do machine learning. Right? So that is one of the, if I boil it down to the single hardest thing in this, uh, in this, in this type of environment, nectar manufacturing is that that operational technology has kind of run on its own in its own world. And for a long time, the silos, um, uh, the silos a, uh, bound, but at the end of the day, this is incredibly valuable data that now can be tapped, um, um, to, to, to, to move those, those metrics we talked about right around quality and uptime. So a huge, >>Well, and again, this is a hybrid team and you, you've kind of got this world, that's going toward an equilibrium. You've got the OT side and, you know, pretty hardcore engineers. And we know, we know it. A lot of that data historically has been analog data. Now it's getting, you know, instrumented and captured. Uh, so you've got that, that cultural challenge. And, you know, you got to blend those two worlds. That's critical. Okay. So, Michael, let's talk about some of the use cases you touched on, on some, but let's peel the onion a bit when you're thinking about this world of connected manufacturing and analytics in that space, when you talk to customers, you know, what are the most common use cases that you see? >>Yeah, that's a good, that's a great question. And you're right. I did allude to a little bit earlier, but there really is. I want people to think about, there's a spectrum of use cases ranging from simple to complex, but you can get value even in the simple phases. And when I talk about the simple use cases, the simplest use cases really is really around monitoring, right? So in this, you monitor your equipment or monitor your processes, right? And you just make sure that you're staying within the bounds of your control plan, right. And this is much easier to do now. Right? Cause some of these sensors are a more sensors and those sensors are moving more and more towards internet types of technology. So, Hey, you've got the opportunity now to be able to do some monitoring. Okay. No machine learning, but just talking about simple monitoring next level down, and we're seeing is something we would call quality event forensic analysis. >>And now on this one, you say, imagine I've got warranty plans in the, in the field, right? So I'm starting to see warranty claims kick up. And what you simply want to be able to do is do the forensic analysis back to what was the root cause of within the manufacturing process that caused it. So this is about connecting the dots. What about warranty issues? What were the manufacturing conditions of the day that caused it? Then you could also say which other tech, which other products were impacted by those same conditions. And we call those proactively rather than, and, and selectively rather than say, um, recalling an entire year's fleet of the car. So, and that, again, also not machine learning, we're simply connecting the dots from a warranty claims in the field to the manufacturing conditions of the day, so that you could take corrective actions, but then you get into a whole slew of machine learning, use dates, you know, and that ranges from things like Wally or say yield optimization. >>We start to collect sensor values and, um, manufacturing yield, uh, values from your ERP system. And you're certain start to say, which, um, you know, which on a sensor values or factors drove good or bad yield outcomes, and you can identify those factors that are the most important. So you, um, you, you measure those, you monitor those and you optimize those, right. That's how you optimize your, and then you go down to the more traditional machine learning use cases around predictive maintenance. So the key point here, Dave is, look, there's a huge, you know, depending on a customer's maturity around big data, you could start simply with, with monitoring, get a lot of value, start then bringing together more diverse data sets to do things like connect the.analytics then and all the way then to, to, to the more advanced machine learning use cases, there's this value to be had throughout. >>I remember when the, you know, the it industry really started to think about, or in the early days, you know, IOT and IOT. Um, it reminds me of when, you know, there was, uh, the, the old days of football field, we were grass and, and the new player would come in and he'd be perfectly white uniform, and you had it. We had to get dirty as an industry, you know, it'll learn. And so, so I question it relates to other technology partners that you might be working with that are maybe new in this space that, that to accelerate some of these solutions that we've been talking about. >>Yeah. That's a great question. And it kind of goes back to one of the things I alluded to alluded upon earlier. We've had some great partners, a partner, for example, litmus automation, whose whole world is the OT world. And what they've done is for example, they built some adapters to be able to catch it practically every industrial protocol. And they've said, Hey, we can do that. And then give a single interface of that data to the Idera data platform. So now, you know, we're really good at ingesting it data and things like that. We can leverage say a company like litmus that can open the flood gates of that OT data, making it much easier to get that data into our platform. And suddenly you've got all the data you need to, to, to implement those types of, um, industry for porno, our analytics use cases. And it really boils down to, can I get to that? Can I break down that it OT, um, you know, uh, a barrier that we've always had and, and bring together those data sets that we can really move the needle in terms of improving manufacturing performance. >>Okay. Thank you for that last question. Speaking to moving the needle, I want to li lead this discussion on the technology advances. I'd love to talk tech here. Uh, what are the key technology enablers and advancers, if you will, that are going to move connected manufacturing and machine learning forward in this transportation space. Sorry, manufacturing. Yeah. >>Yeah. I know in the manufacturing space, there's a few things, first of all, I think the fact that obviously I know we touched upon this, the fact that sensor prices have come down and have become ubiquitous that number one, we can, we've finally been able to get to the OT data, right? That's that's number one, you know, numb number two, I think, you know, um, we, we have the ability that now to be able to store that data a whole lot more efficiently, you know, we've got, we've got great capabilities to be able to do that, to put it over into the cloud, to do the machine learning types of workloads. You've got things like if you're doing computer vision, while in analyst respect GPU's to make those machine learning models much more, uh, much more effective, if that 5g technology that starts to blur at least from a latency perspective where you do your computer, whether it be on the edge or in the cloud, you've, you've got more, the super business critical stuff. >>You probably don't want to rely on, uh, any type of network connection, but from a latency perspective, you're starting to see, uh, you know, the ability to do compute where it's the most effective now. And that's really important. And again, the machine learning capabilities, and they believed a book to build a GP, you know, GPU level machine learning, build out those models and then deployed by over the air updates to, to your equipment. All of those things are making this, um, there's, you know, the advanced analytics and machine learning, uh, data life cycle just faster and better. And at the end of the day, to your point, Dave, that equipment and processor getting much smarter, uh, very much more quickly. Yeah, we got >>A lot of data and we have way lower cost, uh, processing platforms I'll throw in NP use as well. Watch that space neural processing units. Okay. Michael, we're going to leave it there. Thank you so much. Really appreciate your time, >>Dave. I really appreciate it. And thanks. Thanks for, uh, for everybody who joined us. Thanks. Thanks for joining today. Yes. Thank you for watching. Keep it right there.

Published Date : Aug 4 2021

SUMMARY :

Michael, great to see you over to you. And if you look at the F the top fleet operator, uh, the top benefits that So, you know, one of the things that's really important to be able to enable this right, And by the way, once you conducted the, um, this information from the trucks, you want to be able to analyze And you want to augment that with your dealership, say service information. So what do you do here? And they started off, uh, by, um, by, you know, connecting 475,000 And you know, the point here is with that, they were centralizing data from their telematics service providers, many of our, um, um, you know, um, uh, truck manufacturers. And you can push that back to the edge. And then you can do very simple things like just performance monitoring And then you start to see things like, uh, predictive maintenance happening, uh, you know, the blind spots they're, they're going to, they're going to get hit with, it OT kind of conundrum that, um, you know, So I think there's, you know, it's just stepping back for a second. the data center, and this is a metadata, you know, rethinking kind of how you manage metadata. with, you know, when you look at that connected vehicle life cycle, there are some core vendors And by the way, you mentioned Nvidia as well. and now that's changing and the advancements we're seeing there, what do you see as the, um, you know, basic communication that gateway now is becoming um, you know, and then on the, you know, on the cloud, you've got partners like Nvidia who are accelerating, And then real quick, it's not a quick topic, but you mentioned security. And one of the things that we're, you know, remember where the data management Thank you so much for that great information. Thank you for watching. And look, when you do the math, that's really quite obvious when the system is down, productivity is lost and it hits Thank you so much. So every fifth of what you meant or manufactured from a revenue So we call this manufacturing edge to AI, I want to walk you through this, um, you know, from your enterprise systems that your maintenance management system, And you can build out those models and say, Hey, here are the sensor values of the conditions And as you can see, they operate in 300 sites in They started off very well with, um, you know, great example of how you can start with monitoring, moved to machine learning, I think the, the second thing that struck me is, you know, the cost, you know, 20% of, And then I think the third point, which we turned in the morning sessions, and the main stage is really this, And what we mean by the it, you know, your, it systems are the ones, You've got the OT side and, you know, pretty hardcore engineers. And you just make sure that you're staying within the bounds of your control plan, And now on this one, you say, imagine I've got warranty plans in the, in the field, look, there's a huge, you know, depending on a customer's maturity around big data, I remember when the, you know, the it industry really started to think about, or in the early days, you know, uh, a barrier that we've always had and, if you will, that are going to move connected manufacturing and machine learning forward that starts to blur at least from a latency perspective where you do your computer, and they believed a book to build a GP, you know, GPU level machine learning, Thank you so much. Thank you for watching.

ENTITIES

Entity	Category	Confidence
Steve	PERSON	0.99+
Dave	PERSON	0.99+
Michael	PERSON	0.99+
France	LOCATION	0.99+
Michael Gerber	PERSON	0.99+
Nvidia	ORGANIZATION	0.99+
20%	QUANTITY	0.99+
300 sites	QUANTITY	0.99+
12 cents	QUANTITY	0.99+
David	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
2000 machines	QUANTITY	0.99+
first	QUANTITY	0.99+
2000 machines	QUANTITY	0.99+
Peugeot	ORGANIZATION	0.99+
Navistar	ORGANIZATION	0.99+
2022	DATE	0.99+
150%	QUANTITY	0.99+
today	DATE	0.99+
second thing	QUANTITY	0.99+
35 countries	QUANTITY	0.99+
first generation	QUANTITY	0.99+
first step	QUANTITY	0.99+
Peter	PERSON	0.99+
475,000 trucks	QUANTITY	0.99+
first question	QUANTITY	0.99+
First	QUANTITY	0.99+
One	QUANTITY	0.99+
first time	QUANTITY	0.99+
NXP	ORGANIZATION	0.99+
Russia	LOCATION	0.99+
single	QUANTITY	0.99+
first session	QUANTITY	0.98+
third point	QUANTITY	0.98+
SIA	ORGANIZATION	0.98+
Linux	TITLE	0.98+
3 cents a mile	QUANTITY	0.98+
decades	QUANTITY	0.98+
Apache	ORGANIZATION	0.98+
one example	QUANTITY	0.98+
litmus	ORGANIZATION	0.97+
both	QUANTITY	0.97+
one	QUANTITY	0.97+
15 cents a mile	QUANTITY	0.97+
one day	QUANTITY	0.97+
300 different variables	QUANTITY	0.97+
ACA	ORGANIZATION	0.96+
cloudera.com	OTHER	0.96+
two things	QUANTITY	0.95+
fourth industrial revolution	EVENT	0.95+
Mendez	PERSON	0.95+
two worlds	QUANTITY	0.94+
$50 billion a year	QUANTITY	0.94+
almost 80%	QUANTITY	0.94+
fourth industrial revolution	EVENT	0.93+
Idera	ORGANIZATION	0.93+
First industrial revolution	QUANTITY	0.93+
two thirds	QUANTITY	0.92+
4.0 types	QUANTITY	0.92+

COMMUNICATIONS V1 | CLOUDERA

>>Hi today, I'm going to talk about network analytics and what that means for, for telecommunications as we go forward. Um, thinking about, uh, 5g, what the impact that's likely to have on, on network analytics and the data requirement, not just to run the network and to understand the network a little bit better. Um, but also to, to inform the rest of the operation of the telecommunications business. Um, so as we think about where we are in terms of network analytics and what that is over the last 20 years, the telecommunications industry has evolved its management infrastructure, uh, to abstract away from some of the specific technologies in the network. So what do we mean by that? Well, uh, in the, in the initial, uh, telecommunications networks were designed, there were management systems that were built in, um, eventually fault management systems, uh, assurance systems, provisioning systems, and so on were abstracted away. >>So it didn't matter what network technology had, whether it was a Nokia technology or Erickson technology or Huawei technology or whatever it happened to be. You could just look at your fault management system, understand where false, what happened as we got into the last sort of 10, 15 years or so. Telecommunication service providers become became more sophisticated in terms of their approach to data analytics and specifically network analytics, and started asking questions about why and what if in relation to their network performance and network behavior. And so network analytics as a, as a bit of an independent function was born and over time, more and more data began to get loaded into the network analytics function. So today just about every carrier in the world has a network analytics function that deals with vast quantities of data in big data environments that are now being migrated to the cloud. >>As all telecommunications carriers are migrating as many it workloads as possible, um, to the cloud. So what are the things that are happening as we migrate to the cloud that drive, uh, uh, enhancements in use cases and enhancements and scale, uh, in telecommunications network analytics? Well, 5g is the big thing, right? So 5g, uh, it's not just another G in that sense. I mean, in some cases, in some senses, it is 5g means greater bandwidth, lower latency and all those good things. So, you know, we can watch YouTube videos with less interference and, and less sluggish bandwidth and so on and so forth. But 5g is really about the enterprise and enterprise services. Transformation, 5g is more secure, kind of a network, but 5g is also a more pervasive network 5g, a fundamentally different network topology than previous generations. So there's going to be more masts and that means that you can have more pervasive connectivity. >>Uh, so things like IOT and edge applications, autonomous cars, smart cities, these kinds of things, um, are all much better served because you've got more masks that of course means that you're going to have a lot more data as well. And we'll get to that. The second piece is immersive digital services. So with more masks, with more connectivity, with lower latency with higher man, the potential, uh, is, is, is, is immense for services innovation. And we don't know what those services are going to be. We know that technologies like augmented reality, virtual reality, things like this have great potential. Um, but we, we have yet to see where those commercial applications are going to be, but the innovation and the innovation potential for 5g is phenomenal. Um, it certainly means that we're going to have a lot more, uh, edge devices, um, uh, and that again is going to lead to an increase in the amount of data that we have available. >>And then the idea of pervasive connectivity when it comes to smart, smart cities, uh, autonomous, autonomous currents, um, uh, integrated traffic management systems, um, all of this kind of stuff, those of those kind of smart environments thrive where you've got this kind of pervasive connectivity, this persistent, uh, connection to the network. Um, again, that's going to drive, um, um, uh, more innovation. And again, because you've got these new connected devices, you're going to get even more data. So this rise, this exponential rise in data is really what's driving the change in, in network analytics. And there are four major vectors that are driving this increase in data in terms of both volume and in terms of speed. So the first is more physical elements. So we said already that 5g networks are going to have a different apology. 5g networks will have more devices, more and more masks. >>Um, and so with more physical elements in the network, you're going to get more physical data coming off those physical networks. And so that needs to be aggregated and collected and managed and stored and analyzed and understood when, so that we can, um, have a better understanding as to why things happened the way they do, why the network behaves in which they do in, in, in, in ways that it does and why devices that are connected to the network. And ultimately of course, consumers, whether they be enterprises or retail customers, um, behave in the way they do in relation to their interaction within our edge nodes and devices, we're going to have a, uh, an explosion in terms of the number of devices. We've already seen IOT devices with your different kinds of trackers and, uh, and, and sensors that are hanging off the edge of the network, whether it's to make buildings smarter car smarter, or people smarter, um, in, in terms of having the, the, the measurements and the connectivity and all that sort of stuff. >>So the numbers of devices on the agent beyond the age, um, are going to be phenomenal. One of the things that we've been trying to with as an industry over the last few years is where does the telco network end, and where does the enterprise, or even the consumer network begin. You used to be very clear that, you know, the telco network ended at the router. Um, but now it's not, it's not that clear anymore because in the enterprise space, particularly with virtualized networking, which we're going to talk about in a second, um, you start to see end to end network services being deployed. Um, uh, and so are they being those services in some instances are being managed by the service provider themselves, and in some cases by the enterprise client, um, again, the line between where the telco network ends and where the enterprise or the consumer network begins, uh, is not clear. >>Uh, so, so those edge, the, the, the proliferation of devices at the age, um, uh, in terms of, um, you know, what those devices are, what the data yield is and what the policies are, their need to govern those devices, um, in terms of security and privacy, things like that, um, that's all going to be really, really important virtualized services. We just touched on that briefly. One of the big, big trends that's happening right now is not just the shift of it operations onto the cloud, but the shift of the network onto the cloud, the virtualization of network infrastructure, and that has two major impacts. First of all, it means that you've got the agility and all of the scale, um, uh, benefits that you get from migrating workloads to the cloud, the elasticity and the growth and all that sort of stuff. But arguably more importantly for the telco, it means that with a virtualized network infrastructure, you can offer entire networks to enterprise clients. >>So if you're selling to a government department, for example, is looking to stand up a system for certification of, of, you know, export certification, something like that. Um, you can not just sell them the connectivity, but you can sell them the networking and the infrastructure in order to serve that entire end to end application. You could sentence, you could offer them in theory, an entire end-to-end communications network, um, and with 5g network slicing, they can even have their own little piece of the 5g bandwidth that's been allocated against the carrier, um, uh, and, and have a complete end to end environment. So the kinds of services that can be offered by telcos, um, given virtualize network infrastructure, uh, are, are many and varied. And it's a, it's a, it's a, um, uh, an outstanding opportunity. But what it also means is that the number of network elements virtualized in this case is also exploding. >>That means the amount of data that we're getting on, uh, informing us as to how those network elements are behaving, how they're performing, um, uh, is, is, is going to go up as well. And then finally, AI complexity. So on the demand side, um, while historically, uh, um, network analytics, big data, uh, has been, has been driven by, um, returns in terms of data monetization, uh, whether that's through cost avoidance, um, or service assurance, uh, or even revenue generation through data monetization and things like that. AI is transforming telecommunications and every other industry, the potential for autonomous operations, uh, is extremely attractive. And so understanding how the end-to-end telecommunication service delivering delivery infrastructure works, uh, is essential, uh, as a training ground for AI models that can help to automate a huge amount of telecommunications operating, um, processes. So the AI demand for data is just going through the roof. >>And so all of these things combined to mean big data is getting explosive. It is absolutely going through the roof. So that's a huge thing that's happening. So as telecommunications companies around the world are looking at their network analytics infrastructure, which was initially designed for service insurance primarily, um, and how they migrate that to the cloud. These things are impacting on those decisions because you're not just looking at migrating a workload to operate in the cloud that used to work in the, in the data center. Now you're looking at, um, uh, migrating a workload, but also expanding the use cases in that work and bear in mind, many of those, those are going to need to remain on prem. So they'll need to be within a private cloud or at best a hybrid cloud environment in order to satisfy a regulatory jurisdictional requirements. So let's talk about an example. >>So LGU plus is a Finastra fantastic service provider in Korea. Um, huge growth in that business over the last, uh, over the last 10, 15 years or so. Um, and obviously most people will be familiar with LG, the electronics brand, maybe less so with, uh, with LG plus, but they've been doing phenomenal work. And we're the first, uh, business in the world who launch commercial 5g in 2019. And so a huge milestone that they achieved. And at the same time they deploy the network real-time analytics platform or in rep, uh, from a combination of Cloudera and our partner calmer. Now, um, there were a number of things that were driving, uh, the requirement for it, for the, for the analytics platform at the time. Um, clearly the 5g launch was that was the big thing that they had in mind, but there were other things that re so within the 5g launch, um, uh, they were looking for, for visibility of services, um, and service assurance and service quality. >>So, you know, what services have been launched? How are they being taken up? What are the issues that are arising, where are the faults happening? Um, where are the problems? Because clearly when you launch a new service, but then you want to understand and be on top of the issues as they arise. Um, so that was really, really important. The second piece was, and, you know, this is not a new story to any telco in the world, right. But there are silos in operation. Uh, and so, um, taking advantage of, um, or eliminating redundancies through the process, um, of, of digital transformation, it was really important. And so particular, the two silos between wired and the wireless sides of the business come together so that there would be an integrated network management system, um, for, uh, for LGU plus, as they rolled out 5g. So eliminating redundancy and driving cost savings through the, the integration of the silos is really, really important. >>And that's a process and the people thing every bit, as much as it is a systems and a data thing. So, um, another big driver and the fourth one, you know, we've talked a little bit about some of these things, right? 5g brings huge opportunity for enterprise services, innovation. So industry 4.0 digital experience, these kinds of use cases, um, are very important in the south Korean marketing and in the, um, in the business of LGU plus. And so, uh, um, looking at AI and how can you apply AI to network management? Uh, again, there's a number of use cases, really, really exciting use cases that have gone live now, um, in LG plus since, uh, since we did this initial deployment and they're making fantastic strides there, um, big data analytics for users across LGU plus, right? So it's not just for, um, uh, it's not just for the immediate application of 5g or the support or the 5g network. >>Um, but also for other data analysts and data scientists across the LGU plus business network analytics, while primarily it's primary it's primary use case is around network management, um, LGU plus, or, or network analytics, um, has applications across the entire business, right? So, um, you know, for customer churn or next best offer for understanding customer experience and customer behavior really important there for digital advertising, for product innovation, all sorts of different use cases and departments within the business needed access to this information. So collaboration sharing across the network, the real-time network analytics platform, um, it was very important. And then finally, as I mentioned, LG group is much bigger than just LG plus it's because the electronics and other pieces, and they had launched a major group wide digital transformation program in 2019, and still being a part of that was, well, some of them, the problems that they were looking to address. >>Um, so first of all, the integration of wired and wireless data service data sources, and so getting your assurance data sources, your network, data sources, uh, and so on integrated with is really, really important scale was massive for them. Um, you know, they're talking about billions of transactions in under a minute, uh, being processed, um, and hundreds of terabytes per day. So, uh, you know, phenomenal scale, uh, that needed to be available out of the box as it were, um, real time indicators and alarms. And there was lots of KPIs and thresholds set that, you know, w to make, make it to meet certain criteria, certain standards, um, customer specific, real time analysis of 5g, particularly for the launch root cause analysis, an AI based prediction on service, uh, anomalies and service service issues was, was, was a core use case. Um, as I talked about already the provision of service of data services across the organization, and then support for 5g, uh, served the business service, uh, impact, uh, was extremely important. >>So it's not just understand well, you know, that you have an outage in a particular network element, but what is the impact on the business of LGU plus, but also what is the impact on the business of the customer, uh, from an outage or an anomaly or a problem on, on, on the network. So being able to answer those kinds of questions really, really important, too. And as I said, between Cloudera and Kamarck, uh, uh, and LGU plus, uh, really themselves an intrinsic part of the solution, um, uh, this is, this is what we, we ended up building. So a big complicated architecture space. I really don't want to go into too much detail here. Um, uh, you can see these things for yourself, but let me skip through it really quickly. So, first of all, the key data sources, um, you have all of your wireless network information, other data sources. >>This is really important because sometimes you kind of skip over this. There are other systems that are in place like the enterprise data warehouse that needed to be integrated as well, southbound and northbound interfaces. So we get our data from the network and so on, um, and network management applications through file interfaces. CAFCA no fire important technologies. And also the RDBMS systems that, uh, you know, like the enterprise data warehouse that we're able to feed that into the system. And then northbound, um, you know, we spoke already about me making network analytics services available across the enterprise. Um, so, uh, you know, uh, having both the file and the API interface available, um, for other systems and other consumers across the enterprise is very important. Um, lots of stuff going on then in the platform itself to petabytes and persistent storage, um, Cloudera HDFS, 300 nodes for the, the raw data storage, um, uh, and then, uh, could do for real time storage for real-time indicator analysis, alarm generation, um, uh, and other real time, um, processes. >>Uh, so there, that was the, the core of the solution, uh, spark processes for ETL key quality indicators and alarming, um, and also a bunch of work done around, um, data preparation, data generation for transferal to, to third party systems, um, through the northbound interfaces, um, uh, Impala, API queries, um, for real-time systems, uh, there on the right hand side, and then, um, a whole bunch of clustering classification, prediction jobs, um, through the, uh, the, the, the, the ML processes, the machine learning processes, uh, again, another key use case, and we've done a bunch of work on that. And, um, I encourage you to have a look at the Cloudera website for more detail on some of the work that we did here. Um, so this is some pretty cool stuff. Um, and then finally, just the upstream services, some of these there's lots more than, than, than simply these ones, but service assurance is really, really important. So SQM cm and SED grade. So the service quality management customer experience, autonomous controllers, uh, really, really important consumers of, of the, of the real-time analytics platform, uh, and your conventional service assurance, um, functions like faulted performance management. Uh, these things are as much consumers of the information and the network analytics platform as they are providers of data to the network, uh, analytics >>Platform. >>Um, so some of the specific use cases, uh, that, uh, have been, have been stood up and that are delivering value to this day and lots of more episodes, but these are just three that we pulled out. Um, so first of all, um, uh, sort of specific monitoring and customer quality analysis, Karen response. So again, growing from the initial 5g launch and then broadening into broader services, um, understanding where there are the, where there are issues so that when people complaining, when people have an issue, um, that, um, uh, that we can answer the, the concerns of the client, um, in a substantive way, um, uh, AI functions around root cause analysis or understanding why things went wrong when they went wrong. Um, uh, and also making recommendations as to how to avoid those occurrences in the future. Uh, so we know what preventative measures can be taken. Um, and then finally the, uh, the collaboration function across LGU plus extremely important and continues to be important to this day where data is shared throughout the enterprise, through the API Lira through file interfaces and other things, and through interface integrations with, uh, with upstream systems. >>So, um, that's kind of the, the, uh, real quick run through of LGU plus the numbers are just stave staggering. Um, you know, we've seen, uh, upwards of a billion transactions in under 40 seconds being, um, uh, being tested. Um, and, and we've gone beyond those thresholds now, already, um, and we're started and, and, and, and this isn't just a theoretical sort of a benchmarking test or something like that. We're seeing these kinds of volumes of data and not too far down the track. So, um, with those things that I mentioned earlier with the proliferation of, of, um, of network infrastructure, uh, in the 5g context with virtualized elements, with all of these other bits and pieces are driving massive volumes of data towards the, uh, the, the, the network analytics platform. So phenomenal scale. Um, this is just one example we work with, with service providers all over the world is over 80% of the top 100 telecommunication service providers run on Cloudera. >>They use Cloudera in the network, and we're seeing those customers, all migrating legacy cloud platforms now onto CDP onto the Cloudera data platform. Um, they're increasing the, the, the jobs that they do. So it's not just warehousing, not just ingestion ETL, and moving into things like machine learning. Um, and also looking at new data sources from places like NWTF the network data analytics function in 5g, or the management and orchestration layer in, in software defined networks, network, function, virtualization. So, you know, new use cases coming in all the time, new data sources coming in all the time growth in, in, in, in the application scope from, as we say, from edge to AI. Um, and so it's, it's really exciting to see how the, the, the, the footprint is growing and how, uh, the applications in telecommunications are really making a difference in, in facilitating, um, network transformation. And that's covering that. That's me covered for today. I hope you found that helpful, um, by all means, please reach out, uh, there's a couple of links here. You can follow me on Twitter. You can connect to the telecommunications page, reach out to me directly at Cloudera. I'd love to answer your questions, um, uh, and, uh, and talk to you about how big data is transforming networks, uh, and how network transformation is, is accelerating telcos, uh, throughout >>Jamie Sharath with Liga data, I'm primarily on the delivery side of the house, but I also support our new business teams. I'd like to spend a minute really just kind of telling you about the legal data, where basically a Silicon valley startup, uh, started in 2014, and, uh, our lead iron, our executive team, basically where the data officers at Yahoo before this, uh, we provide managed data services, and we provide products that are focused on telcos. So we have some experience in non telco industry, but our focus for the last seven years or so is specifically on telco. So again, something over 200 employees, we have a global presence in north America, middle east Africa, Asia, and Europe. And we have folks in all of those places, uh, I'd like to call your attention to the, uh, the middle really of the screen there. So here is where we have done some partnership with Cloudera. >>So if you look at that and you can see we're in Holland and Jamaica, and then a lot to throughout Africa as well. Now, the data fabric is the product that we're talking about. And the data fabric is basically a big data type of data warehouse with a lot of additional functionality involved. The data fabric is comprised of, uh, some something called a flare, which we'll talk about in a minute below there, and then the Cloudera data platform underneath. So this is how we're partnering together. We, uh, we, we have this tool and it's, uh, it's functioning and delivering in something over 10 up. So flare now, flare is a piece of that legal data IP. The rest is there. And what flare does is that basically pulls in data, integrates it to an event streaming platform. It's, uh, it is the engine behind the data fabric. >>Uh, it's also a decisioning platform. So in real time, we're able to pull in data. We're able to run analytics on it, and we're able to alert are, do whatever is needed in a real-time basis. Of course, a lot of clients at this point are still sending data in batch. So it handles that as well, but we call that a CA picture Sanchez. Now Sacho is a very interesting app. It's an AI analytics app for executives. What it is is it runs on your mobile phone. It ties into your data. Now this could be the data fabric, but it couldn't be a standalone product. And basically it allows you to ask, you know, human type questions to say, how are my gross ads last week? How are they comparing against same time last week before that? And even the same time 60 days ago. So as an executive or as an analyst, I can pull it up and I can look at it instantly in a meeting or anywhere else without having to think about queries or anything like that. >>So that's pretty much for us at legal data, not really to set the context of where we are. So this is a traditional telco environments. So you see the systems of record, you see the cloud, you see OSS and BSS data. So one of the things that the next step above which calls we call the system of intelligence of the data fabric does, is it mergers that BSS and OSS data. So the longer we have any silos or anything that's separated, it's all coming into one area to allow business, to go in or allow data scientists go in and do that. So if you look at the bottom line, excuse me, of the, uh, of the system of intelligence, you can see that flare is the tools that pulls in the data. So it provides even streaming capabilities. It preserves entity states, so that you can go back and look at it state at any time. >>It does stream analytics that is as the data is coming in, it can perform analytics on it. And it also allows real-time decisioning. So that's something that, uh, that's something that business users can go in and create a system of, uh, if them's, it looks very much like the graph database, where you can create a product that will allow the user to be notified if a certain condition happens. So for instance, a bundle, so a real-time offer or user is succinct to run out of is ongoing, and an offer can be sent to him right on the fly. And that's set up by the business user as opposed to programmers, uh, data infrastructure. So the fabric has really three areas. That data is persistent, obviously there's the data lake. So the data lake stores that level of granularity that is very deep years and years of history, data, scientists like that, uh, and, uh, you know, for a historical record keeping and requirements from the government, that data would be stored there. >>Then there's also something we call the business semantics layer and the business semantics layer contains something over 650 specific telco KPIs. These are initially from PM forum, but they also are included in, uh, various, uh, uh, mobile operators that we've delivered at. And we've, we've grown that. So that's there for business data lake is there for data scientists, analytical stores, uh, they can be used for many different reasons. There are a lot of times RDBMS is, are still there. So these, this, this basically platform, this cloud they're a platform can tie into analytical data stores as well via flair access and reporting. So graphic visualizations, API APIs are a very key part of it. A third-party query tools, any kind of grid tools can be used. And those are the, of course, the, uh, the ones that are highly optimized and allow, you know, search of billions of records. >>And then if you look at the top, it's the systems of engagement, then you might vote this use cases. So teleco reporting, hundreds of KPIs that are, that are generated for users, segmentation, basically micro to macro segmentation, segmentation will play a key role in a use case. We talked about in a minute monetization. So this helps teleco providers monetize their specific data, but monetize it in. Okay, how to, how do they make money off of it, but also how might you leverage this data to engage with another client? So for instance, in some where it's allowed a DPI is used, and the fabric tracks exactly where each person goes each, uh, we call it a subscriber, goes within his, uh, um, uh, internet browsing on the, on the four or 5g. And, uh, the, all that data is stored. Uh, whereas you can tell a lot of things where the segment, the profile that's being used and, you know, what are they propensity to buy? Do they spend a lot of time on the Coca-Cola page? There are buyers out there that find that information very valuable, and then there's signs of, and we spoke briefly about Sanchez before that sits on top of the fabric or it's it's alone. >>So, so the story really that we want to tell is, is one, this is, this is one case out of it. This is a CVM type of case. So there was a mobile operator out there that was really offering, you know, packages, whether it's a bundle or whether it's a particular tool to subscribers, they, they were offering kind of an abroad approach that it was not very focused. It was not depending on the segments that were created around the profiling earlier, uh, the subscriber usage was somewhat dated and this was causing a lot of those. A lot of those offers to be just basically not taken and, and not, not, uh, audited. Uh, there was limited segmentation capabilities really before the, uh, before the, uh, fabric came in. Now, one of the key things about the fabric is when you start building segments, you can build that history. >>So all of that data stored in the data lake can be used in terms of segmentation. So what did we do about that? The, the, the envy and, oh, the challenge this, uh, we basically put the data fabric in and the data fabric was running Cloudera data platform and that, uh, and that's how we team up. Uh, we facilitated the ability to personalize campaign. So what that means is, uh, the segments that were built and that user fell within that segment, we knew exactly what his behavior most likely was. So those recommendations, those offers could be created then, and we enable this in real time. So real-time ability to even go out to the CRM system and gather further information about that. All of these tools, again, we're running on top of the Cloudera data platform, uh, what was the outcome? Willie, uh, outcome was that there was a much more precise offer given to the client that is, that was accepted, no increase in cross sell and upsell subscriber retention. >>Uh, our clients came back to us and pointed out that, uh, it was 183% year on year revenue increase. Uh, so this is a, this is probably one of the key use cases. Now, one thing to really mention is there are hundreds and hundreds of use cases running on the fabric. And I would even say thousands. A lot of those have been migrated. So when the fabric is deployed, when they bring the Cloudera and the legal data solution in there's generally a legacy system that has many use cases. So many of those were, were migrated virtually all of them in pen, on put on the cloud. Uh, another issue is that new use cases are enabled again. So when you get this level of granularity and when you have campaigns that can now base their offers on years of history, as opposed to 30 days of history, the campaigns campaign management response systems, uh, are, are, uh, are enabled quite a bit to do all, uh, to be precise in their offers. Okay. >>Okay. So this is a technical slide. Uh, one of the things that we normally do when we're, when we're out there talking to folks, is we talk and give an overview and that last little while, and then we give a deep technical dive on all aspects of it. So sometimes that deep dive can go a couple of hours. I'm going to do this slide and a couple of minutes. So if you look at it, you can see over on the left, this is the, uh, the sources of the data. And they go through this tool called flare that runs on the cloud. They're a data platform, uh, that can either be via cues or real-time cues, or it can be via a landing zone, or it can be a data extraction. You can take a look at the data quality that's there. So those are built in one of the things that flare does is it has out of the box ability to ingest data sources and to apply the data quality and validation for telco type sources. >>But one of the reasons this is fast to market is because throughout those 10 or 12, uh, opcos that we've done with Cloudera, where we have already built models, so models for CCN, for air for, for most mediation systems. So there's not going to be a type of, uh, input that we haven't already seen are very rarely. So that actually speeds up deployment very quickly. Then a player does the transformations, the, uh, the metrics, continuous learning, we call it continuous decisioning, uh, API access. Uh, we, uh, you know, for, for faster response, we use distributed cash. I'm not going to go too deeply in there, but the layer in the business semantics layer again, are, are sitting on top of the Cloudera data platform. You see the Kafka CLU, uh, Q1, the right as well. >>And all of that, we're calling the fabric. So the fabric is Cloudera data platform and the cloud and flair and all of this runs together. And, and by the way, there've been many, many, many, many hundreds of hours testing flare with Cloudera and, uh, and the whole process, the results, what are the results? Well, uh, there are, there are four I'm going to talk about, uh, we saw the one for the, it was called my pocket pocket, but it's a CDM type, a use case. Uh, the subscribers of that mobile operator were 14 million plus there was a use case for 24 million plus that a year on year revenue was 130%, uh, 32 million plus for 38%. These are, um, these are different CVM pipe, uh, use cases, as well as network use cases. And then there were 44%, uh, telco with 76 million subscribers. So I think that there are a lot more use cases that we could talk about, but, but in this case, this is the ones we're looking at, uh, again, 183%. This is something that we find consistently. And these figures come from our, uh, our actual end client. How do we unlock the full potential of this? Well, I think to start is to arrange a meeting and, uh, it would be great to, to, uh, for you to reach out to me or to Anthony. Uh, we're working at the junction on this, and we can set up a, uh, we can set up a meeting and we can go through this initial meeting. And, uh, I think that's the very beginning. Uh, again, you can get additional information from Cloudera website and from the league of data website, Anthony, that's the story. Thank you. >>No, that's great. Jeremy, thank you so much. It's a, it's, it's wonderful to go deep. And I know that there are hundreds of use cases being deployed in MTN, um, but great to go deep on one. And like you said, it can, once you get that sort of architecture in place, you can do so many different things. The power of data is tremendous, but it's great to be able to see how you can, how you can track it end to end from collecting the data, processing it, understanding it, and then applying it in a commercial context and bringing actual revenue back into the business. So there is your ROI straight away. Now you've got a platform that you can transform your business on. That's, that's, it's a tremendous story, Jamie, and thank you for your part. Sure. Um, that's a, that's, that's our story for today. Like Jamie says, um, please do flee, uh, feel free to reach out to us. Um, the, the website addresses are there and our contact details, and we'd be delighted to talk to you a little bit more about some of the other use cases, perhaps, um, and maybe about your own business and, uh, and how we might be able to make it, make it perform a little better. So thank you.

Published Date : Aug 4 2021

SUMMARY :

Um, thinking about, uh, So it didn't matter what network technology had, whether it was a Nokia technology or Erickson technology the cloud that drive, uh, uh, enhancements in use cases uh, and that again is going to lead to an increase in the amount of data that we have available. So the first is more physical elements. And so that needs to be aggregated and collected and managed and stored So the numbers of devices on the agent beyond the age, um, are going to be phenomenal. the agility and all of the scale, um, uh, benefits that you get from migrating So the kinds of services So on the demand side, um, So they'll need to be within a private cloud or at best a hybrid cloud environment in order to satisfy huge growth in that business over the last, uh, over the last 10, 15 years or so. And so particular, the two silos between And so, uh, um, the real-time network analytics platform, um, it was very important. Um, so first of all, the integration of wired and wireless data service data sources, So, first of all, the key data sources, um, you have all of your wireless network information, And also the RDBMS systems that, uh, you know, like the enterprise data warehouse that we're able to feed of the information and the network analytics platform as they are providers of data to the network, Um, so some of the specific use cases, uh, Um, you know, we've seen, Um, and also looking at new data sources from places like NWTF the network data analytics So here is where we have done some partnership with So if you look at that and you can see we're in Holland and Jamaica, and then a lot to throughout And even the same time So the longer we have any silos data, scientists like that, uh, and, uh, you know, for a historical record keeping and requirements of course, the, uh, the ones that are highly optimized and allow, the segment, the profile that's being used and, you know, what are they propensity to buy? Now, one of the key things about the fabric is when you start building segments, So all of that data stored in the data lake can be used in terms of segmentation. So when you get this level of granularity and when you have campaigns that can now base their offers So if you look at it, you can see over on the left, this is the, uh, the sources of the data. So there's not going to be a type of, uh, input that we haven't already seen are very rarely. So the fabric is Cloudera data platform and the cloud uh, and how we might be able to make it, make it perform a little better.

ENTITIES

Entity	Category	Confidence
Jamie	PERSON	0.99+
Jeremy	PERSON	0.99+
Holland	LOCATION	0.99+
Jamie Sharath	PERSON	0.99+
Anthony	PERSON	0.99+
Korea	LOCATION	0.99+
38%	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
2014	DATE	0.99+
2019	DATE	0.99+
183%	QUANTITY	0.99+
Europe	LOCATION	0.99+
24 million	QUANTITY	0.99+
14 million	QUANTITY	0.99+
LG	ORGANIZATION	0.99+
second piece	QUANTITY	0.99+
30 days	QUANTITY	0.99+
Jamaica	LOCATION	0.99+
Nokia	ORGANIZATION	0.99+
Huawei	ORGANIZATION	0.99+
today	DATE	0.99+
Yahoo	ORGANIZATION	0.99+
130%	QUANTITY	0.99+
32 million	QUANTITY	0.99+
Asia	LOCATION	0.99+
last week	DATE	0.99+
Erickson	ORGANIZATION	0.99+
Finastra	ORGANIZATION	0.99+
three	QUANTITY	0.99+
thousands	QUANTITY	0.99+
Africa	LOCATION	0.99+
north America	LOCATION	0.99+
telco	ORGANIZATION	0.99+
Silicon valley	LOCATION	0.99+
first	QUANTITY	0.99+
each person	QUANTITY	0.99+
Willie	PERSON	0.99+
10	QUANTITY	0.99+
44%	QUANTITY	0.99+
over 80%	QUANTITY	0.99+
one	QUANTITY	0.98+
76 million subscribers	QUANTITY	0.98+
60 days ago	DATE	0.98+
over 200 employees	QUANTITY	0.98+
LGU plus	ORGANIZATION	0.98+
Cloudera	TITLE	0.98+
Sacho	TITLE	0.98+
middle east Africa	LOCATION	0.97+
First	QUANTITY	0.97+
Liga data	ORGANIZATION	0.97+
four major vectors	QUANTITY	0.97+
under 40 seconds	QUANTITY	0.97+
YouTube	ORGANIZATION	0.97+
one example	QUANTITY	0.97+
One	QUANTITY	0.97+
two silos	QUANTITY	0.97+
each	QUANTITY	0.96+
Karen	PERSON	0.96+
one case	QUANTITY	0.96+
billions of records	QUANTITY	0.96+
three areas	QUANTITY	0.96+
under a minute	QUANTITY	0.95+
CAFCA	ORGANIZATION	0.95+
one thing	QUANTITY	0.95+
both	QUANTITY	0.94+
12	QUANTITY	0.94+
LG plus	ORGANIZATION	0.94+
Twitter	ORGANIZATION	0.94+
one area	QUANTITY	0.93+
fourth one	QUANTITY	0.93+
hundreds and	QUANTITY	0.92+
a year	QUANTITY	0.92+

PUBLIC SECTOR V1 | CLOUDERA

>>Hi, this is Cindy Mikey, vice president of industry solutions at caldera. Joining me today is chef is Molly, our solution engineer for the public sector. Today. We're going to talk about speed to insight. Why using machine learning in the public sector, specifically around fraud, waste and abuse. So topic for today, we'll discuss machine learning, why the public sector uses it to target fraud, waste, and abuse, the challenges. How do we enhance your data and analytical approaches the data landscape analytical methods and shad we'll go over reference architecture and a case study. So by definition, fraud, waste and abuse per the government accountability office is fraud. Isn't an attempt to obtain something about value through unwelcome misrepresentation waste is about squandering money or resources and abuse is about behaving improperly or unreasonably to actually obtain something of value for your personal benefit. So as we look at fraud, um, and across all industries, it's a top of mind, um, area within the public sector. >>Um, the types of fraud that we see is specifically around cyber crime, uh, looking at accounting fraud, whether it be from an individual perspective to also, uh, within organizations, looking at financial statement fraud, to also looking at bribery and corruption, as we look at fraud, it really hits us from all angles, whether it be from external perpetrators or internal perpetrators, and specifically from the research by PWC, the key focus area is we also see over half of fraud is actually through some form of internal or external, uh, perpetrators again, key topics. So as we also look at a report recently by the association of certified fraud examiners, um, within the public sector, the us government, um, in 2017, it was identified roughly $148 billion was attributable to fraud, waste and abuse. Specifically about 57 billion was focused on reported monetary losses and another 91 billion on areas where that opportunity or the monetary basis had not yet been measured. >>As we look at breaking those areas down again, we look at several different topics from permit out payment perspective. So breaking it down within the health system, over $65 billion within social services, over $51 billion to procurement fraud to also, um, uh, fraud, waste and abuse that's happening in the grants and the loan process to payroll fraud, and then other aspects, again, quite a few different topical areas. So as we look at those areas, what are the areas that we see additional type of focus, there's a broad stroke areas. What are the actual use cases that our agencies are using the data landscape? What data, what analytical methods can we use to actually help curtail and prevent some of the, uh, the fraud waste and abuse. So, as we look at some of the analytical processes and analytical use crate, uh, use cases in the public sector, whether it's from, uh, you know, the taxation areas to looking at, you know, social services, uh, to public safety, to also the, um, our, um, uh, additional agency methods, we're gonna use focused specifically on some of the use cases around, um, you know, fraud within the tax area. >>Uh, we'll briefly look at some of the aspects of, um, unemployment insurance fraud, uh, benefit fraud, as well as payment and integrity. So fraud has it it's, um, uh, underpinnings inquiry, like you different on government agencies and difficult, different analytical methods, and I usage of different data. So I think one of the key elements is, you know, you can look at your, your data landscape on specific data sources that you need, but it's really about bringing together different data sources across a different variety, a different velocity. So, uh, data has different dimensions. So we'll look at structured types of data of semi-structured data, behavioral data, as well as when we look at, um, you know, predictive models. We're typically looking at historical type information, but if we're actually trying to look at preventing fraud before it actually happens, or when a case may be in flight, which is specifically a use case that shad is going to talk about later is how do I look at more of that? >>Real-time that streaming information? How do I take advantage of data, whether it be, uh, you know, uh, financial transactions we're looking at, um, asset verification, we're looking at tax records, we're looking at corporate filings. Um, and we can also look at more, uh, advanced data sources where as we're looking at, um, investigation type information. So we're maybe going out and we're looking at, uh, deep learning type models around, uh, you know, semi or that, uh, behavioral, uh, that's unstructured data, whether it be camera analysis and so forth. So for quite a different variety of data and the, the breadth and the opportunity really comes about when you can integrate and look at data across all different data sources. So in a looking at a more extensive, uh, data landscape. So specifically I want to focus on some of the methods, some of the data sources and some of the analytical techniques that we're seeing, uh, being used, um, in the government agencies, as well as opportunities, uh, to look at new methods. >>So as we're looking at, you know, from a, um, an audit planning or looking at, uh, the opportunity for the likelihood of non-compliance, um, specifically we'll see data sources where we're maybe looking at a constituents profile, we might actually be investigating the forms that they've provided. We might be comparing that data, um, or leveraging internal data sources, possibly looking at net worth, comparing it against other financial data, and also comparison across other constituents groups. Some of the techniques that we use are some of the basic natural language processing, maybe we're going to do some text mining. We might be doing some probabilistic modeling, uh, where we're actually looking at, um, information within the agency to also comparing that against possibly tax forms. A lot of times it's information historically has been done on a batch perspective, both structured and semi-structured type information. And typically the data volumes can be low, but we're also seeing those data volumes on increase exponentially based upon the types of events that we're dealing with, the number of transactions. >>Um, so getting the throughput, um, and chef's going to specifically talk about that in a moment. The other aspect is, as we look at other areas of opportunity is when we're building upon, how do I actually do compliance? How do I actually look at conducting audits, uh, or potential fraud to also looking at areas of under-reported tax information? So there you might be pulling in some of our other types of data sources, whether it's being property records, it could be data that's being supplied by the actual constituents or by vendors to also pulling in social media information to geographical information, to leveraging photos on techniques that we're seeing used is possibly some sentiment analysis, link analysis. Um, how do we actually blend those data sources together from a natural language processing? But I think what's important here is also the method and the looking at the data velocity, whether it be batch, whether it be near real time, again, looking at all types of data, whether it's structured semi-structured or unstructured and the key and the value behind this is, um, how do we actually look at increasing the potential revenue or the, um, under reported revenue? >>Uh, how do we actually look at stopping fraudulent payments before they actually occur? Um, also looking at increasing the amount of, uh, the level of compliance, um, and also looking at the potential of prosecution of fraud cases. And additionally, other areas of opportunity could be looking at, um, economic planning. How do we actually perform some link analysis? How do we bring some more of those things that we saw in the data landscape on customer, or, you know, constituent interaction, bringing in social media, bringing in, uh, potentially police records, property records, um, other tax department, database information. Um, and then also looking at comparing one individual to other individuals, looking at people like a specific, like a constituent, are there areas where we're seeing, uh, >>Um, other >>Aspects of, of fraud potentially being occurring. Um, and also as we move forward, some of the more advanced techniques that we're seeing around deep learning is looking at computer vision, um, leveraging geospatial information, looking at social network entity analysis, uh, also looking at, uh, agent-based modeling techniques, where we're looking at simulation Monte Carlo type techniques that we typically see in the financial services industry, actually applying that to fraud, waste, and abuse within the, uh, the public sector. Um, and again, that really, uh, lends itself to a new opportunities. And on that, I'm going to turn it over to chef to talk about, uh, the reference architecture for, uh, doing these buckets. >>Thanks, Cindy. Um, so I'm gonna walk you through an example, reference architecture for fraud detection using, uh, Cloudera's underlying technology. Um, and you know, before I get into the technical details, uh, I want to talk about how this would be implemented at a much higher level. So with fraud detection, what we're trying to do is identify anomalies or novelists behavior within our datasets. Um, now in order to understand what aspects of our incoming data represents anomalous behavior, we first need to understand what normal behavior is. So in essence, once we understand normal behavior, anything that deviates from it can be thought of as an anomaly, right? So in order to understand what normal behavior is, we're going to need to be able to collect store and process a very large amount of historical data. And so incomes, clutters platform, and this reference architecture that needs to be for you. >>So, uh, let's start on the left-hand side of this reference architecture with the collect phase. So fraud detection will always begin with data collection. We need to collect large amounts of information from systems that could be in the cloud. It could be in the data center or even on edge devices, and this data needs to be collected so we can create our normal behavior profiles. And these normal behavioral profiles would then in turn, be used to create our predictive models for fraudulent activity. Now, uh, thinking, uh, to the data collection side, one of the main challenges that many organizations face, uh, in this phase, uh, involves using a single technology that can handle, uh, data that's coming in all different types of formats and protocols and standards with different velocities and velocities. Um, let me give you an example. Uh, we could be collecting data from a database that gets updated daily, uh, and maybe that data is being collected in Agra format. >>At the same time, we can be collecting data from an edge device that's streaming in every second, and that data may be coming in Jason or a binary format, right? So this is a data collection challenge that can be solved with cluttered data flow, which is a suite of technologies built on a patch NIFA in mini five, allowing us to ingest all of this data, do a drag and drop interface. So now we're collecting all of this data, that's required to map out normal behavior. The next thing that we need to do is enrich it, transform it and distribute it to, uh, you know, downstream systems for further process. Uh, so let's, let's walk through how that would work first. Let's taking Richmond for, uh, for enrichment, think of adding additional information to your incoming data, right? Let's take, uh, financial transactions, for example, uh, because Cindy mentioned it earlier, right? >>You can store known locations of an individual in an operational database, uh, with Cloudera that would be HBase. And as an individual makes a new transaction, their geolocation that's in that transaction data can be enriched with previously known locations of that very same individual. And all of that enriched data can be later used downstream for predictive analysis, predictable. So the data has been enrich. Uh, now it needs to be transformed. We want the data that's coming in, uh, you know, Avro and Jason and binary and whatever other format to be transformed into a single common format. So it can be used downstream for stream processing. Uh, again, this is going to be done through clutter and data flow, which is backed by NIFA, right? So the transformed semantic data is then going to be stricted to Kafka and coffin. It's going to serve as that central repository of syndicated services or a buffer zone, right? >>So coffee is going to pretty much provide you with, uh, extremely fast resilient and fault tolerance storage. And it's also gonna give you the consumer APIs that you need that are going to enable a wide variety of applications to leverage that enriched and transformed data within your buffer zone, uh, allowed that, you know, 17. So you can store that data in a distributed file system, give you that historical context that you're going to need later on for machine learning, right? So the next step in the architecture is to leverage a cluttered SQL stream builder, which enables us to write, uh, streaming SQL jobs on top of Apache Flink. So we can, uh, filter, analyze and, uh, understand the data that's in the Kafka buffer in real time. Uh I'll you know, I'll also add like, you know, if you have time series data, or if you need a lab type of cubing, you can leverage kudu, uh, while EDA or, you know, exploratory data analysis and visualization, uh, can all be enabled through clever visualization technology. >>All right, so we've filtered, we've analyzed and we've explored our incoming data. We can now proceed to train our machine learning models, uh, which will detect anomalous behavior in our historically collected data set, uh, to do this, we can use a combination of supervised unsupervised, uh, even deep learning techniques with neural networks. And these models can be tested on new incoming streaming data. And once we've gone ahead and obtain the accuracy of the performance, the scores that we want, we can then take these models and deploy them into production. And once the models are productionalized or operationalized, they can be leveraged within our streaming pipeline. So as new data is ingested in real-time knife, I can query these models to detect if the activity is anomalous or fraudulent. And if it is, they can alert downstream users and systems, right? So this in essence is how fraudulent activity detection works. >>Uh, and this entire pipeline is powered by clutters technology, right? And so, uh, the IRS is one of, uh, clutter's customers. That's leveraging our platform today and implementing, uh, a very similar architecture, uh, to detect fraud, waste, and abuse across a very large set of historical facts, data. Um, and one of the neat things with the IRS is that they've actually recently leveraged the partnership between Cloudera and Nvidia to accelerate their spark based analytics and their machine learning, uh, and the results have been nothing short of amazing, right? And in fact, we have a quote here from Joe and salty who's, uh, you know, the technical branch chief for the research analytics and statistics division group within the IRS with zero changes to our fraud detection workflow, we're able to obtain eight times to performance simply by adding GPS to our mainstream big data servers. This improvement translates to half the cost of ownership for the same workloads, right? So embedding GPU's into the reference architecture I covered earlier has enabled the IRS to improve their time to insights by as much as eight X while simultaneously reducing their underlying infrastructure costs by half, uh, Cindy back to you >>Chef. Thank you. Um, and I hope that you found, uh, some of the, the analysis, the information that Sheva and I have provided, um, to give you some insights on how cloud era is actually helping, uh, with the fraud waste and abuse challenges within the, uh, the public sector, um, specifically looking at any and all types of data, how the clutter platform is bringing together and analyzing information, whether it be you're structured you're semi-structured to unstructured data, both in a fast or in a real-time perspective, looking at anomalies, being able to do some of those on detection, uh, looking at neural network analysis, time series information. So next steps we'd love to have additional conversation with you. You can also find on some additional information around, I have caught areas working in the, the federal government by going to cloudera.com solutions slash public sector. And we welcome scheduling a meeting with you again, thank you for joining us Sheva and I today. We greatly appreciate your time and look forward to future progress. >>Good day, everyone. Thank you for joining me. I'm Sydney. Mike joined by Rick Taylor of Cloudera. Uh, we're here to talk about predictive maintenance for the public sector and how to increase assets, service, reliability on today's agenda. We'll talk specifically around how to optimize your equipment maintenance, how to reduce costs, asset failure with data and analytics. We'll go into a little more depth on, um, what type of data, the analytical methods that we're typically seeing used, um, the associated, uh, Brooke, we'll go over a case study as well as a reference architecture. So by basic definition, uh, predictive maintenance is about determining when an asset should be maintained and what specific maintenance activities need to be performed either based upon an assets of actual condition or state. It's also about predicting and preventing failures and performing maintenance on your time on your schedule to avoid costly unplanned downtime. >>McKinsey has looked at analyzing predictive maintenance costs across multiple industries and has identified that there's the opportunity to reduce overall predictive maintenance costs by roughly 50% with different types of analytical methods. So let's look at those three types of models. First, we've got our traditional type of method for maintenance, and that's really about our corrective maintenance, and that's when we're performing maintenance on an asset, um, after the equipment fails. But the challenges with that is we end up with unplanned. We end up with disruptions in our schedules, um, as well as reduced quality, um, around the performance of the asset. And then we started looking at preventive maintenance and preventative maintenance is really when we're performing maintenance on a set schedule. Um, the challenges with that is we're typically doing it regardless of the actual condition of the asset, um, which has resulted in unnecessary downtime and expense. Um, and specifically we're really now focused on pre uh, condition-based maintenance, which is looking at leveraging predictive maintenance techniques based upon actual conditions and real time events and processes. Um, within that we've seen organizations, um, and again, source from McKenzie have a 50% reduction in downtime, as well as an overall 40% reduction in maintenance costs. Again, this is really looking at things across multiple industries, but let's look at it in the context of the public sector and based upon some activity by the department of energy, um, several years ago, >>Um, they've really >>Looked at what does predictive maintenance mean to the public sector? What is the benefit, uh, looking at increasing return on investment of assets, reducing, uh, you know, reduction in downtime, um, as well as overall maintenance costs. So corrective or reactive based maintenance is really about performing once there's been a failure. Um, and then the movement towards, uh, preventative, which is based upon a set schedule or looking at predictive where we're monitoring real-time conditions. Um, and most importantly is now actually leveraging IOT and data and analytics to further reduce those overall downtimes. And there's a research report by the, uh, department of energy that goes into more specifics, um, on the opportunity within the public sector. So, Rick, let's talk a little bit about what are some of the challenges, uh, regarding data, uh, regarding predictive maintenance. >>Some of the challenges include having data silos, historically our government organizations and organizations in the commercial space as well, have multiple data silos. They've spun up over time. There are multiple business units and note, there's no single view of assets. And oftentimes there's redundant information stored in, in these silos of information. Uh, couple that with huge increases in data volume data growing exponentially, along with new types of data that we can ingest there's social media, there's semi and unstructured data sources and the real time data that we can now collect from the internet of things. And so the challenge is to collect all these assets together and begin to extract intelligence from them and insights and, and that in turn then fuels, uh, machine learning and, um, and, and what we call artificial intelligence, which enables predictive maintenance. Next slide. So >>Let's look specifically at, you know, the, the types of use cases and I'm going to Rick and I are going to focus on those use cases, where do we see predictive maintenance coming into the procurement facility, supply chain, operations and logistics. Um, we've got various level of maturity. So, you know, we're talking about predictive maintenance. We're also talking about, uh, using, uh, information, whether it be on a, um, a connected asset or a vehicle doing monitoring, uh, to also leveraging predictive maintenance on how do we bring about, uh, looking at data from connected warehouses facilities and buildings all bring on an opportunity to both increase the quality and effectiveness of the missions within the agencies to also looking at re uh, looking at cost efficiency, as well as looking at risk and safety and the types of data, um, you know, that Rick mentioned around, you know, the new types of information, some of those data elements that we typically have seen is looking at failure history. >>So when has that an asset or a machine or a component within a machine failed in the past? Uh, we've also looking at bringing together a maintenance history, looking at a specific machine. Are we getting error codes off of a machine or assets, uh, looking at when we've replaced certain components to looking at, um, how are we actually leveraging the assets? What were the operating conditions, uh, um, pulling off data from a sensor on that asset? Um, also looking at the, um, the features of an asset, whether it's, you know, engine size it's make and model, um, where's the asset located on to also looking at who's operated the asset, uh, you know, whether it be their certifications, what's their experience, um, how are they leveraging the assets and then also bringing in together, um, some of the, the pattern analysis that we've seen. So what are the operating limits? Um, are we getting service reliability? Are we getting a product recall information from the actual manufacturer? So, Rick, I know the data landscape has really changed. Let's, let's go over looking at some of those components. Sure. >>So this slide depicts sort of the, some of the inputs that inform a predictive maintenance program. So, as we've talked a little bit about the silos of information, the ERP system of record, perhaps the spares and the service history. So we want, what we want to do is combine that information with sensor data, whether it's a facility and equipment sensors, um, uh, or temperature and humidity, for example, all this stuff is then combined together, uh, and then use to develop machine learning models that better inform, uh, predictive maintenance, because we'll do need to keep, uh, to take into account the environmental factors that may cause additional wear and tear on the asset that we're monitoring. So here's some examples of private sector, uh, maintenance use cases that also have broad applicability across the government. For example, one of the busiest airports in Europe is running cloud era on Azure to capture secure and correlate sensor data collected from equipment within the airport, the people moving equipment more specifically, the escalators, the elevators, and the baggage carousels. >>The objective here is to prevent breakdowns and improve airport efficiency and passenger safety. Another example is a container shipping port. In this case, we use IOT data and machine learning, help customers recognize how their cargo handling equipment is performing in different weather conditions to understand how usage relates to failure rates and to detect anomalies and transport systems. These all improve for another example is Navistar Navistar, leading manufacturer of commercial trucks, buses, and military vehicles. Typically vehicle maintenance, as Cindy mentioned, is based on miles traveled or based on a schedule or a time since the last service. But these are only two of the thousands of data points that can signal the need for maintenance. And as it turns out, unscheduled maintenance and vehicle breakdowns account for a large share of the total cost for vehicle owner. So to help fleet owners move from a reactive approach to a more predictive model, Navistar built an IOT enabled remote diagnostics platform called on command. >>The platform brings in over 70 sensor data feeds for more than 375,000 connected vehicles. These include engine performance, trucks, speed, acceleration, cooling temperature, and break where this data is then correlated with other Navistar and third-party data sources, including weather geo location, vehicle usage, traffic warranty, and parts inventory information. So the platform then uses machine learning and advanced analytics to automatically detect problems early and predict maintenance requirements. So how does the fleet operator use this information? They can monitor truck health and performance from smartphones or tablets and prioritize needed repairs. Also, they can identify that the nearest service location that has the relevant parts, the train technicians and the available service space. So sort of wrapping up the, the benefits Navistar's helped fleet owners reduce maintenance by more than 30%. The same platform is also used to help school buses run safely. And on time, for example, one school district with 110 buses that travel over a million miles annually reduce the number of PTOs needed year over year, thanks to predictive insights delivered by this platform. >>So I'd like to take a moment and walk through the data. Life cycle is depicted in this diagram. So data ingest from the edge may include feeds from the factory floor or things like connected vehicles, whether they're trucks, aircraft, heavy equipment, cargo vessels, et cetera. Next, the data lands on a secure and governed data platform. Whereas combined with data from existing systems of record to provide additional insights, and this platform supports multiple analytic functions working together on the same data while maintaining strict security governance and control measures once processed the data is used to train machine learning models, which are then deployed into production, monitored, and retrained as needed to maintain accuracy. The process data is also typically placed in a data warehouse and use to support business intelligence, analytics, and dashboards. And in fact, this data lifecycle is representative of one of our government customers doing condition-based maintenance across a variety of aircraft. >>And the benefits they've discovered include less unscheduled maintenance and a reduction in mean man hours to repair increased maintenance efficiencies, improved aircraft availability, and the ability to avoid cascading component failures, which typically cost more in repair cost and downtime. Also, they're able to better forecast the requirements for replacement parts and consumables and last, and certainly very importantly, this leads to enhanced safety. This chart overlays the secure open source Cloudera platform used in support of the data life cycle. We've been discussing Cloudera data flow, the data ingest data movement and real time streaming data query capabilities. So data flow gives us the capability to bring data in from the asset of interest from the internet of things. While the data platform provides a secure governed data lake and visibility across the full machine learning life cycle eliminates silos and streamlines workflows across teams. The platform includes an integrated suite of secure analytic applications. And two that we're specifically calling out here are Cloudera machine learning, which supports the collaborative data science and machine learning environment, which facilitates machine learning and AI and the cloud era data warehouse, which supports the analytics and business intelligence, including those dashboards for leadership Cindy, over to you, Rick, >>Thank you. And I hope that, uh, Rick and I provided you some insights on how predictive maintenance condition-based maintenance is being used and can be used within your respective agency, bringing together, um, data sources that maybe you're having challenges with today. Uh, bringing that, uh, more real-time information in from a streaming perspective, blending that industrial IOT, as well as historical information together to help actually, uh, optimize maintenance and reduce costs within the, uh, each of your agencies, uh, to learn a little bit more about Cloudera, um, and our, what we're doing from a predictive maintenance please, uh, business@cloudera.com solutions slash public sector. And we look forward to scheduling a meeting with you, and on that, we appreciate your time today and thank you very much.

Published Date : Aug 4 2021

SUMMARY :

So as we look at fraud, Um, the types of fraud that we see is specifically around cyber crime, So as we look at those areas, what are the areas that we see additional So I think one of the key elements is, you know, you can look at your, the breadth and the opportunity really comes about when you can integrate and Some of the techniques that we use and the value behind this is, um, how do we actually look at increasing Um, also looking at increasing the amount of, uh, the level of compliance, I'm going to turn it over to chef to talk about, uh, the reference architecture for, before I get into the technical details, uh, I want to talk about how this would be implemented at a much higher level. It could be in the data center or even on edge devices, and this data needs to be collected At the same time, we can be collecting data from an edge device that's streaming in every second, So the data has been enrich. So the next step in the architecture is to leverage a cluttered SQL stream builder, obtain the accuracy of the performance, the scores that we want, Um, and one of the neat things with the IRS the analysis, the information that Sheva and I have provided, um, to give you some insights on the analytical methods that we're typically seeing used, um, the associated, doing it regardless of the actual condition of the asset, um, uh, you know, reduction in downtime, um, as well as overall maintenance costs. And so the challenge is to collect all these assets together and begin the types of data, um, you know, that Rick mentioned around, you know, the new types on to also looking at who's operated the asset, uh, you know, whether it be their certifications, So we want, what we want to do is combine that information with So to help fleet So the platform then uses machine learning and advanced analytics to automatically detect problems So data ingest from the edge may include feeds from the factory floor or things like improved aircraft availability, and the ability to avoid cascading And I hope that, uh, Rick and I provided you some insights on how predictive

ENTITIES

Entity	Category	Confidence
Cindy Mikey	PERSON	0.99+
Rick	PERSON	0.99+
Rick Taylor	PERSON	0.99+
Molly	PERSON	0.99+
Nvidia	ORGANIZATION	0.99+
2017	DATE	0.99+
PWC	ORGANIZATION	0.99+
40%	QUANTITY	0.99+
110 buses	QUANTITY	0.99+
Europe	LOCATION	0.99+
50%	QUANTITY	0.99+
Cindy	PERSON	0.99+
Mike	PERSON	0.99+
Joe	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
Today	DATE	0.99+
today	DATE	0.99+
Navistar	ORGANIZATION	0.99+
First	QUANTITY	0.99+
two	QUANTITY	0.99+
more than 30%	QUANTITY	0.99+
over $51 billion	QUANTITY	0.99+
NIFA	ORGANIZATION	0.99+
over $65 billion	QUANTITY	0.99+
IRS	ORGANIZATION	0.99+
over a million miles	QUANTITY	0.99+
first	QUANTITY	0.98+
one	QUANTITY	0.98+
Jason	PERSON	0.98+
Azure	TITLE	0.98+
Brooke	PERSON	0.98+
Avro	PERSON	0.98+
one school district	QUANTITY	0.98+
SQL	TITLE	0.97+
both	QUANTITY	0.97+
$148 billion	QUANTITY	0.97+
Sheva	PERSON	0.97+
three types	QUANTITY	0.96+
each	QUANTITY	0.95+
McKenzie	ORGANIZATION	0.95+
more than 375,000 connected vehicles	QUANTITY	0.95+
Cloudera	TITLE	0.95+
about 57 billion	QUANTITY	0.95+
salty	PERSON	0.94+
several years ago	DATE	0.94+
single technology	QUANTITY	0.94+
eight times	QUANTITY	0.93+
91 billion	QUANTITY	0.93+
eight X	QUANTITY	0.92+
business@cloudera.com	OTHER	0.92+
McKinsey	ORGANIZATION	0.92+
zero changes	QUANTITY	0.92+
Monte Carlo	TITLE	0.92+
caldera	ORGANIZATION	0.91+
couple	QUANTITY	0.9+
over 70 sensor data feeds	QUANTITY	0.88+
Richmond	LOCATION	0.84+
Navistar Navistar	ORGANIZATION	0.82+
single view	QUANTITY	0.81+
17	OTHER	0.8+
single common format	QUANTITY	0.8+
thousands of data points	QUANTITY	0.79+
Sydney	LOCATION	0.78+

FINANCIAL SERVICES V1b | Cloudera

>>Uh, hi, I'm Joe Rodriguez, managing director of financial services at Cloudera. Uh, welcome to the fight fraud with a data session, uh, at Cloudera, we believe that fighting fraud with, uh, uh, begins with data. Um, so financial services is Cloudera's largest industry vertical. We have approximately 425 global financial services customers, uh, which consists of 82 out of a hundred of the largest global banks of which we have 27 that are globally systemic banks, uh, four out of the five top, uh, stock exchanges, uh, eight out of the top 10 wealth management firms and all four of the top credit card networks. So as you can see most financial services institutions, uh, utilize Cloudera for data analytics and machine learning, uh, we also have over 20 central banks and a dozen or so financial regulators. So it's an incredible footprint which gives Cloudera lots of insight into the many innovations, uh, that our customers are coming up with. Uh, criminals can steal thousands of dollars before a fraudulent transaction is detected. So the cost of, uh, to purchase a, your account data is well worth the price to fraudsters. Uh, according to Experian credit and debit card account information sells on the dark web for a mere $5 with the CVV number and up to $110. If it comes with all the bank information, including your name, social security number, date of birth, uh, complete account numbers and, and other personal data. >>Um, our customers have several key data and analytics challenges when it comes to fighting financial crime. The volume of data that they need to deal with is, is huge and growing exponentially. Uh, all this data needs to be evaluated in real time. Uh, there is, uh, there are new sources of, of streaming data that need to be integrated with existing, uh, legacy data sources. This includes, um, biometrics data and enhanced, uh, authentication, uh, video surveillance call center data. And of course all that needs to be integrated with existing legacy data sources. Um, there is an analytics arms race between the banks and the criminals and the criminal networks never stop innovating. They also we'll have to deal with, uh, disjointed security and governance, security and governance policies are often set per data source, uh, or application requiring redundant work, work across workloads. And, and they have to deal with siloed environments, um, the specialized nature of platforms and people results in disparate data sources and data management processes, uh, this duplicates efforts and, uh, divides the, the business risk and crime teams, limiting collaboration opportunities between CDP enhances financial crime solutions, uh, to be holistic by eliminating data gaps between siloed solutions with, uh, an enterprise data approach, uh, advanced, uh, data analytics and machine learning, uh, by deploying an enterprise wide data platform, you reduce siloed divisions between business risk and crime teams and enable better collaboration through industrialized machine learning. >>Uh, you tighten up the loop between, uh, detection and new fraud patterns. Cloudera provides the data platform on which a best of breed applications can run and leverage integrated machine learning cloud Derrick stands rather than replaces your existing fraud modeling applications. So Oracle SAS Actimize to, to name a few, uh, integrate with an enterprise data hub to scale the data increased speed and flexibility and improve efficacy of your entire fraud system. It also centralizes the fraud workload on data that can be used for other use cases in applications like enhanced KYC and a customer 360 4 example. >>I just, I wanted to highlight a couple of our partners in financial crime prevention, uh, semi dine, and Quintex, uh, uh, so send me nine provides fraud simulation using agent-based modeling, uh, machine learning techniques, uh, to generate synthetic transaction data. This data simulates potential fraud scenarios in a cost-effective, uh, GDPR compliant, virtual environment, significantly improved financial crime detection systems, semi dine identifies future fraud topologies, uh, from millions of, of simulations that can be used to dynamically train, uh, new machine learning algorithms for enhanced fraud identification and context, um, uh, connects the dots within your data, using dynamic entity resolution, and advanced network analytics to create context around your customers. Um, this enables you to see the bigger picture and automatically assesses potential criminal beads behavior. >>Now let's go some of our, uh, customers, uh, and how they're using cloud caldera. Uh, first we'll talk about, uh, United overseas bank, or you will be, um, you'll be, is a leading full service bank in, uh, in Asia. It, uh, with, uh, a network of more than 500 offices in, in 19 countries and territories in Asia, Pacific, Western Europe and north America UA, um, UOB built a modern data platform on Cloudera that gives it the flexibility and speed to develop new AI and machine learning solutions and to create a data-driven enterprise. Um, you'll be set up, uh, set up it's big data analytics center in 2017. Uh, it was Singapore's first centralized big data unit, uh, within a bank to deepen the bank's data analytic capabilities and to use data insights to enhance, uh, the banks, uh, uh, performance essential to this work was implementing a platform that could cost efficiently, bring together data from dozens of separate systems and incorporate a range of unstructured data, including, uh, voice and text, um, using Cloudera CDP and machine learning. >>UOB gained a richer understanding of its customer preferences, uh, to help make their, their banking experience simpler, safer, and more reliable. Working with Cloudera UOB has a big data platform that gives business staff and data scientists faster access to relevant and quality data for, for self-service analytics, machine learning and, uh, emerging artificial intelligence solutions. Um, with new self-service analytics and machine learning driven insights, you'll be, uh, has realized improvements in, in digital banking, asset management, compliance, AML, and more, uh, advanced AML detection capabilities, help analysts detect suspicious transactions either based on hidden relationships of shell companies and, uh, high risk individuals, uh, with, uh, Cloudera and machine learning, uh, technologies. You you'll be, uh, was able to enhance AML detection and reduce the time to identify new links from months 2, 3, 3 weeks. >>Excellent mass let's speak about MasterCard. So MasterCard's principle businesses to process payments between banks and merchants and the credit issuing banks and credit unions of the purchasers who use the MasterCard brand debit and credit cards to make purchases MasterCard chose Cloudera enterprise for fraud detection, and to optimize their DW infrastructure, delivering deepens insights and best practices in big data security and compliance. Uh, next let's speak about, uh, bank Rakka yet, uh, in Indonesia or Bri. Um, it, VRI is one of the largest and oldest banks in Indonesia and engages in the provision of general banking services. Uh, it's headquartered in Jakarta Indonesia. Uh, Bri is well known for its focus on financing initiatives and serves over 75 million customers through it's more than 11,000 offices and rural service outposts. Uh, Bri required better insight to understand customer activity and identify fraudulent transactions. Uh, the bank needed a solid foundation that allowed it to leverage the power of advanced analytics, artificial intelligence, and machine learning to gain better understanding of customers and the market. >>Uh, Bri used, uh, Cloudera enterprise data platform to build an agile and reliable, predictive augmented intelligence solution, uh, to enhance its credit scoring system and to address the rising concern around data security from regulators, uh, and customers, uh, Bri developed a real-time fraud detection service, uh, powered by Cloudera and Kafka. Uh, Bri's data scientists developed a machine learning model for fraud detection by creating a behavioral scoring model based on customer savings, uh, loan transactions, deposits, payroll and other financial, um, uh, real-time time data. Uh, this led to improvements in its fraud detection and credit scoring capabilities, as well as the development of a, of a new digital microfinancing product, uh, with the enablement of real-time fraud detection, VRI was able to reduce the rate of fraud by 40%. Uh, it improved, uh, relationship manager productivity by two and a half fold. Uh, it improved the credit score scoring system to cut down on micro-financing loan processing times from two weeks to two days to now two minutes. So fraud prevention is a good area to start with a data focus. If you haven't already, it offers a quick return on investment, uh, and it's a focused area. That's not too entrenched across the company, uh, to learn more about fraud prevention, uh, go to kroger.com and to schedule, and you should schedule a meeting with Cloudera, uh, to learn even more. Uh, and with that, thank you for listening and thank you for your time. >>Welcome to the customer. Obsession begins with data session. Uh, thank you for, for attending. Um, at Cloudera, we believe that a custom session begins with, uh, with, with data, um, and, uh, you know, financial services is Cloudera is largest industry vertical. We have approximately 425 global financial services customers, uh, which consists of 82 out of a hundred of the largest global banks of which we have 27 that are globally systemic banks, uh, four out of the five top stock exchanges, eight out of the 10 top wealth management firms and all four of the top credit card networks. Uh, so as you can see most financial services institutions utilize Cloudera for data analytics and machine learning. Uh, we also have over 20 central banks and it doesn't or so financial regulators. So it's an incredible footprint, which glimpse Cloudera, lots of insight into the many innovations that our customers are coming up with. >>Customers have grown more independent and demanding. Uh, they want the ability to perform many functions on their own and, uh, be able to do it. Uh, he do them on their mobile devices, uh, in a recent Accenture study, more than 50% of customers, uh, are focused on, uh, improving their customer experience through more personalized offers and advice. The study found that 75% of people are actually willing to share their data for better personalized offers and more efficient and intuitive services to get it better, better understanding of your customers, use all the data available to develop a complete view of your customer and, uh, and better serve them. Uh, this also breaks down, uh, costly silos, uh, shares data in, in accordance with privacy laws and assists with regulatory advice. It's so different organizations are going to be at different points in their data analytics and AI journey. >>Uh, there are several degrees of streaming and batch data, both structured and unstructured. Uh, you need a platform that can handle both, uh, with common, with a common governance layer, um, near real time. And, uh, real-time sources help make data more relevant. So if you look at this graphic, looking at it from left to right, uh, normal streaming and batch data comes from core banking and, uh, and lending operations data in pretty much a structured format as financial institutions start to evolve. Uh, they start to ingest near real-time streaming data that comes not only from customers, but also from, from newsfeeds for example, and they start to capture more behavioral data that they can use to evolve their models, uh, and customer experience. Uh, ultimately they start to ingest more real time streaming data, not only, um, standard, uh, sources like market and transaction data, but also alternative sources such as social media and connected sources, such as wearable devices, uh, giving them more, more data, better data, uh, to extract intelligence and drive personalized actions based on data in real time at the right time, um, and use machine learning and AI, uh, to drive anomaly detection and protect and predict, uh, present potential outcomes. >>So this is another way to look at it. Um, this slide shows the progression of the big data journey as it relates to a customer experience example, um, the dark blue represents, um, visibility or understanding your customer. So we have a data warehouse and are starting to develop some analytics, uh, to know your customer and start to provide a better customer 360 experience. Uh, the medium blue area, uh, is a customer centric or where we learn, uh, the customer's behavior. Uh, at this point we're improving our analytics, uh, gathering more customer centric information to perform, uh, some more exploratory, uh, data sciences. And we can start to do things like cross sell or upsell based on the customer's behavior, which should improve, uh, customer retention. The light blue area is, uh, is proactive customer inter interactions, or where we now have the ability, uh, to predict customers needs and wants and improve our interaction with the customer, uh, using applied machine learning and, and AI, uh, the Cloudera data platform, um, you know, business use cases require enabling, uh, the end-to-end journey, which we referred to as the data life cycle, uh, what the data life cycle, what is the data life cycle that our customers want, uh, to take their data through, to enable the end to end data journey. >>If you ask our customers, they want different types of analytics, uh, for their diverse user bases to help them implement their, their, their use cases while managed by a centralized security and governance later layer. Uh, in other words, um, the data life cycle to them provides multifunction analytics, uh, at each stage, uh, within the data journey, uh, that, uh, integrated and centralized, uh, security, uh, and governance, for example, uh, enterprise data consists of real time and transactional type type data. Examples include, uh, click stream data, web logs, um, machine generated, data chat bots, um, call center interactions, uh, transactions, uh, within legacy applications, market data, et cetera. We need to manage, uh, that data life cycle, uh, to provide real enterprise data insights, uh, for use cases around enhanced them, personalized customer experience, um, customer journey analytics next best action, uh, sentiment and churn analytics market, uh, campaign optimization, uh, mortgage, uh, processing optimization and so on. >>Um, we bring a diverse set of data then, um, and then enrich it with other data about our customers and products, uh, provide reports and dashboards such as customer 360 and use predictions from machine models to provide, uh, business decisions and, and offers of, uh, different products and services to customers and maintain customer satisfaction, um, by using, um, sentiment and churn analytics. These examples show that, um, the whole data life cycle is involved, um, and, uh, is in continuous fashion in order to meet these types of use cases, uh, using a single cohesive platform that can be, uh, that can be served by CDP, uh, the data, the Cloudera data platform. >>Okay. Uh, let's talk about, uh, some of the experiences, uh, from our customers. Uh, first we'll talk about Bunco suntan there. Um, is a major global bank headquartered in Spain, uh, with, uh, major operations and subsidiaries all over Europe and north and, and south America. Uh, one of its subsidiaries, something there UK wanted to revolutionize the customer experience with the use of real time data and, uh, in app analytics, uh, for mobile users, however, like many financial institutions send them there had a, he had a, had a large number of legacy data warehouses spread across many business use, and it's within consistent data and different ways of calculating the same metrics, uh, leading to different results. As a result, the company couldn't get the comprehensive customer insights it needed. And, uh, and business staff often worked on multiple versions of the truth. Sometime there worked with Cloudera to improve a single data platform that could support all its workloads, including self-service analytics, uh, operational analytics and data science processes, processing processing, 10 million transactions daily or 30,000 transactions per second at peak times. >>And, uh, bringing together really, uh, nearly two to two petabytes of data. The platform provides unprecedented, uh, customer insight and business value across the organization, uh, over 80 cents. And there has realized impressive, uh, benefits spanning, uh, new revenues, cost savings and risk reductions, including creating analytics for, for corporate customers with near real-time shopping behavior, um, and, and helping identify 7,000 new corporate, uh, customer prospects, uh, reducing capital expenditures by, uh, 3.2 million annually and decreasing operating expenses by, uh, 650,000, um, enabling marketing to realize, uh, 2.4 million in annual savings on, on cash, on commercial transactions, um, and protecting 3.7 million customers from financial crime impacts through 95, new proactive control alerts, improving risk and capital calculations to reduce the amount of money. It must set aside, uh, as part of a, as part of risk mandates. Uh, for example, in one instance, the risk team was able to release a $5.2 million that it had withheld for non-performing credit card loans by properly identifying healthy accounts miscategorized as high risk next, uh, let's uh, talk about, uh, Rabobank. >>Um, Rabobank is one of the largest banks in the Netherlands, uh, with approximately 8.3 million customers. Uh, it was founded by farmers in the late 19th century and specializes in agricultural financing and sustainability oriented banking, uh, in order to help its customers become more self-sufficient and, uh, improve their financial situations such as debt settlement, uh, rebel bank needed to access, uh, to a varied mix of high quality, accurate, and timely customer data, the talent, uh, to provide this insight, however, was the ability to execute sophisticated and timely data analytics at scale Rabobank was also faced with the challenge of, uh, shortening time to market. Uh, it needed easier access to customer data sets to ensure that they were using and receiving the right financial support at the right time with, with, uh, data quality and speed of processing. Um, highlighted as two vital areas of improvement, Rabobank was looking to incorporate, um, or create new data in an environment that would not only allow the organization to create a centralized repository of high quality data, but also allow them to stream and, uh, conduct data analytics on the fly, uh, to create actionable insights and deliver a strong customer experience bank level Cloudera due to its ability to cope with heavy pressures on data processing and its capability of ingesting large quantities of real time streaming data. >>They were able to quickly create a new data lake that allowed for faster queries of both historical and real time data to analyze customer loan repayment patterns, uh, to up to the minute transaction records, um, Robert bank and, and its customers could now immediately access, uh, the valuable data needed to help them understand, um, the status of their financial situation in this enabled, uh, rebel bank to spot financial disasters before they happened, enabling them to gain deep and timely insights into which customers were at risk of defaulting on loans. Um, having established the foundation of a modern data architecture Rabobank is now able to run sophisticated machine learning algorithms and, uh, financial models, uh, to help customers manage, um, financial, uh, obligations, um, including, uh, long repayments and are able to generate accurate, uh, current real liquidity. I refuse, uh, next, uh, let's uh, speak about, um, uh, OVO. >>Uh, so OVO is the leading digital payment rewards and financial services platform in Indonesia, and is present in 115 million devices across the company across the country. Excuse me. Um, as the volume of, of products within Obos ecosystem increases, the ability to ensure marketing effectiveness is critical to avoid unnecessary waste of time and resources, unlike competitors, uh, banks, w which use traditional mass marketing, uh, to reach customers over, oh, decided to embark on a, on a bold new approach to connect with customers via, uh, ultra personalized marketing, uh, using the Cloudera stack. The team at OVO were able to implement a change point detection algorithm, uh, to discover customer life stage changes. This allowed OVO, uh, to, uh, build a segmentation model of one, uh, the contextual offer engine Bill's recommendation algorithms on top of the product, uh, including collaborative and context-based filters, uh, to detect changes in consumer consumption patterns. >>As a result, OVO has achieved a 15% increase in revenue, thanks to this, to this project, um, significant time savings through automation and eliminating the chance of human error and have reduced engineers workloads by, by 30%. Uh, next let's talk about, uh, bank Bri, uh, bank Bri is one of the largest and oldest, uh, banks in Indonesia, um, engaging in, in general banking services, uh, for its customers. Uh, they are headquartered in, in Jakarta Indonesia, uh, PR is a well-known, uh, for its, uh, focused on micro-financing initiative initiatives and serves over 75 million customers through more than 11,000 offices and rural outposts, um, Bri needed to gain better understanding of their customers and market, uh, to improve the efficiency of its operations, uh, reduce losses from non-performing loans and address the rising concern around data security from regulators and consumers, uh, through enhanced fraud detection. This would require the ability to analyze the vast amounts of, uh, historical financial data and use those insights, uh, to enhance operations and, uh, deliver better service. >>Um, Bri used Cloudera's enterprise data platform to build an agile and reliable, uh, predictive augmented intelligence solution. Uh, Bri was now able to analyze 124 years worth of historical financial data and use those insights to enhance its operations and deliver better services. Um, they were able to, uh, enhance their credit scoring system, um, the solution analyzes customer transaction data, and predicts the probability of a customer defaulting on, on payments. Um, the following month, it also alerts Bri's loan officers, um, to at-risk customers, prompting them to take the necessary action to reduce the likelihood of the net profit lost, uh, this resulted in improved credit, improved credit scoring system, uh, that cut down the approval of micro financing loans, uh, from two weeks to two days to, to two minutes and, uh, enhanced fraud detection. >>All right. Uh, this example shows a tabular representation, uh, the evolution of a customer retention use case, um, the evolution of data and analytics, uh, journey that, uh, that for that use case, uh, from aware, uh, text flirtation, uh, to optimization, to being transformative, uh, with every level, uh, data sources increase. And, uh, for the most part, uh, are, are less, less standard, more dynamic and less structured, but always adding more value, more insights into the customer, uh, allowing us to continuously improve our analytics, increase the velocity of the data we ingest, uh, from, from batch, uh, to, uh, near real time, uh, to real-time streaming, uh, the volume of data we ingest continually increases and we progress, uh, the value of the data on our customers, uh, is continuously improving, allowing us to interact more proactively and more efficiently. And, and with that, um, I would, uh, you know, ask you to consider and assess if you are using all the, uh, the data available to understand, uh, and service your customers, and to learn more about, about this, um, you know, visit cloudera.com and schedule a meeting with Cloudera to learn more. And with that, thank you for your time. And thank you for listening.

Published Date : Aug 4 2021

SUMMARY :

So the cost of, uh, to purchase a, approach, uh, advanced, uh, data analytics and machine learning, uh, integrate with an enterprise data hub to scale the data increased uh, semi dine, and Quintex, uh, uh, so send me nine provides fraud uh, the banks, uh, uh, performance essential to this uh, to help make their, their banking experience simpler, safer, uh, bank Rakka yet, uh, in Indonesia or Bri. the company, uh, to learn more about fraud prevention, uh, go to kroger.com uh, which consists of 82 out of a hundred of the largest global banks of which we have 27 this also breaks down, uh, costly silos, uh, uh, giving them more, more data, better data, uh, to extract to develop some analytics, uh, to know your customer and start to provide We need to manage, uh, and offers of, uh, different products and services to customers and maintain customer satisfaction, the same metrics, uh, leading to different results. as high risk next, uh, let's uh, on the fly, uh, to create actionable insights and deliver a strong customer experience next, uh, let's uh, speak about, um, uh, This allowed OVO, uh, to, uh, build a segmentation model uh, to improve the efficiency of its operations, uh, reduce losses from reduce the likelihood of the net profit lost, uh, to being transformative, uh, with every level, uh, data sources increase.

ENTITIES

Entity	Category	Confidence
Rabobank	ORGANIZATION	0.99+
Indonesia	LOCATION	0.99+
Spain	LOCATION	0.99+
Joe Rodriguez	PERSON	0.99+
Asia	LOCATION	0.99+
OVO	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
2017	DATE	0.99+
Europe	LOCATION	0.99+
two minutes	QUANTITY	0.99+
MasterCard	ORGANIZATION	0.99+
40%	QUANTITY	0.99+
30%	QUANTITY	0.99+
2.4 million	QUANTITY	0.99+
95	QUANTITY	0.99+
UOB	ORGANIZATION	0.99+
7,000	QUANTITY	0.99+
two weeks	QUANTITY	0.99+
15%	QUANTITY	0.99+
two days	QUANTITY	0.99+
75%	QUANTITY	0.99+
650,000	QUANTITY	0.99+
VRI	ORGANIZATION	0.99+
$5.2 million	QUANTITY	0.99+
eight	QUANTITY	0.99+
Netherlands	LOCATION	0.99+
27	QUANTITY	0.99+
Rakka	ORGANIZATION	0.99+
3.2 million	QUANTITY	0.99+
82	QUANTITY	0.99+
one instance	QUANTITY	0.99+
Bri	ORGANIZATION	0.99+
Experian	ORGANIZATION	0.99+
more than 500 offices	QUANTITY	0.99+
Western Europe	LOCATION	0.99+
first	QUANTITY	0.99+
late 19th century	DATE	0.99+
more than 11,000 offices	QUANTITY	0.99+
more than 50%	QUANTITY	0.99+
124 years	QUANTITY	0.99+
south America	LOCATION	0.99+
Pacific	LOCATION	0.99+
millions	QUANTITY	0.99+
over 75 million customers	QUANTITY	0.99+
19 countries	QUANTITY	0.99+
four	QUANTITY	0.99+
10 million transactions	QUANTITY	0.98+
Jakarta Indonesia	LOCATION	0.98+
one	QUANTITY	0.98+
115 million devices	QUANTITY	0.98+
Accenture	ORGANIZATION	0.98+
over 80 cents	QUANTITY	0.98+
both	QUANTITY	0.98+
five top stock exchanges	QUANTITY	0.98+
five	QUANTITY	0.98+
a dozen	QUANTITY	0.98+
two petabytes	QUANTITY	0.97+
approximately 8.3 million customers	QUANTITY	0.97+
Cloudera	TITLE	0.97+
3.7 million customers	QUANTITY	0.97+
$5	QUANTITY	0.97+
over 20 central banks	QUANTITY	0.97+
up to $110	QUANTITY	0.97+

RETAIL | CLOUDERA

>>Thank you and good morning or afternoon, everyone, depending on where you're coming to us from and welcome to today's breakout session, fast data, a retail industry business imperative. My name is Brent Bedell, global managing director of retail, consumer bids here at Cloudera and today's hosts joining today. Joining me today is our feature speaker Brian Hill course managing partner from RSR. We'll be sharing insights and implications from recently completed research across retailers of all sizes in vertical segments. At the end of today's session, I'll share a brief overview on what I personally learned from retailers and how Cloudera continues to support retail data analytic requirements, and specifically around streaming data, ingest analytics, automation for customers around the world. There really is the next step up in terms of what's happening with data analytics today. So let's get started. So I thought it'd be helpful to provide some background first on how Clare to Cloudera is supporting and retail industry leaders specifically how they're leveraging Cloudera for leading practice data analytics use cases primarily across four key business pillars. >>And these will be very familiar to, to those in the industry. Personalize interactions of course, plays heavily into e-commerce and marketing, whether that's developing customer profiles, understanding the OB omni-channel journey, moving into the merchandising line of business focused on localized promotional planning, forecasting demand, forecast accuracy, then into supply chain where inventory visibility is becoming more and more critical today, whether it's around fulfillment or just understanding where your stuff is from a customer perspective. And obviously in and outbound route optimization right now, as retailers are taking control of actual delivery, whether it's to a physical store location or to the consumer. And then finally, uh, which is pretty exciting to me as a former store operator, you know, what's happening with physical brick and mortar right now, especially for traditional retailers. Uh, the whole re-imagining of stores right now is on fire in a lot of focus because, you know, frankly, this is where fulfillment is happening. >>Um, this is where customers, you know, still 80% of revenue is driven through retail, through physical brick and mortar. So right now store operations is getting more focused and I would say it probably is had and decades. Uh, and a lot of has to do for us with IOT data and analytics in the new technologies that really help, uh, drive, uh, benefits for retailers from a brick and mortar standpoint. And then, and then finally, um, you know, to wrap up before handing off to Brian, um, as you'll see, you know, all of these, these lines of businesses are raw, really experiencing the need for speed, uh, you know, fast data. So we're, we're moving beyond just discovery analytics. You don't things that happened five, six years ago with big data, et cetera. And we're really moving into real time capabilities because that's really where the difference makers are. >>That's where the competitive differentiation as across all of these, uh, you know, lines of business and these four key pillars within retail, um, the dependency on fast data is, is evident. Um, and it's something that we all read, you know, you know, in terms of those that are students of the industry, if you will, um, you know, that we're all focused on in terms of bringing value to the individual, uh, lines of business, but more importantly to the overall enterprise. So without further ado, I, I really want to, uh, have Brian speak here as a, as a third party analyst. You know, he, he's close in touch with what's going on, retail talking to all the solution providers, all the key retailers about what's important, what's on their plate. What are they focusing on right now in terms of fast data and how that could potentially make a difference for them going forward? So, Brian, uh, off to you, >>Well, thanks, Brent. I appreciate the introduction. And I was thinking, as you were talking, what is fast data? Well, data is fast. It is fast data it's stuff that comes at you very quickly. When I think about the decision cycles in retail, they were, they were, they were time phased and there was a time when we could only make a decision perhaps once a month and then met once a week and then once a day, and then intraday fast data is data that's coming at you and something approaching real time. And we'll explain why that's important in just a second. But first I want to share with you just a little bit about RSR. We've been in business now for 14 years. And what we do is we studied the business use cases that drive the adoption of technology in retail. We come from the retail industry, I was a retail technologist, my entire working life. >>And so we started this company. So I'm, I have a built in bias, of course, and that is that the difference between the winners in the retail world and in fact, in the entire business world and everybody else is how they value the strategic importance of information, and really that's where the battle is being fought today. We'll talk a little bit about that. So anyway, uh, one other thing about RSR research, our research is free to the entire world. Um, we don't, we don't have a paywall. You have to get behind. All you have to do is sign into our website, uh, identify yourself and all of our research, including these two reports that we're showing on the screen now are available to you. And we'd love to hear your comments. So when we talk about data, there's a lot of business implications to what we're trying to do with fast data and as being driven by the real world. >>Uh, we saw a lot of evidence of that during the COVID pandemic in 2020, when people had to make many decisions very, very quickly, for example, a simple one. Uh, do I redirect my replenishments to store B because store a is impacted by the pandemic, those kinds of things. Uh, these two drawings are actually from a book that came out in 1997. It was a really important book for me personally is by a guy named Steven Hegel. And it was the name of the book was the adaptive enterprise. When you think about your business model, um, and you think about the retail business model, most of those businesses are what you see on the left. First of all, the mission of the business doesn't change much at all. It changes once in a generation or maybe once in a lifetime, um, but it it's established quite early. >>And then from that point on it's, uh, basically a wash rinse and repeat cycle. You do the things that you do over and over and over again, year in and year out season in and season out. And the most important piece of information that you have is the transaction data from the last cycle. So a Brent knows this from his experience as a, as a retailer, the baseline for next year's forecast is last year's performance. And this is transactional in nature. It's typically pulled from your ERP or from your best of breed solution set on the right is where the world is really going. And before we get into the details of this, I'll just use a real example. I'm I'm sure like, like me, you've watched the path of hurricanes as they go up to the Florida coast. And one of the things you might've noticed is that there's several different possible paths. >>These are models, and you'll hear a lot about models. When you talk to people in the AI world, these are models based on lots and lots of information that they're getting from Noah and from the oceanographic people and all those kinds of folks to understand the likely path of the hurricane, based on their analysis, the people who watch these things will choose the most likely paths and they will warn communities to lock down and do whatever they need to do. And then they see as the, as the real hurricane progresses, they will see if it's following that path, or if it's varying, it's going down a different path and based on that, they will adapt to a new model. And that is what I'm talking about here now that not everything is of course is life and death as, as a hurricane. But it's basically the same concept what's happening is you have your internal data that you've had since this, a command and control model that we've mentioned on the left, and you're taking an external data from the world around you, and you're using that to make snap decisions or quick decisions based on what you see, what's observable on the outside, back to my COVID example, um, when people were tracking the path of the pandemic through communities, they learn that customers or consumers would favor certain stores to pick up their, what they needed to get. >>So they would avoid some stores and they would favor other stores. And that would cause smart retailers to redirect the replenishments on very fast cycles to those stores where the consumers are most likely to be. They also did the same thing for employees. Uh, they wanted to know where they could get their employees to service these customers. How far away were they, were they in a community that was impacted or were they relatively safe? These are the decisions that were being made in real time based on the information that they were getting from the marketplace around them. So, first of all, there's a context for these decisions. There's a purpose and the bounds of the adaptive structure, and then there's a coordination of capabilities in real time. And that creates an internal feedback loop, but there's also an external feedback loop. This is more of an ecosystem view. >>And based on those two, those two inputs what's happening internally, what your performance is internally and how your community around you is reacting to what you're providing. You make adjustments as necessary. And this is the essence of the adaptive enterprise. Engineers might call this a sense and respond model. Um, and that's where retail is going. But what's essential to that is information and information, not just about the products that you sell or the stores that you sell it in, or the employees that you have on the sales floor or the number of market baskets you've completed in the day, but something much, much more. Um, if you will, a twin, a digital twin of the physical assets of your business, all of your physical assets, the people, the products, the customers, the buildings, the rolling stock, everything, everything. And if you can create a digital equivalent of a physical thing, you can then analyze it. >>And if you can analyze it, you can make decisions much, much more quickly. So this is what's happening with the predict pivot based on what you see, and then, because it's an intrinsically more complicated model to automate, decision-making where it makes sense to do so. That's pretty complicated. And I talk about new data. And as I said earlier, the old data is all transactional in nature. Mostly about sales. Retail has been a wash in sales data for as long as I can remember throw, they throw most of it away, but they do keep enough to create the forecast the next for the next business cycle. But there's all kinds of new information that they need to be thinking about. And a lot of this is from the outside world. And a lot of this is non-transactional nature. So let's just take a look at some of them, competitive information. >>Those are always interested in what the competitor is up to. What are they promoting? How well are they they doing, where are they? What kind of traffic are they generating sudden and stuff, significant changes in customer behaviors and sentiment COVID is a perfect example of something that would cause this consumers changing their behaviors very quickly. And we have the ability to, to observe this because in a great majority of cases, nowadays retailers have observed that customers start their, uh, shopping journey in the digital space. As a matter of fact, Google recently came out and said, 60%, 63% of all, all sales transactions begin in the digital domain. Even if many of them end up in the store. So we have the ability to observe changes in consumer behavior. What are they looking at? When are they looking at it? How long do they spend looking at it? >>What else are they looking at while they're, while they're doing that? What are the, what is the outcome of that market metrics? Certainly what's going on in the marketplace around you? A good idea. Example of this might be something related to a sporting event. If you've planned based on normal demand and for, for your store. And there's a big sporting event, like a football match or a baseball game, suddenly you're going to see a spike in demand. So understanding what's going on in the market is really important. Location, demographics and psychographics, demographics have always been important to retailers, but now we're talking about dynamic demographics, what customers, or what consumers are, are in your market, in something approaching real time, psychographics has more to do with their attitudes. What kind of folks are, are, are in them in a particular marketplace? What do they think about what do they favor? >>And all those kinds of interesting deep tales, real-time environmental and social incidents. Of course, I mentioned hurricanes. And so that's fairly, self-evident disruptive events, sporting events, et cetera. These are all real. And then we get the real time internet of things. These are, these are RFID sensors, beacons, video, et cetera. There's all kinds of stuff. And this is where, yeah, it's interesting. This is where the supply chain people will start talking about the difference, little twin to their physical world. If you can't say something, you can manage it. And retailers want to be able to manage things in real time. So IOT, along with it, the analytics and the data that's generated is really, really important for them going forward, community health. We've been talking a lot about that, the progression of the flu, et cetera, et cetera, uh, business schedules, commute patterns, school schedules, and whether these are all external data that are interesting to retailers and can help them to make better operational in something approaching real time. >>I mentioned the automation of decision making. This is a chart from Gardner, and I'd love to share with you. It's a really good one because it describes very simply what we're talking about. And it also describes where the inflection of new technology happens. If you look on the left there's data, we have lots and lots of data. We're getting more data all the time, retailers for a long time. Now, since certainly since the seventies or eighties have been using data to describe what happened, this is the retrospective analysis that we're all very familiar with, uh, data cubes and those kinds of things. And based on that, the human makes some decisions about what they're going to do going forward. Um, sometime in the not too distant past, this data was started to be used to make diagnostic decisions, not only what happened, but why did it happen? >>And me might think of this as, for example, if sales were depressed for a certain product, was it because we had another product on sale that day, that's a good example of fairly straightforward diagnostics. We then move forward to what we might think of as predictive analytics. And this was based on what happened in the past and why it happened in the past. This is what's likely to happen in the future. You might think of this as, for example, halo effect or, or the cannibalization effect of your category plans. If you're, if you happen to be a grocer and based on that, the human will make a decision as to what they need to do next then came along AI, and I don't want to oversell AI here. AI is a new way for us to examine lots and lots of data, particularly unstructured data AI. >>If I could simplify it to its maximum extent, it essentially is a data tool that allows you to see patterns in data, which might be interesting. It's very good at sifting through huge data sets of unstructured data and detecting statistically significant patterns. It gets deeper than that, of course, because it uses math instead of rules. So instead of an if then, or else a statement that we might've used with our structured data, we use the math to detect these patterns in unstructured data. And based on those, we can make some models. For example, uh, my guy in my, in my, uh, just turned 70 on my 70 year old man, I'm a white guy. I live in California. I have a certain income and a certain educational level. I'm likely to behave in this way based on a model that's pretty simplistic. But based on that, you can see that. >>And when another person who meets my psychographics, my demographics, my age group, my income level and all the rest, um, you, they might, they might be expected to make a certain action. And so this is where prescriptive really comes into play. Um, AI makes that possible. And then finally, when you start to think about moving closer to the customer on something, approaching a personalized level, a one-to-one level, you, you suddenly find yourself in this situation of having to make not thousands of decisions, but tens of millions of decisions. And that's when the automation of decision-making really gets to be pretty important. So this is all interesting stuff, and I don't want to oversell it. It's exciting. And it's new. It's just the latest turn of the technology screw. And it allows us to use this new data to basically automate decision-making in the business, in something approaching real time so that we can be much, much more responsive to real-time conditions in the marketplace. >>Very exciting. So I hope this is interesting. This is a piece of data from one of our recent pieces of research. Uh, this happens to be from a location analytics study. We just published last week and we asked retailers, what are the big challenges what's been going on in the last 12 months for them? And what's likely to be happening for them in the next few years. And it's just fascinating because it speaks to the need for faster decision-making there. The challenges in the last 12 months were all related to COVID. First of all, fulfilling growing online demand. This is a very, very real time issue that we all had to deal with. But the next one was keeping forecasts in sync with changing demand. And this is one of those areas where retailers are now finding themselves, needing to look at that exoticness for that external data that I mentioned to you last year, sales were not a good predictor of next year of sales. >>They needed to look at sentiment. They needed to look at the path of the disease. They needed to look at the availability of products, alternate sourcing, global political issues. All of these things get to be pretty important and they affect the forecast. And then finally managing a supply them the movement of the supply through the supply chain so that they could identify bottlenecks now, point to one of them, which we can all laugh at now because it's kind of funny. It wasn't funny at the time we ran out of toilet paper, toilet paper was a big problem. Now there is nothing quite as predictable as toilet paper, it's tied directly to the size of the population. And yet we ran out and the thing we didn't expect when the COVID pandemic hit was that people would panic. And when people panic, they do funny things. >>One of the things I do is buy up all the available toilet paper. I'm not quite sure why that happened. Um, but it did happen and it drained the supply chain. So retailers needed to be able to see that they needed to be able to find alternative sources. They needed to be able to do those kinds of things. This gets to the issue of visibility, real time data, fast data tomorrow's challenge. It's kind of interesting because one of the things that they've retailers put at the top of their list is improved inventory productivity. Uh, the reason that they are interested in this is because then we'll never spend as much money, anything as they will on inventory. And they want the inventory to be targeted to those places where it is most likely to be consumed and not to places where it's least likely to be consumed. >>So this is trying to solve the issue of getting the right product at the right place at the right time to the right consumer and retailers want to improve this because the dollars are just so big, but in this complex, fast moving world that we live in today, it's this requires something approaching real-time visibility. They want to be able to monitor the supply chain, the DCS and the warehouses. And they're picking capacity. We're talking about each of us, we're talking about each his level. Decision-making about what's flowing through the supply chain all the way from the, from the manufacturing doctor, the manufacturer through to consumption. There's two sides of the supply chain and retailers want to look at it, you'll hear retailers and, and people like me talk about the digital twin. This is where this really becomes important. And again, the digital twin is, is enabled by IOT and AI analytics. >>And finally they need to re to increase their profitability for online fulfillment. Uh, this is a huge issue, uh, for some grocers, the volume of online orders went from less than 10% to somewhere north of 40%. And retailers did in 2020, what they needed to do to fulfill those customer orders in the, in the year of the pandemic, that now the expectation that consumers have have been raised significantly. They now expect those, those features to be available to them all the time. And many people really liked them. Now retailers need to find out how to do it profitably. And one of the first things they need to do is they need to be able to observe the process so that they can find places to optimize. This is out of our recent research and I encourage you to read it. >>And when we think about the hard one wisdoms that retailers have come up with, we think about these things better visibility has led to better understanding, which increases their reaction time, which increases their profitability. So what are the opportunities? This is the first place that you'll see something that's very common. And in our research, we separate over performers, who we call retail winners from everybody else, average and under-performers, and we've noticed throughout the life of our company, that retail winners, don't just do all the same things that others do. They tend to do other things. And this shows up in this particular graph, this again is from the same study. So what are the opportunities to, to address these challenges? I mentioned to you in the last slide, first of all, strategic placement of inventory throughout the supply chain to better fulfill customer needs. This is all about being able to observe the supply chain, get the inventory into a position where it can be moved quickly to fast changing demand. >>And on the consumer side, a better understanding and reacting to unplanned events that can drive a dramatic change in customer behavior. Again, this is about studying the data, analyzing the data and reacting to the data that comes before the sales transaction. So this is observing the path to purchase observing things that are happening in the marketplace around the retailer, so that they can respond very quickly, a better understanding of the dramatic changes in customer preference and path to purchase. As they engage with us. One of the things we, all we all know about consumers now is that they are in control and the literally the entire planet is the assortment that's available to them. If they don't like the way they're interacting with you, they will drop you like a hot potato and go to somebody else. And what retailers fear justifiably is the default response to that is to just see if they can find it on Amazon. >>You don't want this to happen if you're a retailer. So we want to observe how we are interacting with consumers and how well we are meeting their needs, optimizing omni-channel order fulfillment to improve profitability. We've already mentioned this, uh, retailers did what they needed to do to offer new fulfillment options to consumers. Things like buy online pickup curbside, buy online pickup in store, buy online, pick up at a locker, a direct to consumer all of those things. Retailers offer those in 2020 because the consumers demand it and needed it. So when retailers are trying to do now is to understand how to do that profitably. And finally, this is important. It never goes away. Is the reduction of waste shrink within the supply chain? Um, I'm embarrassed to say that when I was a retail executive in the nineties, uh, we were no more certain of consumer demand than anybody else was, but we, we wanted to commit to very high service levels for some of our key county categories somewhere approaching 95%. >>And we found the best way to do that was to flood the supply chain with inventory. Uh, it sounds irresponsible now, but in those days, that was a sure-fire way to make sure that the customer had what she was looking for when she looked for it. You can't do that in today's world. Money is too tight and we can't have that, uh, inventory sitting around and move to the right places. Once we discovered what the right place is, we have to be able to predict, observe and respond in something much closer to your time. One of the next slide, um, the simple message here, again, a difference between winners and everybody else, the messages, if you can't see it, you can't manage it. And so we asked retailers to identify, to what extent an AI enabled supply chain can help their company address some issues. >>Look at the differences here. They're shocking identifying network bottlenecks. This is the toilet paper story I told you about over half of retail winners, uh, feel that that's very important. Only 19% of average and under performers, no surprise that their average and under-performers visibility into available to sell inventory anywhere within the enterprise, 58% of winners and only 32% of everybody else. And you can go on down the list, but you get the just retail winners, understand that they need to be able to see their assets and something approaching real time so that they can make the best decisions possible going forward in something approaching real time. This is the world that we live in today. And in order to do that, you need to be able to number one, see it. And number two, you need to be able to analyze it. And number three, you have to be able to make decisions based on what you saw, just some closing observations on. >>And I hope this was interesting for you. I love talking about this stuff. You can probably tell I'm very passionate about it, but the rapid pace of change in the world today is really underscoring the importance. For example, of location intelligence, as a key component of helping businesses to achieve sustainable growth, greater operational effectiveness and resilience, and ultimately your success. So this is really, really critical for retailers to understand and successfully evolving businesses need to accommodate these new consumer shopping behaviors and changes in how products are brought to the market. So that, and in order to do that, they need to be able to see people. They need to be able to see their assets, and they need to be able to see their processes in something approaching real time, and then they need to analyze it. And based on what they've uncovered, they need to be able to make strategic and operational decision making very quickly. This is the new world we live in. It's a real-time world. It's a, it's a sense and respond world and it's the way forward. So, Brent, I hope that was interesting for you. I really enjoyed talking about this, as I said, we'd love to hear a little bit more. >>Hey, Brian, that was excellent. You know, I always love me love hearing from RSR because you're so close to what retailers are talking about and the research that your company pulls together. Um, you know, one of the higher level research articles around, uh, fast data frankly, is the whole notion of IOT, right? And he does a lot of work in this space. Um, what I find fascinating based off the recent research is believe it or not, there's $1.2 trillion at stake in retail per year, between now and 2025. Now, how is that possible? Well, part of it is because the Kinsey captures not only traditional retail, but also QSRs and entertainment then use et cetera. That's considered all of retail, but it's a staggering number. And it really plays to the effect that real-time can have on individual enterprises. In this case, we're talking of course, about retail. >>So a staggering number. And if you think about it from streaming video to sensors, to beacons, RFID robotics, autonomous vehicles, retailers are testing today, even pizza delivery, you know, autonomous vehicle. Well, if you think about it, it shouldn't be that shocking. Um, but when they were looking at 12 different industries, retail became like the number three out of 12, and there's a lot of other big industries that will be leveraging IOT in the next four years. So, um, so retailers in the past have been traditionally a little stodgy about their spend in data and analytics. Um, I think retailers in general have got the religion that this is what it's going to take to compete in today's world, especially in a global economy. And in IOT really is the next frontier, which is kind of the definition of fast data. Um, so I, I just wanted to share just a few examples or exemplars of, of retailers that are leveraging Cloudera technology today. >>So now, so now the paid for advertisement at the end of this, right? So, so, you know, so what bringing to market here. So, you know, across all retail, uh, verticals, you know, if we look at, you know, for example, a well-known global mass virtual retailer, you know, they're leveraging Cloudera data flow, which is our solution to move data from point to point in wicked fast space. So it's open source technology that was originally developed by the NSA. So, um, it is best to class movement of data from an ingest standpoint, but we're also able to help the roundtrip. So we'll pull the sensor data off all the refrigeration units for this particular retailer. They'll hit it up against the product lifecycle table. They'll understand, you know, temperature fluctuations of 10, 20 degrees based on, you know, fresh food products that are in the store, what adjustments might need to be made because frankly store operators, they'll never know refrigeration don't know if a cooler goes down and they'll have to react quickly, but they won't know that 10, 20 degree temperature changes have happened overnight. >>So this particular customer leverages father a data flow understand temperature, fluctuations the impact on the product life cycle and the round trip communication back to the individual department manager, let's say a produce department manager, deli manager, meat manager, Hey, you had, you know, a 20 degree drop in temperature. We suggest you lower the price on these products that we know are in that cooler, um, for the next couple of days by 20%. So you don't have to worry, tell me about freshness issues and or potential shrink. So, you know, the grocery with fresh product, if you don't sell it, you smell it, you throw it away. It's lost to the bottom line. So, you know, critically important and, you know, tremendous ROI opportunity that we're helping to enable there, uh, from a, a leading global drugstore retailer. So this is more about data processing and, you know, we're excited to, you know, the recent partnership with the Vidia. >>So fast data, isn't always at the edge of IOT. It's also about workloads. And in retail, if you are processing your customer profiles or segmentation like intra day, you will ever achieve personalization. You will never achieve one-on-one communications with readers killers or with customers. And why is that? Because customers in many cases are touching your brand several times a week. So taking you a week or longer to process your segmentation schemes, you've already lost and you'll never achieve personalization in frack. In fact, you may offend customers by offering. You might push out based on what they just bought yesterday. You had no idea of it. So, you know, that's what we're really excited about. Uh, again, with, with the computation speed, then the video brings to, to Cloudera, we're already doing this today already, you know, been providing levels, exponential speed and processing data. But when the video brings to the party is course GPU's right, which is another exponential improvement, uh, to processing workloads like demand forecast, customer profiles. >>These things need to happen behind the scenes in the back office, much faster than retailers have been doing in the past. Um, that's just the world we all live in today. And then finally, um, you know, proximity marketing standpoint, or just from an in-store operation standpoint, you know, retailers are leveraging Cloudera today, not only data flow, but also of course our compute and storage platform and ML, et cetera, uh, to understand what's happening in store. It's almost like the metrics that we used to look at in the past in terms of conversion and traffic, all those metrics are now moving into the physical world. If you can leverage computer vision in streaming video, to understand how customers are traversing your store, how much time they're standing in front of the display, how much time they're standing in checkout line. Um, you can now start to understand how to better merchandise the store, um, where the hotspots are, how to in real time improve your customer service. >>And from a proximity marketing standpoint, understand how to engage with the customer right at the moment of truth, right? When they're right there, um, in front of a particular department or category, you know, of course leveraging mobile devices. So that's the world of fast data in retail and just kind of a summary in just a few examples of how folks are leveraging Cloudera today. Um, you know, from an overall platform standpoint, of course, father as an enterprise data platform, right? So, you know, we're, we're helping to the entire data life cycle. So we're not a data warehouse. Um, we're much more than that. So we have solutions to ingest data from the edge from IOT leading practice solutions to bring it in. We also have experiences to help, you know, leverage the analytic capabilities of, uh, data engineering, data science, um, analytics and reporting. Uh, we're not, uh, you know, we're not, we're not encroaching upon the legacy solutions that many retailers have today. >>We're providing a platform, this open source that helps weave all of this mess together that existed retail today from legacy systems because no retailer, frankly, is going to rip and replace a lot of stuff that they have today. Right. And the other thing that Cloudera brings to market is this whole notion of on-prem hybrid cloud and multi-cloud right. So our whole, our whole culture has been built around open source technology as the company that provides most of the source code to the Apache network around all these open source technologies. Um, we're kind of religious about open source and lack of vendor lock-in, uh, maybe to our fault. Uh, but as a company, we pull that together from a data platform standpoint. So it's not a rip and replace situation. It's like helping to connect legacy systems, data and analytics, um, you know, weaving that whole story together to be able to solve this whole data life cycle from beginning to end. >>And then finally, you know, just, you know, I want to thank everyone for joining today's session. I hope you found it informative. I can't say Brian killed course enough. Um, you know, he's my trusted friend in terms of what's going on in the industry. He has much broader reach of course, uh, in talking to a lot of our partners in, in, in, in other, uh, technology companies out there as well. But I really appreciate everyone joining the session and Brian, I'm going to kind of leave it open to you to, you know, any closing comments that you might have based on, you know, what we're talking about today in terms of fast data and retail. >>First of all, thank you, Brent. Um, and this is an exciting time to be in this industry. Um, and I'll just leave it with this. The reason that we are talking about these things is because we can, the technology has advanced remarkably in the last five years. Some of this data has been out there for a lot longer than that in it, frankly wasn't even usable. Um, but what we're really talking about is increasing the cycle time for decisions, making them go faster and faster so that we can respond to consumer expectations and delight them in ways that that make us a trusted provider of their life, their lifestyle needs. So this is really a good time to be a retailer, a real great time to be servicing the retail technology community. And I'm glad to be a part of it. And I was glad to be working with you. So thank you, Brian. >>Yeah, of course, Brian, and one of the exciting things for me to not being in the industry, as long as I have and being a former retailer is it's really exciting for me to see retailers actually spending money on data and it for a change, right? They've all kind of come to this final pinnacle of this is what it's going to take to compete. Um, you know, you know, and I talked to, you know, a lot of colleagues, even, even salespeople within Cloudera, I like, oh, retail, very stodgy, you know, slow to move. That's not the case anymore. Um, you know, religion is everyone's, everyone gets the religion of data and analytics and the value of that. And what's exciting for me to see as all this infusion of immense talent within the industry years ago, Brian, I mean, you know, retailers are like, you know, pulling people from some of the, you know, the greatest, uh, tech companies out there, right? From a data science data engineering standpoint, application developers, um, retail is really getting this legs right now in terms of, you know, go to market and in the leverage of data and analytics, which to me is very exciting. Well, >>You're right. I mean, I, I became a CIO around the time that, uh, point of sale and data warehouses were starting to happen data cubes and all those kinds of things. And I never thought I would see a change that dramatic, uh, as the industry experience back in those days, 19 89, 19 90, this changed doors that, but the good news is again, as the technology is capable, uh, it's, it's, we're talking about making technology and information available to, to retail decision-makers that consumers carry around in their pocket purses and pockets is there right now today. Um, so the, the, the question is, are you going to utilize it to win or are you going to get beaten? That's really what it boils down to. Yeah, >>For sure. Uh, Hey, thanks everyone. We'll wrap up. I know we ran a little bit long, but, uh, appreciate, uh, everyone, uh, hanging in there with us. We hope you enjoyed the session. The archive contact information is right there on the screen. Feel free to reach out to either Brian and I. You can go to cloudera.com. Uh, we even have, you know, joint sponsored papers with RSR. You can download there as well as eBooks and other assets that are available if you're interested. So thanks again, everyone for joining and really appreciate you taking the time. >>Hello everyone. And thanks for joining us today. My name is Brent Bedell, managing director retail, consumer goods here at Cloudera. Cloudera is very proud to be partnering with companies like three soft to provide data and analytic capabilities for over 200 retailers across the world and understanding why demand forecasting could be considered the heartbeat of retail. And what's at stake is really no mystery to most, to most retailers. And really just a quick level set before handing this over to my good friend, uh, Camille three soft, um, you know, IDC Gartner. Um, many other analysts have kind of summed up an average, uh, here that I thought would be important to share just to level set the importance of demand forecasting or retail. And what's at stake. I mean the combined business value for retailers leveraging AI and IOT. So this is above and beyond. What demand forecasting has been in the past is a $371 billion opportunity. >>And what's critically important to understand about demand forecasting. Is it directly impacts both the top line and the bottom line of retail. So how does it affect the top line retailers that leverage AI and IOT for demand forecasting are seeing average revenue increases of 2% and think of that as addressing the in stock or out of stock issue in retail and retail is become much more complex now, and that is no longer just brick and mortar, of course, but it's fulfillment centers driven by e-commerce. So inventory is now having to be spread over multiple channels. Being able to leverage AI and IOT is driving 2% average revenue increases. Now, if you think about the size of most retailers or the average retailer that on its face is worth millions of dollars of improvement for any individual retailer on top of that is balancing your inventory, getting the right product in the right place and having productive inventory. >>And that is the bottom line. So the average inventory reduction, leveraging AI and IOT as the analyst have found, and frankly, having spent time in this space myself in the past a 15% average inventory reduction is significant for retailers not being overstocked on product in the wrong place at the wrong time. And it touches everything from replenishment to out-of-stocks labor planning and customer engagement for purposes of today's conversation. We're going to focus on inventory and inventory optimization and reducing out-of-stocks. And of course, even small incremental improvements. I mentioned before in demand forecast accuracy have millions of dollars of direct business impact, especially when it comes to inventory optimization. Okay. So without further ado, I would like to now introduce Dr. Camille Volker to share with you what his team has been up to. And some of the amazing things that are driving at top retailers today. So over to you, Camille, >>Uh, I'm happy to be here and I'm happy to speak to you, uh, about, uh, what we, uh, deliver to our customers. But let me first, uh, introduce three soft. We are a 100 person company based in Europe, in Southern Poland. Uh, and we, uh, with 18 years of experience specialized in providing what we call a data driven business approach, uh, to our customers, our roots are in the solutions in the services. We originally started as a software house. And on top of that, we build our solutions. We've been automation that you get the software for biggest enterprises in Poland, further, we understood the meaning of data and, and data management and how it can be translated into business profits. Adding artificial intelligence on top of that, um, makes our solutions portfolio holistic, which enables us to realize very complex projects, which, uh, leverage all of those three pillars of our business. However, in the recent time, we also understood that services is something which only the best and biggest companies can afford at scale. And we believe that the future of retail, uh, demon forecasting is in the product solutions. So that's why we created occupy our AI platform for data driven retail. That also covers this area that we talked about today. >>I'm personally proud to be responsible for our technology partnerships with other on Microsoft. Uh, it's a great pleasure to work with such great companies and to be able to, uh, delivered a solution store customers together based on the common trust and understanding of the business, uh, which cumulates at customer success at the end. So why, why should you analyze data at retail? Why is it so important? Um, it's kind of obvious that there is a lot of potential in the data per se, but also understanding the different areas where it can be used in retail is very important. We believe that thanks to using data, it's basically easier to the right, uh, the good decisions for the business based on the facts and not intuition anymore. Those four areas that we observe in retail, uh, our online data analysis, that's the fastest growing sector, let's say for those, for those data analytics services, um, which is of course based on the econ and, uh, online channels, uh, availability to the customer. >>Pandemic only speeds up this process of engagement of the customers in that channel, of course, but traditional offline, um, let's say brick and mortar shops. Uh, they still play the biggest role for most of the retailers, especially from the FMCG sector. However, it's also very important to remember that there is plenty of business, uh, related questions that meet that need to be answered from the headquarter perspective. So is it actually, um, good idea to open a store in a certain place? Is it a good idea to optimize a stock with Saturday in producer? Is it a good idea to allocate the goods to online channel in specific way, those kinds of questions they are, they need to be answered in retail every day. And with that massive amount of factors coming into that question, it's really not, not that easy to base, only on the intuition and expert knowledge, of course, uh, as Brent mentioned at the beginning, the supply chain and everything who's relates to that is also super important. We observe our customers to seek for the huge improvements in the revenue, just from that one single area as well. Okay. >>So let me present you a case study of one of our solutions, and that was the lever to a leading global grocery retailer. Uh, the project started with the challenge set of challenges that we had to conquer. And of course the most important was how to limit overstocks and out of stocks. Uh, that's like the holy grail in of course, uh, how to do it without flooding the stores with the goods and in the same time, how to avoid empty shelves, um, from the perspective of the customer, it was obvious that we need to provide a very well, um, a very high quality of sales forecast to be able to ask for, uh, what will be the actual sales of the individual product in each store, uh, every day, um, considering huge role of the perishable goods in the specific grocery retailer, it was a huge challenge, uh, to provide a solution that was able to analyze and provide meaningful information about what's there in the sales data and the other factors we analyzed on daily basis at scale, however, uh, our holistic approach implementing AI with data management, uh, background, and these automation solutions all together created a platform that was able to significantly increase, uh, the sales for our customer just by minimizing out of stocks. >>In the same time we managed to not overflow the stock, the shops with the goods, which actually decreased losses significantly, especially on the fresh fruit. >>Having said that this results of course translate into the increase in revenue, which can be calculated in hundreds of millions of dollars per year. So how the solution actually works well in its principle, it's quite simple. We just collect the data. We do it online. We put that in our data lake, based on the cloud, there are technology, we implement our artificial intelligence models on top of it. And then based on the aggregated information, we create the forecast and we do it every day or every night for every single product in every single store. This information is sent to the warehouses and then the automated replenishment based on the forecast is on the way the huge and most important aspect of that is the use of the good tools to do the right job. Uh, having said that you can be sure that there is too many information in this data, and there is actually two-minute forecast created every night that any expert could ever check. >>This means our solution needs to be, uh, very robust. It needs to provide information with high quality and high porosity. There is plenty of different business process, which is on our forecast, which need to be delivered on time for every product in each individual shop observing the success of this project and having the huge market potential in mind, we decided to create our QB, which can be used by many retailers who don't want to create a dedicated software for that. We'll be solving this kind of problem. Occupy is, uh, our software service offering, which is enabling retailers to go data driven path management. >>We create occupant with retailers, for retailers, uh, implementing artificial intelligence, uh, on top of data science models created by our experts, uh, having data, data analysis in place based on data management tools that we use we've written first, um, attitude. The uncertain times of pandemic clearly shows that it's very important to apply correction factors, which are sometimes required because we need to respond quickly to the changes in the sales characteristics. That's why occupy B is open box solution, which means that you basically can implement that in your organization. We have without changing the process internally, it's all about mapping your process into this into the system, not the other way around the fast trends and products, collection possibilities allow the retailers to react to any changes, which are pure in the sales every day. >>Also, it's worth to mention that really it's not only FMCG. And we believe that different use cases, which we observed in fashion health and beauty, common garden pharmacies and electronics, flavors of retail are also very meaningful. They also have one common thread. That's the growing importance of e-commerce. That's why we didn't want to leave that aside of occupant. And we made everything we can to implement a solution, which covers all of the needs. When you think about the factors that affect sales, there is actually huge variety of data and that we can analyze, of course, the transactional data that every dealer possesses like sales data from sale from, from e-commerce channel also, uh, averaging numbers from weeks, months, and years makes sense, but it's also worth to mention that using the right tool that allows you to collect that data from also internal and external sources makes perfect sense for retail. Uh, it's very hard to imagine a competitive retailer that is not analyzing the competitor's activity, uh, changes in weather or information about some seasonal stores, which can be very important during the summer during the holidays, for example. Uh, but on the other hand, um, having that information in one place makes the actual benefit and environment for the customer. >>Okay. Demon forecasting seems to be like the most important and promising use case. We can talk about when I think about retail, but it's also their whole process of replenishment that can cover with different sets of machine learning models. And they done management tools. We believe that analyzing data from different parts of the retail, uh, replenishment process, uh, can be achieved with implementing a data management solution based on caldera products and with adding some AI on top of it, it makes perfect sense to focus on not only demand forecasting, but also further use cases down the line when it comes to the actual benefits from implementing solutions for demand management, we believe it's really important to analyze them holistically. First is of course, out of stocks, memorization, which can be provided by simply better sales focus, but also reducing overstocks by better inventory management can be achieved in, in the same time. Having said that we believe that analyzing data without any specific new equipment required in point of sales is the low hanging fruit that can be easily achieved in almost every industry in almost every regular customer. >>Hey, thanks, Camille, having worked with retailers in this space for a couple of decades, myself, I was really impressed by a couple of things and they might've been understated, frankly. Um, the results of course, I mean, you, you know, as I kind of set up this session, you doubled the numbers on the statistics that the analysts found. So obviously in customers you're working with, um, you know, you're, you're doubling average numbers that the industry is having and, and most notably how the use of AI or occupy has automated so many manual tasks of the past, like tour tuning, item profiles, adding new items, et cetera. Uh, and also how quickly it felt like, and this is my, my core question. Your team can cover, um, or, or provide the solution to, to not only core center store, for example, in grocery, but you're covering fresh products. >>And frankly, there are, there are solutions out on the market today that only focus on center store non-perishable department. So I was really impressed by the coverage that you're able to provide as well. So can you articulate kind of what it takes to get up and running and your overall process to roll out the solution? I feel like based on what you talked about, um, and how you were approaching this in leveraging AI, um, that you're, you're streamlining processes of legacy demand, forecasting solutions that required more manual intervention, um, how quickly can you get people set up and what is the overall process like to get started with soft? >>Yeah, it's usually it takes three to six months, uh, to onboard a new customer to that kind of solution. And frankly it depends on the data that the customer, uh, has. Uh, usually it's different, uh, for smaller, bigger companies, of course. Uh, but we believe that it's very important to start with a good foundation. The platform needs to be there, the platform that is able to, uh, basically analyze or process different types of data, structured, unstructured, internal, external, and so on. But when you have this platform set, it's all about starting ingesting data there. And usually for a smaller companies, it's easier to start with those, let's say, low hanging fruits. So the internal data, which is there, this data has the highest veracity is already easy to start with, to work with them because everyone in the organization understands this data for the bigger companies. It might be important to ingest also kind of more unstructured data, some kind of external data that need to be acquired. So that may, that may influence the length of the process. But we usually start with the customers. We have, uh, workshops. That's very important to understand their business because not every deal is the same. Of course, we believe that the success of our customers comes also due to the fact that we train those models, those AI models individually to the needs of our >>Totally understand and POS data, every retailer has right in, in one way shape or form. And it is the fundamental, uh, data point, whether it's e-comm or the brick and mortar data, uh, every retailer has that data. So that, that totally makes sense. But what you just described was bunts. Um, there are, there are legacy and other solutions out there that this could be a, a year or longer process to roll out to the number of stores, for example, that you're scaling to. So that's highly impressive. And my guess is a lot of the barriers that have been knocked down with your solution are the fact that you're running this in the cloud, um, you know, on, from a compute standpoint on Cloudera from a public cloud stamp point on Microsoft. So there's, there's no, it intervention, if you will, or hurdles in preparation to get the database set up and in all of the work, I would imagine that part of the time-savings to getting started, would that be an accurate description? >>Yeah, absolutely. Uh, in the same time, this actually lowering the business risks, because we simply take data and put that into the data lake, which is in the cloud. We do not interfere with the existing processes, which are processing this data in the combined. So we just use the same data. We just already in the company, we ask some external data if needed, but it's all aside of the current customers infrastructure. So this is also a huge gain, as you said, right? >>And you're meeting customers where they are. Right. So, as I said, foundationally, every retailer POS data, if they want to add weather data or calendar event data or, you know, want incorporate a course online data with offline data. Um, you have a roadmap and the ability to do that. So it is a building block process. So getting started with, for data, uh, as, as with POS online or offline is the foundational component, which obviously you're very good at. Um, and then having that ability to then incorporate other data sets is critically important because that just improves demand, forecast accuracy, right. By being able to pull in those, those other data sources, if you will. So Camille, I just have one final question for you. Um, you know, there, there are plenty of not plenty, but I mean, there's enough demand forecasting solutions out on the market today for retailers. One of the things that really caught my eye, especially being a former retailer and talking with retailers was the fact that you're, you're promoting an open box solution. And that is a key challenge for a lot of retailers that have, have seen black box solutions come and go. Um, and especially in this space where you really need direct input from the, to continue to fine tune and improve forecast accuracy. Could you give just a little bit more of a description or response to your approach to open box versus black box? >>Yeah, of course. So, you know, we've seen in the past the failures of the projects, um, based on the black box approach, uh, and we believe that this is not the way to go, especially with this kind of, uh, let's say, uh, specialized services that we provide in meaning of understanding the customer's business first and then applying the solution, because what stands behind our concept in occupy is the, basically your process in the organization as a retailer, they have been optimized for years already. That's where retailers put their, uh, focus for many years. We don't want to change that. We are not able to optimize it properly. For sure as it combined, we are able to provide you a tool which can then be used for mapping those very well optimized process and not to change them. That's our idea. And the open box means that in every process that you will map in the solution, you can then in real time monitor the execution of those processes and see what is the result of every step. That way we create truly explainable experience for our customers, then okay, then can easily go for the whole process and see how the forecast, uh, was calculated. And what is the reason for a specific number to be there at the end of the day? >>I think that is, um, invaluable. Um, can be, I really think that is a differentiator and what three soft is bringing to market with that. Thanks. Thanks everyone for joining us today, let's stay in touch. I want to make sure to leave, uh, uh, Camille's information here. Uh, so reach out to him directly or feel free at any, any point in time, obviously to reach out to me, um, again, so glad everyone was able to join today, look forward to talking to you soon.

Published Date : Aug 4 2021

SUMMARY :

At the end of today's session, I'll share a brief overview on what I personally learned from retailers and And then finally, uh, which is pretty exciting to me as a former Um, this is where customers, you know, still 80% of revenue is driven through retail, and it's something that we all read, you know, you know, in terms of those that are students of the industry, And I was thinking, as you were talking, what is fast data? So I'm, I have a built in bias, of course, and that is that most of those businesses are what you see on the left. And one of the things you might've noticed is that there's several different possible paths. on the outside, back to my COVID example, um, retailers to redirect the replenishments on very fast cycles to those stores where the information, not just about the products that you sell or the stores that you sell it in, And a lot of this is from the outside world. And we have the ability to, Example of this might be something related to a sporting event. We've been talking a lot about that, the progression of the flu, et cetera, et cetera, uh, And based on that, the human makes some decisions about what they're going to do going And this was based on what happened in the past and why it And based on those, we can make some models. And then finally, when you start to think about moving closer to the customer that I mentioned to you last year, sales were not a good predictor of next year All of these things get to be pretty important Uh, the reason that they are interested in this is because then we'll the manufacturer through to consumption. And one of the first things they need to do is they need to be able to observe the process so that they can find I mentioned to you in the last slide, first of all, the entire planet is the assortment that's available to them. Um, I'm embarrassed to say that when I was a retail executive in the nineties, One of the next slide, um, And in order to do that, you need to be able to number one, see it. So this is really, really critical for retailers to understand and successfully And it really plays to the effect that real-time can have And in IOT really is the next frontier, which is kind of the definition of fast So now, so now the paid for advertisement at the end of this, right? So you don't have to to Cloudera, we're already doing this today already, you know, been providing Um, that's just the world we all live in today. We also have experiences to help, you know, leverage the analytic capabilities And the other thing that Cloudera everyone joining the session and Brian, I'm going to kind of leave it open to you to, you know, any closing comments Um, and this is an exciting time to be in this industry. Yeah, of course, Brian, and one of the exciting things for me to not being in the industry, as long as I have and being to win or are you going to get beaten? Uh, we even have, you know, joint sponsored papers with RSR. And really just a quick level set before handing this over to my good friend, uh, Camille three soft, So inventory is now having to be spread over multiple channels. And that is the bottom line. in the recent time, we also understood that services is something which only to the right, uh, the good decisions for the business based it's really not, not that easy to base, only on the intuition and expert knowledge, sales forecast to be able to ask for, uh, what will be the actual sales In the same time we managed to not overflow the data lake, based on the cloud, there are technology, we implement our artificial intelligence This means our solution needs to be, uh, very robust. which means that you basically can implement that in your organization. but on the other hand, um, having that information in one place of sales is the low hanging fruit that can be easily numbers that the industry is having and, and most notably how I feel like based on what you talked about, um, And frankly it depends on the data that the customer, And my guess is a lot of the barriers that have been knocked down with your solution We just already in the company, we ask some external data if needed, but it's all Um, and especially in this space where you really need direct And the open box means that in every process that you will free at any, any point in time, obviously to reach out to me, um, again,

ENTITIES

Entity	Category	Confidence
Brian Hill	PERSON	0.99+
Brian	PERSON	0.99+
1997	DATE	0.99+
Steven Hegel	PERSON	0.99+
Brent	PERSON	0.99+
Europe	LOCATION	0.99+
Brent Bedell	PERSON	0.99+
California	LOCATION	0.99+
Camille	PERSON	0.99+
Poland	LOCATION	0.99+
10	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
2%	QUANTITY	0.99+
two-minute	QUANTITY	0.99+
14 years	QUANTITY	0.99+
80%	QUANTITY	0.99+
20 degree	QUANTITY	0.99+
$371 billion	QUANTITY	0.99+
2020	DATE	0.99+
60%	QUANTITY	0.99+
18 years	QUANTITY	0.99+
15%	QUANTITY	0.99+
six months	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
Camille Volker	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
63%	QUANTITY	0.99+
Southern Poland	LOCATION	0.99+
today	DATE	0.99+
NSA	ORGANIZATION	0.99+
100 person	QUANTITY	0.99+
two	QUANTITY	0.99+
last week	DATE	0.99+
20%	QUANTITY	0.99+
Vidia	ORGANIZATION	0.99+
70	QUANTITY	0.99+
2025	DATE	0.99+
once a month	QUANTITY	0.99+
$1.2 trillion	QUANTITY	0.99+
last year	DATE	0.99+
next year	DATE	0.99+
First	QUANTITY	0.99+
less than 10%	QUANTITY	0.99+
yesterday	DATE	0.99+
tens of millions	QUANTITY	0.99+
12 different industries	QUANTITY	0.99+
once a day	QUANTITY	0.99+
two sides	QUANTITY	0.99+
both	QUANTITY	0.99+
One	QUANTITY	0.99+
two drawings	QUANTITY	0.99+
one	QUANTITY	0.99+
once a week	QUANTITY	0.99+
tomorrow	DATE	0.99+
three	QUANTITY	0.99+
12	QUANTITY	0.98+
first	QUANTITY	0.98+
each	QUANTITY	0.98+
four key pillars	QUANTITY	0.98+
each store	QUANTITY	0.98+
COVID pandemic	EVENT	0.98+
twin	QUANTITY	0.98+

Manufacturing - Drive Transportation Efficiency and Sustainability with Big | Cloudera

>> Welcome to our industry drill down. This is for manufacturing. I'm here with Michael Ger, who is the managing director for automotive and manufacturing solutions at Cloudera. And in this first session, we're going to discuss how to drive transportation efficiencies and improve sustainability with data. Connected trucks are fundamental to optimizing fleet performance, costs, and delivering new services to fleet operators. And what's going to happen here is Michael's going to present some data and information, and we're going to come back and have a little conversation about what we just heard. Michael, great to see you! Over to you. >> Oh, thank you, Dave. And I appreciate having this conversation today. Hey, you know, this is actually an area, connected trucks, you know, this is an area that we have seen a lot of action here at Cloudera. And I think the reason is kind of important, right? Because you know, first of all, you can see that, you know, this change is happening very, very quickly, right? 150% growth is forecast by 2022 and the reasons, and I think this is why we're seeing a lot of action and a lot of growth, is that there are a lot of benefits, right? We're talking about a B2B type of situation here. So this is truck made, truck makers providing benefits to fleet operators. And if you look at the, the top fleet operator, the top benefits that fleet operators expect, you see this in, in the, in the graph over here, now almost 80% of them expect improved productivity, things like improved routing, right? So route efficiencies, improved customer service, decrease in fuel consumption, better better technology. This isn't technology for technology's sake, these connected trucks are coming onto the marketplace because, hey, it can provide tremendous value to the business. And in this case, we're talking about fleet operators and fleet efficiencies. So, you know, one of the things that's really important to be able to enable us, right, trucks are becoming connected because at the end of the day, we want to be able to provide fleet efficiencies through connected truck analytics and machine learning. Let me explain to you a little bit about what we mean by that, because what, you know, how this happens is by creating a connected vehicle, analytics, machine-learning life cycle, and to do that, you need to do a few different things, right? You start off, of course, with connected trucks in the field. And, you know, you could have many of these trucks because typically you're dealing at a truck level and at a fleet level, right? You want to be able to do analytics and machine learning to improve performance. So you start off with these trucks. And the first thing you need to be able to do is connect to those trucks, right? You have to have an intelligent edge where you can collect that information from the trucks. And by the way, once you collect the, this information from the trucks, you want to be able to analyze that data in real-time and take real-time actions. Now what I'm going to show you, the ability to take this real-time action, is actually the result of your machine-learning lifecycle. Let me explain to you what I mean by that. So we have these trucks, we start to collect data from it, right? At the end of the day what we'd like to be able to do is pull that data into either your data center or into the cloud, where we can start to do more advanced analytics. And we start with being able to ingest that data into the cloud, into that enterprise data lake. We store that data. We want to enrich it with other data sources. So for example, if you're doing truck predictive maintenance, you want to take that sensor data that you've connected, collected from those trucks. And you want to augment that with your dealership, say, service information. Now you have, you know, you have sensor data and the resulting repair orders. You're now equipped to do things like predict when maintenance will work, all right. You've got all the data sets that you need to be able to do that. So what do you do? Like I said, you're ingested, you're storing, you're enriching it with data, right? You're processing that data. You're aligning, say, the sensor data to that transactional system data from your, from your your repair maintenance systems; you're, you're bringing it together so that you can do two things. You can do, first of all, you could do self-service BI on that data, right? You can do things like fleet analytics, but more importantly, what I was talking to you about before is you now have the data sets to be able to do create machine learning models. So if you have the sensor values and the need, for example, for, for a dealership repair, or is, you could start to correlate which sensor values predicted the need for maintenance, and you could build out those machine learning models. And then as I mentioned to you, you could push those machine learning models back out to the edge, which is how you would then take those real-time actions I mentioned earlier. As that data that then comes through in real-time, you're running it again against that model. And you can take some real-time actions. This is what we, this is this, this, this analytics and machine learning model, machine learning life cycle is exactly what Cloudera enables. This end-to-end ability to ingest data; store, you know, store it, put a query lay over it, create machine learning models, and then run those machine learning models in real time. Now that's what we, that's what we do as a business. Now one such customer, and I just want to give you one example of a customer that we have worked with to provide these types of results is Navistar. And Navistar was kind of an early, early adopter of connected truck analytics, and they provided these capabilities to their fleet operators, right? And they started off by, by, you know, connecting 475,000 trucks to up to well over a million now. And you know, the point here is that they were centralizing data from their telematics service providers, from their trucks' telematics service providers. They're bringing in things like weather data and all those types of things. And what they started to do was to build out machine learning models aimed at predictive maintenance. And what's really interesting is that you see that Navistar made tremendous strides in reducing the need, or the expense associated with maintenance, right? So rather than waiting for a truck to break and then fixing it, they would predict when that truck needs service, condition-based monitoring, and service it before it broke down, so that you can do that in a much more cost-effective manner. And if you see the benefits, right, they reduce maintenance costs 3 cents a mile from the, you know, down from the industry average of 15 cents a mile down to 12 cents cents a mile. So this was a tremendous success for Navistar. And we're seeing this across many of our, you know, truck manufacturers. We're, we're working with many of the truck OEMs, and they are all working to achieve very, very similar types of benefits to their customers. So just a little bit about Navistar. Now, we're going to turn to Q and A. Dave's got some questions for me in a second, but before we do that, if you want to learn more about our, how we work with connected vehicles and autonomous vehicles, please go to our web, to our website. What you see up, up on the screen. There's the URL. It's cloudera.com forward slash solutions, forward slash manufacturing. And you'll see a whole slew of collateral and information in much more detail in terms of how we connect trucks to fleet operators who provide analytics. Use cases that drive dramatically improved performance. So with that being said, I'm going to turn it over to Dave for questions. >> Thank you, Michael. That's a great example you've got. I love the life cycle. You can visualize that very well. You've got an edge use case you do in both real time inference, really, at the edge. And then you're blending that sensor data with other data sources to enrich your models. And you can push that back to the edge. That's that life cycle. So really appreciate that, that info. Let me ask you, what are you seeing as the most common connected vehicle when you think about analytics and machine learning, the use cases that you see customers really leaning into? >> Yeah, that's really, that's a great question, Dave, you know, cause, you know, everybody always thinks about machine learning like this is the first thing you go to. Well, actually it's not, right? For the first thing you really want to be able to go down, many of our customers are doing, is look, let's simply connect our trucks or our vehicles or whatever our IOT asset is, and then you can do very simple things like just performance monitoring of the, of the piece of equipment. In the truck industry, a lot of performance monitoring of the truck, but also performance monitoring of the driver. So how is the, how is the driver performing? Is there a lot of idle time spent? You know, what's, what's route efficiency looking like? You know, by connecting the vehicles, right? You get insights, as I said, into the truck and into the driver and that's not machine learning even, right? But, but that, that monitoring piece is really, really important. So the first thing that we see is monitoring types of use cases. Then you start to see companies move towards more of the, what I call the machine learning and AI models, where you're using inference on the edge. And then you start to see things like predictive maintenance happening, kind of route real-time, route optimization and things like that. And you start to see that evolution again, to those smarter, more intelligent dynamic types of decision-making. But let's not, let's not minimize the value of good old fashioned monitoring, that's to give you that kind of visibility first, then moving to smarter use cases as you, as you go forward. >> You know, it's interesting, I'm I'm envisioning, when you talked about the monitoring, I'm envisioning, you see the bumper sticker, you know, "How am I driving?" The only time somebody ever probably calls is when they get cut off it's and you know, I mean, people might think, "Oh, it's about big brother," but it's not. I mean, that's yeah okay, fine. But it's really about improvement and training and continuous improvement. And then of course the, the route optimization. I mean, that's, that's bottom line business value. So, so that's, I love those, those examples. >> Great! >> I wonder, I mean, what are the big hurdles that people should think about when they want to jump into those use cases that you just talked about, what are they going to run into? You know, the blind spots they're, they're going to, they're going to to get hit with. >> There's a few different things, right? So first of all, a lot of times your IT folks aren't familiar with the kind of the more operational IOT types of data. So just connecting to that type of data can be a new skill set, right? There's very specialized hardware in the car and things like, like that and protocols. That's number one. That's the classic IT OT kind of conundrum that, you know, many of our customers struggle with. But then, more fundamentally, is, you know, if you look at the way these types of connected truck or IOT solutions started, you know, oftentimes they were, the first generation were very custom built, right? So they were brittle, right? They were kind of hardwired. Then as you move towards more commercial solutions, you had what I call the silo problem, right? You had fragmentation in terms of this capability from this vendor, this capability from another vendor. You get the idea. You know, one of the things that we really think that we need that we, that needs to be brought to the table, is, first of all, having an end to end data management platform. It's kind of an integrated, it's all tested together, you have a data lineage across the entire stack. But then also importantly, to be realistic, we have to be able to integrate to industry kind of best practices as well in terms of solution components in the car, the hardware and all those types of things. So I think there's, you know, it's just stepping back for a second, I think that there is, has been fragmentation and complexity in the past. We're moving towards more standards and more standard types of offerings. Our job as a software maker is to make that easier and connect those dots, so customers don't have to do it all on all on their own. >> And you mentioned specialized hardware. One of the things we heard earlier in the main stage was your partnership with Nvidia. We're talking about new types of hardware coming in. You guys are optimizing for that. We see the IT and the OT worlds blending together, no question. And then that end-to-end management piece, you know, this is different from, your right, from IT, normally everything's controlled, you're in the data center. And this is a metadata, you know, rethinking kind of how you manage metadata. So in the spirit of, of what we talked about earlier today, other technology partners, are you working with other partners to sort of accelerate these solutions, move them forward faster? >> Yeah, I'm really glad you're asking that, Dave, because we actually embarked on a product on a project called Project Fusion, which really was about integrating with, you know, when you look at that connected vehicle lifecycle, there are some core vendors out there that are providing some very important capabilities. So what we did is we joined forces with them to build an end-to-end demonstration and reference architecture to enable the complete data management life cycle. Now Cloudera's piece of this was ingesting data and all the things I talked about in storing and the machine learning, right? And so we provide that end to end. But what we wanted to do is we wanted to partner with some key partners. And the partners that we did integrate with were NXP. NXP provides the service-oriented gateways in the car, so that's the hardware in the car. Wind River provides an in-car operating system. That's Linux, right? That's hardened and tested. We then ran ours, our, our Apache MiNiFi, which is part of Cloudera data flow, in the vehicle, right on that operating system, on that hardware. We pumped the data over into the cloud where we did the, all the data analytics and machine learning, and built out these very specialized models. And then we used a company called Airbiquity, once we built those models, to do, you know, they specialize in automotive over-the-air updates, right? So they can then take those models, and update those models back to the vehicle very rapidly. So what we said is, look, there's, there's an established, you know, ecosystem, if you will, of leaders in this space. What we wanted to do is make sure that Cloudera was part and parcel of this ecosystem. And by the way, you mentioned Nvidia as well. We're working close with Nvidia now. So when we're doing the machine learning, we can leverage some of their hardware to get some still further acceleration in the machine learning side of things. So yeah, you know, one of the things I, I, I always say about these types of use cases, it does take a village. And what we've really tried to do is build out that, that, that an ecosystem that provides that village so that we can speed that analytics and machine learning lifecycle just as fast as it can be. >> This is, again, another great example of data intensive workloads. It's not your, it's not your grandfather's ERP that's running on, you know, traditional, you know, systems, it's, these are really purpose built, maybe they're customizable for certain edge-use cases. They're low cost, low, low power. They can't be bloated. And you're right, it does take an ecosystem. You've got to have, you know, APIs that connect and, and that's that, that takes a lot of work and a lot of thought. So that, that leads me to the technologies that are sort of underpinning this. We've talked, we've talked a lot on The Cube about semiconductor technology, and now that's changing, and the advancements we're seeing there. What, what do you see as some of the key technology areas that are advancing this connected vehicle machine learning? >> You know, it's interesting, I'm seeing it in, in a few places, just a few notable ones. I think, first of all, you know, we see that the vehicle itself is getting smarter, right? So when you look at, we look at that NXP type of gateway that we talked about. That used to be kind of a, a dumb gateway that was, really all it was doing was pushing data up and down, and provided isolation as a gateway down to the, down from the lower level subsystems. So it was really security and just basic, you know, basic communication. That gateway now is becoming what they call a service oriented gateway. So it can run. It's not, it's got disc, it's got memory, it's got all this. So now you could run serious compute in the car, right? So now all of these things like running machine-learning inference models, you have a lot more power in the car. At the same time, 5G is making it so that you can push data fast enough, making low latency computing available, even on the cloud. So now, now you've got incredible compute both at the edge in the vehicle and on the cloud, right? And, you know, and then on the, you know, on the cloud, you've got partners like Nvidia, who are accelerating it still further through better GPU-based computing. So, I mean the whole stack, if you look at that, that machine learning life cycle we talked about, you know, Dave, it seems like there's improvements in every, in every step along the way, we're starting to see technology optim, optimization just pervasive throughout the cycle. >> And then, you know, real quick, it's not a quick topic, but you mentioned security. I mean, we've seen a whole new security model emerge. There is no perimeter anymore in this, in a use case like this is there? >> No, there isn't. And one of the things that we're, you know, remember we're the data management plat, platform, and the thing we have to provide is provide end-to-end, you know, end-to-end lineage of where that data came from, who can see it, you know, how it changed, right? And that's something that we have integrated into, from the beginning of when that data is ingested, through when it's stored, through when it's kind of processed and people are doing machine learning; we provide, we will provide that lineage so that, you know, that security and governance is assured throughout the, throughout that data learning life's level. >> And federated across, in this example, across the fleet, so. All right, Michael, that's all the time we have right now. Thank you so much for that great information. Really appreciate it. >> Dave, thank you. And thanks for the audience for listening in today. >> Yes, thank you for watching. Keep it right there.

Published Date : Aug 3 2021

SUMMARY :

And in this first session, And the first thing you the use cases that you see For the first thing you really it's and you know, I that you just talked about, So I think there's, you know, And this is a metadata, you know, And by the way, you You've got to have, you and just basic, you know, And then, you know, real that lineage so that, you know, the time we have right now. And thanks for the audience Yes, thank you for watching.

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Michael	PERSON	0.99+
Nvidia	ORGANIZATION	0.99+
Michael Ger	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
12 cents	QUANTITY	0.99+
NXP	ORGANIZATION	0.99+
first	QUANTITY	0.99+
Airbiquity	ORGANIZATION	0.99+
Navistar	ORGANIZATION	0.99+
150%	QUANTITY	0.99+
475,000 trucks	QUANTITY	0.99+
2022	DATE	0.99+
first session	QUANTITY	0.99+
today	DATE	0.99+
two things	QUANTITY	0.99+
first generation	QUANTITY	0.98+
15 cents a mile	QUANTITY	0.98+
Wind River	ORGANIZATION	0.98+
Linux	TITLE	0.98+
cloudera.com	OTHER	0.98+
One	QUANTITY	0.98+
3 cents a mile	QUANTITY	0.98+
first thing	QUANTITY	0.97+
one example	QUANTITY	0.97+
both	QUANTITY	0.96+
one	QUANTITY	0.96+
almost 80%	QUANTITY	0.94+
Apache	ORGANIZATION	0.94+
cents a mile	QUANTITY	0.82+
over a million	QUANTITY	0.79+
earlier today	DATE	0.79+
Project	TITLE	0.75+
one such customer	QUANTITY	0.72+
Cube	ORGANIZATION	0.7+
a second	QUANTITY	0.69+
5G	ORGANIZATION	0.64+
well	QUANTITY	0.61+
MiNiFi	COMMERCIAL_ITEM	0.59+
second	QUANTITY	0.56+
up	QUANTITY	0.54+
Cloudera	TITLE	0.53+

Supercharge Your Business with Speed Rob Bearden - Joe Ansaldi | Cloudera 2021

>> Okay. We want to pick up on a couple of themes that Mick discussed, you know, supercharging your business with AI, for example, and this notion of getting hybrid right. So right now we're going to turn the program over to Rob Bearden, the CEO of Cloudera and Manuvir Das who's the head of enterprise computing at NVIDIA. And before I hand it off to Rob, I just want to say for those of you who follow me at the Cube, we've extensively covered the transformation of the semiconductor industry. We are entering an entirely new era of computing in the enterprise and it's being driven by the emergence of data intensive applications and workloads. No longer will conventional methods of processing data suffice to handle this work. Rather, we need new thinking around architectures and ecosystems. And one of the keys to success in this new era is collaboration between software companies like Cloudera and semiconductor designers like NVIDIA. So let's learn more about this collaboration and what it means to your data business. Rob, take it away. >> Thanks Mick and Dave. That was a great conversation on how speed and agility is everything in a hyper competitive hybrid world. You touched on AI as essential to a data first strategy in accelerating the path to value and hybrid environments. And I want to drill down on this aspect. Today, every business is facing accelerating change. Everything from face-to-face meetings to buying groceries has gone digital. As a result, businesses are generating more data than ever. There are more digital transactions to track and monitor now. Every engagement with coworkers, customers and partners is virtual. From website metrics to customer service records and even onsite sensors. Enterprises are accumulating tremendous amounts of data and unlocking insights from it is key to our enterprises success. And with data flooding every enterprise, what should the businesses do? At Cloudera, we believe this onslaught of data offers an opportunity to make better business decisions faster and we want to make that easier for everyone, whether it's fraud detection, demand forecasting, preventative maintenance, or customer churn. Whether the goal is to save money or produce income, every day that companies don't gain deep insight from their data is money they've lost. And the reason we're talking about speed and why speed is everything in a hybrid world and in a hyper competitive climate, is that the faster we get insights from all of our data, the faster we grow and the more competitive we are. So those faster insights are also combined with the scalability and cost benefit that cloud provides. And with security and edge to AI data intimacy, that's why the partnership between Cloudera and NVIDIA together means so much. And it starts with a shared vision, making data-driven decision-making a reality for every business. And our customers will now be able to leverage virtually unlimited quantities and varieties of data to power an order of magnitude faster decision-making. And together we turbo charged the enterprise data cloud to enable our customers to work faster and better, and to make integration of AI approaches a reality for companies of all sizes in the cloud. We're joined today by NVIDIA's Manduvir Das, and to talk more about how our technologies will deliver the speed companies need for innovation in our hyper competitive environment. Okay, Manuvir, thank you for joining us. Over to you now. >> Thank you Rob, for having me. It's a pleasure to be here on behalf of NVIDIA. We're so excited about this partnership with Cloudera. You know, when, when NVIDIA started many years ago, we started as a chip company focused on graphics. But as you know, over the last decade, we've really become a full stack, accelerated computing company where we've been using the power of GPU hardware and software to accelerate a variety of workloads, AI being a prime example. And when we think about Cloudera, and your company, your great company, there's three things we see Rob. The first one is that for the companies that were already transforming themselves by the use of data, Cloudera has been a trusted partner for them. The second thing we've seen is that when it comes to using your data, you want to use it in a variety of ways with a powerful platform, which of course you have built over time. And finally, as we've heard already, you believe in the power of hybrid, that data exists in different places and the compute needs to follow the data. Now, if you think about NVIDIA's mission going forward to democratize accelerated computing for all companies, our mission actually aligns very well with exactly those three things. Firstly, you know, we've really worked with a variety of companies to date who have been the early adopters using the power acceleration by changing their technology and their stacks. But more and more we see the opportunity of meeting customers where they are with tools that they're familiar with, with partners that they trust. And of course, Cloudera being a great example of that. The second part of NVIDIA's mission is we focused a lot in the beginning on deep learning where the power of GPU is really shown through. But as we've gone forward, we found that GPU's can accelerate a variety of different workloads from machine learning to inference. And so again, the power of your platform is very appealing. And finally, we know that AI is all about data, more and more data. We believe very strongly in the idea that customers put their data, where they need to put it. And the compute, the AI compute, the machine learning compute, needs to meet the customer where their data is. And so that matches really well with your philosophy, right? And, and Rob, that's why we were so excited to do this partnership with you. It's come to fruition. We have a great combined stack now for the customer and we already see people using it. I think the IRS is a fantastic example where, literally, they took the workflow they had, they took the servers they had, they added GPUs into those servers. They did not change anything. And they got an eight times performance improvement for their fraud detection workflows, right? And that's the kind of success we're looking forward to with all customers. So the team has actually put together a great video to show us what the IRS is doing with this technology. Let's take a look. >> How you doing? My name's Joe Ansaldi. I'm the branch chief of the technical branch in RAS. It's actually the research division, research and statistical division of the IRS. Basically, the mission that RAS has is we do statistical and research on all things related to taxes, compliance issues, fraud issues, you know, anything that you can think of basically, we do research on that. We're running into issues now that we have a lot of ideas to actually do data mining on our big troves of data, but we don't necessarily have the infrastructure or horsepower to do it. So our biggest challenge is definitely the, the infrastructure to support all the ideas that the subject matter experts are coming up with in terms of all the algorithms they would like to create. And the diving deeper within the algorithm space, the actual training of those algorithms, the number of parameters each of those algorithms have. So that's, that's really been our challenge now. The expectation was that with NVIDIA and Cloudera's help and with the cluster, we actually build out to test this on the actual fraud detection algorithm. Our expectation was we were definitely going to see some speed up in computational processing times. And just to give you context, the size of the data set that we were, the SME was actually working her algorithm against was around four terabytes. If I recall correctly, we had a 22 to 48 times speed up after we started tweaking the original algorithm. My expectations, quite honestly, in that sphere, in terms of the timeframe to get results, was it that you guys actually exceeded them. It was really, really quick. The definite now term, short term, what's next is going to be the subject matter expert is actually going to take our algorithm run with that. So that's definitely the now term thing we want to do. Going down, go looking forward, maybe out a couple of months, we're also looking at procuring some A-100 cards to actually test those out. As you guys can guess, our datasets are just getting bigger and bigger and bigger, and it demands to actually do something when we get more value added out of those data sets is just putting more and more demands on our infrastructure. So, you know, with the pilot, now we have an idea with the infrastructure, the infrastructure we need going forward and then also just our in terms of thinking of the algorithms and how we can approach these problems to actually code out solutions to them. Now we're kind of like the shackles are off and we can just run a, you know, run to our heart's desire, wherever our imaginations takes our SMEs to actually develop solutions. Now have the platforms to run them on. Just kind of to close out, we really would be remiss, I've worked with a lot of companies through the year and most of them been spectacular. And you guys are definitely in that category, the whole partnership, as I said, a little bit early, it was really, really well, very responsive. I would be remiss if I didn't thank you guys. So thank you for the opportunity. Doing fantastic. and I'd have to also, I want to thank my guys. my staff, Raul, David worked on this, Richie worked on this, Lex and Tony just, they did a fantastic job and I want to publicly thank them for all the work they did with you guys and Chev, obviously also is fantastic. So thank you everyone. >> Okay. That's a real great example of speed and action. Now let's get into some follow up questions guys, if I may, Rob, can you talk about the specific nature of the relationship between Cloudera and NVIDIA? Is it primarily go to market or are you doing engineering work? What's the story there? >> It's really both. It's both go to market and engineering The engineering focus is to optimize and take advantage of NVIDIA's platform to drive better price performance, lower cost, faster speeds, and better support for today's emerging data intensive applications. So it's really both. >> Great. Thank you. Manuvir, maybe you could talk a little bit more about why can't we just use existing general purpose platforms that are, that are running all this ERP and CRM and HCM and you know, all the, all the Microsoft apps that are out there. What, what do NVIDIA and Cloudera bring to the table that goes beyond the conventional systems that we've known for many years? >> Yeah. I think Dave, as we've talked about the asset that the customer has is really the data, right? And the same data can be utilized in many different ways. Some machine learning, some AI, some traditional data analytics. So, the first step here was really to take a general platform for data processing, Cloudera data platform, and integrate with that. Now NVIDIA has a software stack called rapids, which has all of the primitives that make different kinds of data processing go fast on GPU's. And so the integration here has really been taking rapids and integrating it into a Cloudera data platform so that regardless of the technique the customer is using to get insight from the data, the acceleration will apply in all cases. And that's why it was important to start with a platform like Cloudera rather than a specific application. >> So, I think this is really important because if you think about, you know, the software defined data center brought in, you know, some great efficiencies, but at the same time, a lot of the compute power is now going towards doing things like networking and storage and security offloads. So the good news, the reason this is important is because when you think about these data intensive workloads, we can now put more processing power to work for those, you know, AI intensive things. And so that's what I want to talk about a little bit, maybe a question for both of you, maybe Rob, you could start. You think about AI that's done today in the enterprise. A lot of it is modeling in the cloud, but when we look at a lot of the exciting use cases, bringing real-time systems together, transaction systems and analytics systems, and real-time AI inference, at least even at the edge, huge potential for business value. In a consumer, you're seeing a lot of applications with AI biometrics and voice recognition and autonomous vehicles and the liking. So you're putting AI into these data intensive apps within the enterprise. The potential there is enormous. So what can we learn from sort of where we've come from, maybe these consumer examples and Rob, how are you thinking about enterprise AI in the coming years? >> Yeah, you're right. The opportunity is huge here, but you know, 90% of the cost of AI applications is the inference. And it's been a blocker in terms of adoption because it's just been too expensive and difficult from a performance standpoint. And new platforms like these being developed by Cloudera and NVIDIA will dramatically lower the cost of enabling this type of workload to be done. And what we're going to see the most improvements will be in the speed and accuracy for existing enterprise AI apps like fraud detection, recommendation engine, supply chain management, drug province. And increasingly the consumer led technologies will be bleeding into the enterprise in the form of autonomous factory operations. An example of that would be robots. That AR, VR and manufacturing so driving better quality. The power grid management, automated retail, IOT, you know, the intelligent call centers, all of these will be powered by AI, but really the list of potential use cases now are going to be virtually endless. >> I mean, Manufir, this is like your wheelhouse. Maybe you could add something to that. >> Yeah. I mean, I agree with Rob. I mean he listed some really good use cases, you know, The way we see this at NVIDIA, this journey is in three phases or three steps, right? The first phase was for the early adopters. You know, the builders who assembled use cases, particular use cases like a chat bot from the ground up with the hardware and the software. Almost like going to your local hardware store and buying piece parts and constructing a table yourself right now. Now, I think we are in the first phase of the democratization. For example, the work we do with Cloudera, which is for a broader base of customers, still building for a particular use case, but starting from a much higher baseline. So think about, for example, going to Ikea now and buying a table in a box, right. And you still come home and assemble it, but all the parts are there, the instructions are there, there's a recipe you just follow and it's easy to do, right? So that's sort of the phase we're in now. And then going forward, the opportunity we really look forward to for the democratization, you talked about applications like CRM, et cetera. I think the next wave of democratization is when customers just adopt and deploy the next version of an application they already have. And what's happening is that under the covers, the application is infused by AI and it's become more intelligent because of AI and the customer just thinks they went to the store and bought a table and it showed up and somebody placed it in the right spot. Right? And they didn't really have to learn how to do AI. So these are the phases. And I think we're very excited to be going there. >> You know, Rob, the great thing about, for your customers is they don't have to build out the AI. They can, they can buy it. And just in thinking about this, it seems like there are a lot of really great and even sometimes narrow use cases. So I want to ask you, you know, staying with AI for a minute, one of the frustrations, and Mick I talked about this, the GIGO problem that we've all, you know, studied in college, you know, garbage in, garbage out. But, but the frustrations that users have had is really getting fast access to quality data that they can use to drive business results. So do you see, and how do you see AI maybe changing the game in that regard, Rob, over the next several years? >> So yeah, the combination of massive amounts of data that had been gathered across the enterprise in the past 10 years with an open APIs are dramatically lowering the processing costs that perform at much greater speed and efficiency. And that's allowing us as an industry to democratize the data access while at the same time delivering the federated governance and security models. And hybrid technologies are playing a key role in making this a reality and enabling data access to be quote, hybridized, meaning access and treated in a substantially similar way, irrespective of the physical location of where that data actually resides. >> And that's great. That is really the value layer that you guys are building out on top of all this great infrastructure that the hyperscalers have have given us. You know, a hundred billion dollars a year that you can build value on top of, for your customers. Last question, and maybe Rob, you could, you could go first and then Manuvir, you could bring us home. Where do you guys want to see the relationship go between Cloudera and NVIDIA? In other words, how should we as outside observers be, be thinking about and measuring your project, specifically in the industry's progress generally? >> Yes. I think we're very aligned on this and for Cloudera, it's all about helping companies move forward, leverage every bit of their data and all the places that it may be hosted and partnering with our customers, working closely with our technology ecosystem of partners, means innovation in every industry and that's inspiring for us. And that's what keeps us moving forward. >> Yeah and I agree with Rob and for us at NVIDIA, you know, we, this partnership started with data analytics. As you know, Spark is a very powerful technology for data analytics. People who use Spark rely on Cloudera for that. And the first thing we did together was to really accelerate Spark in a seamless manner. But we're accelerating machine learning. We're accelerating artificial intelligence together. And I think for NVIDIA it's about democratization. We've seen what machine learning and AI have done for the early adopters and help them make their businesses, their products, their customer experience better. And we'd like every company to have the same opportunity.

Published Date : Aug 2 2021

SUMMARY :

And one of the keys to is that the faster we get and the compute needs to follow the data. Now have the platforms to run them on. of the relationship between The engineering focus is to optimize and you know, all the, And so the integration here a lot of the compute power And increasingly the Maybe you could add something to that. from the ground up with the the GIGO problem that we've all, you know, irrespective of the physical location that the hyperscalers have have given us. and all the places that it may be hosted And the first thing we did

ENTITIES

Entity	Category	Confidence
NVIDIA	ORGANIZATION	0.99+
Mick	PERSON	0.99+
Rob Bearden	PERSON	0.99+
David	PERSON	0.99+
Dave	PERSON	0.99+
Rob	PERSON	0.99+
22	QUANTITY	0.99+
Raul	PERSON	0.99+
Joe Ansaldi	PERSON	0.99+
90%	QUANTITY	0.99+
Richie	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
RAS	ORGANIZATION	0.99+
Lex	PERSON	0.99+
second	QUANTITY	0.99+
Ikea	ORGANIZATION	0.99+
Tony	PERSON	0.99+
first phase	QUANTITY	0.99+
IRS	ORGANIZATION	0.99+
both	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
first step	QUANTITY	0.99+
eight times	QUANTITY	0.99+
48 times	QUANTITY	0.99+
second thing	QUANTITY	0.99+
Chev	PERSON	0.99+
Firstly	QUANTITY	0.98+
three steps	QUANTITY	0.98+
Today	DATE	0.98+
one	QUANTITY	0.98+
three things	QUANTITY	0.97+
today	DATE	0.97+
first	QUANTITY	0.96+
three phases	QUANTITY	0.95+
Manuvir	ORGANIZATION	0.95+
first one	QUANTITY	0.95+
Manuvir	PERSON	0.95+
Cloudera	TITLE	0.93+
around four terabytes	QUANTITY	0.93+
first strategy	QUANTITY	0.92+
each	QUANTITY	0.91+
last decade	DATE	0.89+
years ago	DATE	0.89+
Spark	TITLE	0.89+
SME	ORGANIZATION	0.88+
Manuvir Das	PERSON	0.88+

Speed Ideas into Insight Mick Hollison | Cloudera 2021

(upbeat music) >> Welcome to transforming ideas into insights, presented with theCUBE and made possible by Cloudera. My name is Dave Vellante from theCUBE and I'll be your host for today. In the next hundred minutes, you're going to hear how to turn your best ideas into action using data and we're going to share the real-world examples of 12 industry use cases that apply modern data techniques to improve customer experience, reduce fraud, drive manufacturing efficiencies, better forecast retail demand, transform analytics, improve public sector service and so much more. How we use data is rapidly evolving. That is the language that we use to describe data. I mean, for example, we don't really use the term big data as often as we used to, rather we use terms like, digital transformation and digital business. But you think about it. What is a digital business? How is that different from just a business? Well, a digital business is a data business and it differentiates itself by the way it uses data to compete. So whether we call it data, big data or digital, our belief is we're entering the next decade of a world that puts data at the core of our organizations. And as such, the way we use insights is also rapidly evolving. You know, of course we get value from enabling humans to act with confidence on let's call it near perfect information or capitalize on non-intuitive findings, but increasingly insights are leading to the development of data products and services that can be monetized. Or as you'll hear in our industry examples, data is enabling machines to take cognitive actions on our behalf. Examples are everywhere in the forms of apps and products and services, all built on data. Think about a real time fraud detection, know your customer and finance, personal health apps that monitor our heart rates. Self-service investing, filing insurance claims on our smart phones and so many examples. IOT systems that communicate and act machine to machine. Real-time pricing actions, these are all examples of products and services that drive revenue, cut costs or create other value and they all rely on data. Now, while many business leaders sometimes express frustration that their investments in data, people and process and technologies haven't delivered the full results they desire. The truth is that the investments that they've made over the past several years should be thought of as a step on the data journey. Key learnings and expertise from these efforts are now part of the organizational DNA that can catapult us into this next era of data transformation and leadership. One thing is certain, the next 10 years of data and digital transformation won't be like the last 10. So let's into it. Please join us in the chat. You can ask questions. You can share your comments. Hit us up on Twitter. Right now, it's my pleasure to welcome Mick Holliston and he's the president of Cloudera. Mick, great to see you. >> Great to see you as well, Dave. >> Hey, so I call it the new abnormal, right? The world is kind of out of whack. Offices are reopening again. We're seeing travel coming back. There's all this pent up demand for cars and vacations, line cooks at restaurants. Everything that we consumers have missed, but here's the one thing, it seems like the algorithms are off. Whether it's retail's fulfillment capabilities, airline scheduling, their pricing algorithms, commodity prices, we don't know. Is inflation transitory? Is it a long-term threat, trying to forecast GDP? It seems like we have to reset all of our assumptions and Mick, I feel a quality data is going to be a key here. How do you see the current state of the industry in the role data plays to get us into a more predictable and stable future? >> Well, I can sure tell you this, Dave, out of whack is definitely right. I don't know if you know or not, but I happened to be coming to you live today from Atlanta and as a native of Atlanta, I can tell you there's a lot to be known about the airport here. It's often said that whether you're going to heaven or hell, you got to change planes in Atlanta and after 40 minutes waiting on an algorithm to be right for baggage claim last night, I finally managed to get some bag and to be able to show up, dressed appropriately for you today. Here's one thing that I know for sure though, Dave. Clean, consistent and safe data will be essential to getting the world and businesses as we know it back on track again. Without well-managed data, we're certain to get very inconsistent outcomes. Quality data will be the normalizing factor because one thing really hasn't changed about computing since the dawn of time, back when I was taking computer classes at Georgia Tech here in Atlanta and that's what we used to refer to as garbage in, garbage out. In other words, you'll never get quality data-driven insights from a poor dataset. This is especially important today for machine learning and AI. You can build the most amazing models and algorithms, but none of it will matter if the underlying data isn't rock solid. As AI is increasingly used in every business app, you must build a solid data foundation. >> Mick, let's talk about hybrid. Every CXO that I talked to, they're trying to get hybrid right. Whether it's hybrid work, hybrid events, which is our business, hybrid cloud. How are you thinking about the hybrid everything, what's your point of view? >> With all those prescriptions of hybrid and everything, there was one item you might not have quite hit on, Dave and that's hybrid data. >> Oh yeah, you're right, Mick, I did miss that. What do you mean by hybrid data? >> Well, Dave, in Cloudera, we think hybrid data is all about the juxtaposition of two things, freedom and security. Now, every business wants to be more agile. They want the freedom to work with their data, wherever it happens to work best for them, whether that's on premises, in a private cloud, in public cloud or perhaps even in a new open data exchange. Now, this matters to businesses because not all data applications are created equal. Some apps are best suited to be run in the cloud because of their transitory nature. Others may be more economical if they're running a private cloud. But either way, security, regulatory compliance and increasingly data sovereignty are playing a bigger and more important role in every industry. If you don't believe me, just watch or read a recent news story. Data breaches are at an all time high and the ethics of AI applications are being called into question everyday. And understanding lineage of machine learning algorithms is now paramount for every business. So how in the heck do you get both the freedom and security that you're looking for? Well, the answer is actually pretty straightforward. The key is developing a hybrid data strategy. And what do you know, Dave, that's the business Cloudera is in. On a serious note, from Cloudera's perspective, adopting a hybrid data strategy is central to every business' digital transformation. It will enable rapid adoption of new technologies and optimize economic models, while ensuring the security and privacy of every bit of data. >> Okay, Mick, I'm glad you brought in that notion of hybrid data because when you think about things, especially remote work, it really changes a lot of the assumptions. You talked about security, the data flows are going to change. You got the economics, the physics, the local laws come into play, so what about the rest of hybrid? >> Yeah, that's a great question, Dave and certainly, Cloudera itself as a business and all of our customers are feeling this in a big way. We now have the overwhelming majority of our workforce working from home. And in other words, we've got a much larger surface area from a security perspective to keep in mind, the rate and pace of data, just generating a report that might've happened very quickly and rapidly on the office ethernet may not be happening quite so fast in somebody's rural home in the middle of Nebraska somewhere. So it doesn't really matter whether you're talking about the speed of business or securing data, any way you look at it, hybrid I think is going to play a more important role in how work is conducted and what percentage of people are working in the office and are not, I know our plans, Dave, involve us kind of slowly coming back to work, beginning this fall. And we're looking forward to being able to shake hands and see one another again for the first time, in many cases, for more than a year and a half. But yes, hybrid work and hybrid data are playing an increasingly important role for every kind of business. >> Thanks for that. I wonder if we could talk about industry transformation for a moment because it's a major theme of course, of this event. So, here's how I think about it. I mean, some industries have transformed. You think about retail, for example, it's pretty clear. Although, every physical retail brand I know has not only beefed up its online presence, but they also have an Amazon war room strategy because they're trying to take greater advantage of that physical presence. And reverse, we see Amazon building out physical assets, so there's more hybrid going on. But when you look at healthcare, for example, it's just starting with such highly regulated industry. It seems that there's some hurdles there. Financial services is always been data savvy, but you're seeing the emergence of FinTech and some other challenges there in terms of control of payment systems. In manufacturing, the pandemic highlighted, America's reliance on China as a manufacturing partner and supply chain. And so my point is, it seems at different industries, they're in different stages of transformation, but two things look really clear. One, you got to put data at the core of the business model, that's compulsory. It seems like embedding AI into the applications, the data, the business process, that's going to become increasingly important. So how do you see that? >> Wow, there's a lot packed into that question there, Dave. But yeah, at Cloudera, I happened to be leading our own digital transformation as a technology company and what I would tell you there that's been an interesting process. The shift from being largely a subscription-based model to a consumption-based model requires a completely different level of instrumentation in our products and data collection that takes place in real-time, both for billing for our customers and to be able to check on the health and wellness, if you will, of their Cloudera implementations. But it's clearly not just impacting the technology industry. You mentioned healthcare and we've been helping a number of different organizations in the life sciences realm, either speed the rate and pace of getting vaccines to market or we've been assisting with testing process that's taken place. Because you can imagine the quantity of data that's been generated as we've tried to study the efficacy of these vaccines on millions of people and try to ensure that they were going to deliver great outcomes and healthy and safe outcomes for everyone. And Cloudera has been underneath a great deal of that type of work. And the financial services industry you pointed out, we continue to be central to the large banks, meeting their compliance and regulatory requirements around the globe. And in many parts of the world, those are becoming more stringent than ever. And Cloudera solutions are helping those kinds of organizations get through those difficult challenges. You also happened to mention public sector and in public sector, we're also playing a key role in working with government entities around the world and applying AI to some of the most challenging missions that those organizations face. And while I've made the kind of pivot between the industry conversation and the AI conversation, what I'll share with you about AI, I touched upon a little bit earlier. You can't build great AI, you can't build great ML apps unless you've got a strong data foundation underneath. It's back to that garbage in, garbage out comment that I made previously. And so, in order to do that, you've got to have a great hybrid data management platform at your disposal to ensure that your data is clean and organized and up to date. Just as importantly from that, that's kind of the freedom side of things. On the security side of things, you've got to ensure that you can see who's touched not just the data itself, Dave, but actually the machine learning models. And organizations around the globe are now being challenged. It's kind of on the topic of the ethics of AI to produce model lineage in addition to data lineage. In other words, who's had access to the machine learning models? When and where and at what time and what decisions were made perhaps, by the humans, perhaps by the machines that may have led to a particular outcome? So, every kind of business that is deploying AI applications should be thinking long and hard about whether or not they can track the full lineage of those machine learning models, just as they can track the lineage of data. So, lots going on there across industries. Lots going on as those various industries think about how AI can be applied to their businesses. >> It's a pretty interesting concept you're bringing into the discussion, the hybrid data, sort of, I think new to a lot of people. And this idea of model lineage is a great point because people want to talk about AI ethics, transparency of AI. When you start putting those models into machines to do real-time inferencing at the edge, it starts to get really complicated. I wonder if we could talk, we're still on that theme of industry transformation. I felt like coming into the pre-pandemic, there was just a lot of complacency. Yeah, digital transformation and a lot of buzz words and then we had this forced march to digital, but people are now being more planful, but there's still a lot of sort of POC limbo going on. How do you see that? Can you help accelerate that and get people out of that state? >> There definitely is a lot of a POC limbo or I think some of us internally have referred to as POC purgatory, just getting in that phase, not being able to get from point A to point B in digital transformation. And for every industry, transformation, change in general, is difficult and it takes time and money and thoughtfulness. But like with all things, what we've found is small wins work best and done quickly. So trying to get to quick, easy successes where you can identify a clear goal and a clear objective and then accomplish it in rapid fashion is sort of the way to build your way towards those larger transformative efforts. To say it another way, Dave, it's not wise to try to boil the ocean with your digital transformation efforts, as it relates to the underlying technology here and to bring it home a little bit more practically, I guess I would say. At Cloudera, we tend to recommend that companies begin to adopt cloud infrastructure, for example, containerization. And they begin to deploy that on-prem and then they start to look at how they may move those containerized workloads into the public cloud. That'll give them an opportunity to work with the data and the underlying applications themselves, right close to home. In place, they can kind of experiment a little bit more safely and economically and then determine which workloads are best suited for the public cloud and which ones should remain on prem. That's a way in which a hybrid data strategy can help get a digital transformation accomplished, but kind of starting small and then drawing fast from there on customer's journey to the cloud. >> Well Mick, we've covered a lot of ground. Last question, what do you want people to leave this event, this session with and thinking about sort of the next era of data that we're entering? >> Well, it's a great question, but I think it could be summed up in two words. I want them to think about a hybrid data strategy. So, really hybrid data is a concept that we're bringing forward on this show really, for the first time, arguably. And we really do think that it enables customers to experience what we refer to, Dave, as the power of ANT. That is freedom and security and in a world where we're all still trying to decide whether each day when we walk out, each building we walk into, whether we're free to come in and out with a mask, without a mask, that sort of thing, we all want freedom, but we also also want to be safe and feel safe for ourselves and for others. And the same is true of organization's IT strategies. They want the freedom to choose, to run workloads and applications in the best and most economical place possible, but they also want to do that with certainty that they're going to be able to deploy those applications in a safe and secure way that meets the regulatory requirements of their particular industry. So, hybrid data we think is key to accomplishing both freedom and security for your data and for your business as a whole. >> Nick, thanks so much, great conversation. I really appreciate the insights that you're bringing to this event, into the industry, really. Thank you for your time. >> You bet, Dave, pleasure being with you.

Published Date : Aug 2 2021

SUMMARY :

and it differentiates itself by the way in the role data plays to get to you live today from Atlanta the hybrid everything, Dave and that's hybrid data. What do you mean by hybrid data? So how in the heck do you get of the assumptions. and rapidly on the office ethernet of the business model, that's compulsory. and to be able to check on I felt like coming into the pre-pandemic, and the underlying about sort of the next era and applications in the best I really appreciate the

ENTITIES

Entity	Category	Confidence
Mick Holliston	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Dave	PERSON	0.99+
Atlanta	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
Mick	PERSON	0.99+
Mick Hollison	PERSON	0.99+
Nebraska	LOCATION	0.99+
two words	QUANTITY	0.99+
Nick	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
first time	QUANTITY	0.99+
today	DATE	0.99+
each day	QUANTITY	0.99+
more than a year and a half	QUANTITY	0.99+
two things	QUANTITY	0.99+
Georgia Tech	ORGANIZATION	0.99+
one item	QUANTITY	0.99+
last night	DATE	0.98+
millions of people	QUANTITY	0.98+
40 minutes	QUANTITY	0.98+
both	QUANTITY	0.97+
One	QUANTITY	0.97+
12 industry use cases	QUANTITY	0.96+
this fall	DATE	0.96+
each building	QUANTITY	0.96+
one thing	QUANTITY	0.95+
Twitter	ORGANIZATION	0.89+
ANT	ORGANIZATION	0.88+
next decade	DATE	0.86+
pandemic	EVENT	0.84+
hundred	QUANTITY	0.83+
America	LOCATION	0.74+
One thing	QUANTITY	0.73+
10 years	DATE	0.71+
China	LOCATION	0.69+
2021	DATE	0.67+
theCUBE	ORGANIZATION	0.64+
10	QUANTITY	0.58+
minutes	DATE	0.52+
years	DATE	0.5+

Enable an Insights Driven Business Michele Goetz, Cindy Maike | Cloudera 2021

>> Okay, we continue now with the theme of turning ideas into insights so ultimately you can take action. We heard earlier that public cloud first doesn't mean public cloud only. And a winning strategy comprises data, irrespective of physical location on prem, across multiple clouds at the edge where real-time inference is going to drive a lot of incremental value. Data is going to help the world come back to normal we heard, or at least semi normal as we begin to better understand and forecast demand and supply imbalances and economic forces. AI is becoming embedded into every aspect of our business, our people, our processings, and applications. And now we're going to get into some of the foundational principles that support the data and insights centric processes, which are fundamental to digital transformation initiatives. And it's my pleasure to welcome two great guests, Michelle Goetz, who's a Cube alum and VP and principal analyst at Forrester, and doin' some groundbreaking work in this area. And Cindy Maike who is the vice president of industry solutions and value management at Cloudera. Welcome to both of you. >> Welcome, thank you. >> Thanks Dave. >> All right Michelle, let's get into it. Maybe you could talk about your foundational core principles. You start with data. What are the important aspects of this first principle that are achievable today? >> It's really about democratization. If you can't make your data accessible, it's not usable. Nobody's able to understand what's happening in the business and they don't understand what insights can be gained or what are the signals that are occurring that are going to help them with decisions, create stronger value or create deeper relationships with their customers due to their experiences. So it really begins with how do you make data available and bring it to where the consumer of the data is rather than trying to hunt and peck around within your ecosystem to find what it is that's important. >> Great thank you for that. So, Cindy, I wonder in hearing what Michelle just said, what are your thoughts on this? And when you work with customers at Cloudera, are there any that stand out that perhaps embody the fundamentals that Michelle just shared? >> Yeah, there's quite a few. And especially as we look across all the industries that were actually working with customers in. A few that stand out in top of mind for me is one is IQVIA. And what they're doing with real-world evidence and bringing together data across the entire healthcare and life sciences ecosystems, bringing it together in different shapes and formats, making it accessible by both internally, as well as for the entire extended ecosystem. And then for SIA, who's working to solve some predictive maintenance issues within, they're are a European car manufacturer and how do they make sure that they have efficient and effective processes when it comes to fixing equipment and so forth. And then also there's an Indonesian based telecommunications company, Techsomel, who's bringing together over the last five years, all their data about their customers and how do they enhance a customer experience, how do they make information accessible, especially in these pandemic and post pandemic times. Just getting better insights into what customers need and when do they need it? >> Cindy, platform is another core principle. How should we be thinking about data platforms in this day and age? Where do things like hybrid fit in? What's Cloudera's point of view here? >> Platforms are truly an enabler. And data needs to be accessible in many different fashions, and also what's right for the business. When I want it in a cost and efficient and effective manner. So, data resides everywhere, data is developed and it's brought together. So you need to be able to balance both real time, our batch, historical information. It all depends upon what your analytical workloads are and what types of analytical methods you're going to use to drive those business insights. So putting in placing data, landing it, making it accessible, analyzing it, needs to be done in any accessible platform, whether it be a public cloud doing it on-prem or a hybrid of the two is typically what we're seeing being the most successful. >> Great, thank you. Michelle let's move on a little bit and talk about practices and processes, the next core principles. Maybe you could provide some insight as to how you think about balancing practices and processes while at the same time managing agility. >> Yeah, it's a really great question 'cause it's pretty complex when you have to start to connect your data to your business. The first thing to really gravitate towards is what are you trying to do. And what Cindy was describing with those customer examples is that they're all based off of business goals, off of very specific use cases. That helps kind of set the agenda about what is the data and what are the data domains that are important to really understanding and recognizing what's happening within that business activity and the way that you can affect that either in near time or real time, or later on, as you're doing your strategic planning. What that's balancing against is also being able to not only see how that business is evolving, but also be able to go back and say, "Well, can I also measure the outcomes from those processes and using data and using insight? Can I also get intelligence about the data to know that it's actually satisfying my objectives to influence my customers in my market? Or is there some sort of data drift or detraction in my analytic capabilities that are allowing me to be effective in those environments?" But everything else revolves around that and really thinking succinctly about a strategy that isn't just data aware, what data do I have and how do I use it? But coming in more from that business perspective, to then start to be data driven, recognizing that every activity you do from a business perspective leads to thinking about information that supports that and supports your decisions. And ultimately getting to the point of being insight driven, where you're able to both describe what you want your business to be with your data, using analytics to then execute on that fluidly and in real time. And then ultimately bringing that back with linking to business outcomes and doing that in a continuous cycle where you can test and you can learn, you can improve, you can optimize and you can innovate. Because you can see your business as it's happening. And you have the right signals and intelligence that allow you to make great decisions. >> I like how you said near time or real time, because it is a spectrum. And at one end of the spectrum, autonomous vehicles. You've got to make a decision in real time but near real-time, or real-time, it's in the eyes of the beholder if you will. It might be before you lose the customer or before the market changes. So it's really defined on a case by case basis. I wonder Michelle, if you could talk about in working with a number of organizations I see folks, they sometimes get twisted up in understanding the dependencies that technology generally, and the technologies around data specifically can sometimes have on critical business processes. Can you maybe give some guidance as to where customers should start? Where can we find some of the quick wins and high returns? >> It comes first down to how does your business operate? So you're going yo take a look at the business processes and value stream itself. And if you can understand how people, and customers, partners, and automation are driving that step by step approach to your business activities, to realize those business outcomes, it's way easier to start thinking about what is the information necessary to see that particular step in the process, and then take the next step of saying what information is necessary to make a decision at that current point in the process? Or are you collecting information, asking for information that is going to help satisfy a downstream process step or a downstream decision? So constantly making sure that you are mapping out your business processes and activities, aligning your data process to that helps you now rationalize do you need that real time, near real time, or do you want to start creating greater consistency by bringing all of those signals together in a centralized area to eventually oversee the entire operations and outcomes as they happen? It's the process, and the decision points, and acting on those decision points for the best outcome that really determines are you going to move in more of a real-time streaming capacity, or are you going to push back into more of a batch oriented approach? Because it depends on the amount of information and the aggregate of which provides the best insight from that. >> Got it. Let's, bring Cindy back into the conversation here. Cindy, we often talk about people, process, and technology and the roles they play in creating a data strategy that's logical and sound. Can you speak to the broader ecosystem and the importance of creating both internal and external partners within an organization? >> Yeah. And that's kind of building upon what Michelle was talking about. If you think about datas and I hate to use the phrase almost, but the fuel behind the process and how do you actually become insight-driven. And you look at the capabilities that you're needing to enable from that business process, that insight process. Your extended ecosystem on how do I make that happen? Partners and picking the right partner is important because a partner is one that actually helps you implement what your decisions are. So looking for a partner that has the capability that believes in being insight-driven and making sure that when you're leveraging data within your process that if you need to do it in a real-time fashion, that they can actually meet those needs of the business. And enabling on those process activities. So the ecosystem looking at how you look at your vendors, and fundamentally they need to be that trusted partner. Do they bring those same principles of value, of being insight driven? So they have to have those core values themselves in order to help you as a business person enable those capabilities. >> So Cindy I'm cool with fuel, but it's like super fuel when you talk about data. 'Cause it's not scarce, right? You're never going to run out. (Dave chuckling) So Michelle, let's talk about leadership. Who leads? What does so-called leadership look like in an organization that's insight driven? >> So I think the really interesting thing that is starting to evolve as late is that organizations, enterprises are really recognizing that not just that data is an asset and data has value, but exactly what we're talking about here, data really does drive what your business outcomes are going to be. Data driving into the insight or the raw data itself has the ability to set in motion what's going to happen in your business processes and your customer experiences. And so, as you kind of think about that, you're now starting to see your CEO, your CMO, your CRO coming back and saying, I need better data. I need information that's representative of what's happening in my business. I need to be better adaptive to what's going on with my customers. And ultimately that means I need to be smarter and have clearer forecasting into what's about ready to come. Not just one month, two months, three months, or a year from now, but in a week or tomorrow. And so that is having a trickle down effect to then looking at two other types of roles that are elevating from technical capacity to more business capacity. You have your chief data officer that is shaping the experiences with data and with insight and reconciling what type of information is necessary with it within the context of answering these questions and creating a future fit organization that is adaptive and resilient to things that are happening. And you also have a chief digital officer who is participating because they're providing the experience and shaping the information and the way that you're going to interact and execute on those business activities. And either running that autonomously or as part of an assistance for your employees and for your customers. So really to go from not just data aware to data-driven, but ultimately to be insight driven, you're seeing way more participation and leadership at that C-suite level and just underneath, because that's where the subject matter expertise is coming in to know how to create a data strategy that is tightly connected to your business strategy. >> Great, thank you. Let's wrap, and I've got a question for both of you, maybe Cindy, you could start and then Michelle bring us home. A lot of customers, they want to understand what's achievable. So it's helpful to paint a picture of a maturity model. I'd love to go there, but I'm not going to get there anytime soon, but I want to take some baby steps. So when you're performing an analysis on an insight driven organization, Cindy what do you see as the major characteristics that define the differences between sort of the early beginners sort of fat middle, if you will, and then the more advanced constituents? >> Yeah, I'm going to build upon what Michelle was talking about is data as an asset. And I think also being data aware and trying to actually become insight driven. Companies can also have data, and they can have data as a liability. And so when you're data aware, sometimes data can still be a liability to your organization. If you're not making business decisions on the most recent and relevant data, you're not going to be insight-driven. So you've got to move beyond that data awareness, where you're looking at data just from an operational reporting. But data's fundamentally driving the decisions that you make as a business. You're using data in real time. You're leveraging data to actually help you make and drive those decisions. So when we use the term you're data-driven, you can't just use the term tongue-in-cheek. It actually means that I'm using the recent, the relevant, and the accuracy of data to actually make the decisions for me, because we're all advancing upon, we're talking about artificial intelligence and so forth being able to do that. If you're just data aware, I would not be embracing on leveraging artificial intelligence. Because that means I probably haven't embedded data into my processes. Yes, data could very well still be a liability in your organization, so how do you actually make it an asset? >> Yeah I think data aware it's like cable ready. (Dave chuckling) So Michelle, maybe you could add to what Cindy just said and maybe add as well any advice that you have around creating and defining a data strategy. >> So every data strategy has a component of being data aware. This is like building the data museum. How do you capture everything that's available to you? How do you maintain that memory of your business? Bringing in data from your applications, your partners, third parties, wherever that information is available, you want to ensure that you're capturing it and you're managing and you're maintaining it. And this is really where you're starting to think about the fact that it is an asset, it has value. But you may not necessarily know what that value is yet. If you move into a category of data driven, what starts to shift and change there is you're starting to classify label, organize the information in context of how you're making decisions and how you do business. It could start from being more proficient from an analytic purpose. You also might start to introduce some early stages of data science in there. So you can do some predictions and some data mining to start to weed out some of those signals. And you might have some simple types of algorithms that you're deploying to do a next next best action, for example. And that's what data-driven is really about. You're starting to get value out of it. The data itself is starting to make sense in context of your business, but what you haven't done quite yet, which is what insight driven businesses are, is really starting to take away the gap between when you see it, know it, and then get the most value and really exploit what that is at the time when it's right, so in the moment. We talk about this in terms of perishable insights, data and insights are ephemeral. And we want to ensure that the way that we're managing that and delivering on that data and insights is in time with our decisions and the highest value outcome we're going to have, that that insight can provide us. So are we just introducing it as data-driven organizations where we could see spreadsheets and PowerPoint presentations and lots of mapping to help make longer strategic decisions, or are those insights coming up and being activated in an automated fashion within our business processes that are either assisting those human decisions at the point when they're needed, or an automated decisions for the types of digital experiences and capabilities that we're driving in our organization. So it's going from, I'm a data hoarder if I'm data aware to I'm interested in what's happening as a data-driven organization and understanding my data. And then lastly being insight driven is really where light between business, data and insight, there is none, it's all coming together for the best outcomes. >> Right, it's like people are acting on perfect or near perfect information. Or machines are doing so with a high degree of confidence. Great advice and insights, and thank you both for sharing your thoughts with our audience today, it was great to have you. >> Thank you. >> Thank you. >> Okay, now we're going to go into our industry deep dives. There are six industry breakouts. Financial services, insurance, manufacturing, retail communications, and public sector. Now each breakout is going to cover two distinct use cases for a total of essentially 12 really detailed segments. Now each of these is going to be available on demand, but you can scan the calendar on the homepage and navigate to your breakout session of choice. Or for more information, click on the agenda page and take a look to see which session is the best fit for you and then dive in. Join the chat and feel free to ask questions or contribute your knowledge, opinions, and data. Thanks so much for being part of the community, and enjoy the rest of the day. (upbeat music)

Published Date : Aug 2 2021

SUMMARY :

that support the data and Maybe you could talk and bring it to where that perhaps embody the fundamentals and how do they make sure in this day and age? And data needs to be accessible insight as to how you think that are allowing me to be and the technologies that is going to help satisfy and technology and the roles they play in order to help you as a business person You're never going to and the way that you're going to interact that define the to actually help you make that you have around creating and lots of mapping to help and thank you both for and navigate to your

ENTITIES

Entity	Category	Confidence
Michelle Goetz	PERSON	0.99+
Cindy Maike	PERSON	0.99+
Cindy	PERSON	0.99+
Dave	PERSON	0.99+
Michelle	PERSON	0.99+
Michele Goetz	PERSON	0.99+
Techsomel	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
one month	QUANTITY	0.99+
both	QUANTITY	0.99+
two	QUANTITY	0.99+
two months	QUANTITY	0.99+
each	QUANTITY	0.99+
PowerPoint	TITLE	0.98+
first principle	QUANTITY	0.98+
three months	QUANTITY	0.98+
six industry	QUANTITY	0.98+
IQVIA	ORGANIZATION	0.98+
two great guests	QUANTITY	0.97+
a year	QUANTITY	0.97+
today	DATE	0.97+
two distinct use cases	QUANTITY	0.96+
first thing	QUANTITY	0.96+
first	QUANTITY	0.94+
Forrester	ORGANIZATION	0.94+
tomorrow	DATE	0.93+
one	QUANTITY	0.93+
each breakout	QUANTITY	0.91+
a week	QUANTITY	0.9+
12 really detailed segments	QUANTITY	0.86+
two other types	QUANTITY	0.81+
last five years	DATE	0.7+
Cloudera 2021	ORGANIZATION	0.69+
European	OTHER	0.68+
Indonesian	OTHER	0.64+
Cube	ORGANIZATION	0.61+
SIA	ORGANIZATION	0.59+
Cindy	TITLE	0.33+

Cloud First – Data Driven Reinvention Drew Allan | Cloudera 2021

>>Okay. Now we're going to dig into the data landscape and cloud of course. And talk a little bit more about that with drew Allen. He's a managing director at Accenture drew. Welcome. Great to see you. Thank you. So let's talk a little bit about, you know, you've been in this game for a number of years. Uh, you've got a particular expertise in, in, in data and finance and insurance. I mean, you think about it within the data and analytics world, even our language is changing. You know, we don't say talk about big data so much anymore. We, we talk more about digital, you know, or, or, or data-driven when you think about sort of where we've come from and where we're going, what are the puts and takes that you have with regard to what's going on in the business today? >>Well, thanks for having me. Um, you know, I think some of the trends we're seeing in terms of challenges and puts some takes are that a lot of companies are already on this digital transformation journey. Um, they focused on customer experience is kind of table stakes. Everyone wants to focus on that and kind of digitizing their channels. But a lot of them are seeing that, you know, a lot of them don't even own their, their channels necessarily. So like we're working with a big cruise line, right. And yes, they've invested in digitizing what they own, but a lot of the channels that they sell through, they don't even own, right. It's the travel agencies or third-party real sellers. So having the data to know where, you know, where those agencies are, that that's something that they've discovered. And so there's a lot of big focus on not just digitizing, but also really understanding your customers and going across products because a lot of the data has built, been built up in individual channels and in digital products. >>And so bringing that data together is something that customers that have really figured out in the last few years is a big differentiator. And what we're seeing too, is that a big trend that the data rich are getting richer. So companies that have really invested in data, um, are having, uh, an outside market share and outside earnings per share and outside revenue growth. And it's really being a big differentiator. And I think for companies just getting started in this, the thing to think about is one of the missteps is to not try to capture all the data at once. The average company has, you know, 10,000, 20,000 data elements individually, when you want to start out, you know, 500, 300 critical data elements, about 5% of the data of a company drives 90% of the business value. So focusing on, on those key critical data elements is really what you need to govern first and really invest in first. And so that's something we tell companies at the beginning of their data strategy is first focus on those critical data elements, really get a handle on governing that data, organizing that data and building data products around >>That data. You can't boil the ocean. Right. And so, and I, I feel like pre pandemic, there was a lot of complacency. Oh yeah, we'll get to that. You know, not on my watch, I'll be retired before that, you know, it becomes a minute. And then of course the pandemic was, I call it sometimes a forced March to digital. So in many respects, it wasn't planned. It just ha you know, you had to do it. And so now I feel like people are stepping back and saying, okay, let's now really rethink this and do it right. But is there, is there a sense of urgency, do you think? >>Absolutely. I think with COVID, you know, we were working with, um, a retailer where they had 12,000 stores across the U S and they had didn't have the insights where they could drill down and understand, you know, with the riots and with COVID was the store operational, you know, with the supply chain of they having multiple, uh, distributors, what did they have in stock? So there are millions of data points that you need to drill down, down at the cell level, at the store level to really understand how's my business performing. And we like to think about it for like a CEO and his leadership team of like, think of it as a digital cockpit, right? You think about a pilot, they have a cockpit with all these dials and, um, dashboards, essentially understanding the performance of their business. And they should be able to drill down and understand for each individual, you know, unit of their work, how are they performing? That's really what we want to see for businesses. Can they get down to that individual performance to really understand how their businesses and >>The ability to connect those dots and traverse those data points and not have to go in and come back out and go into a new system and come back out. And that's really been a lot of the frustration where does machine intelligence and AI fit in? Is that sort of a dot connector, if you will, and an enabler, I mean, we saw, you know, decades of the, the AI winter, and then, you know, there's been a lot of talk about it, but it feels like with the amount of data that we've collected over the last decade and the, the, the low costs of processing that data now, it feels like it's, it's real. Where do you see AI fitting in? Yeah, >>I mean, I think there's been a lot of innovation in the last 10 years with, um, the low cost of storage and computing and these algorithms in non-linear, um, you know, knowledge graphs, and, um, um, a whole bunch of opportunities in cloud where what I think the, the big opportunity is, you know, you can apply AI in areas where a human just couldn't have the scale to do that alone. So back to the example of a cruise lines, you know, you may have a ship being built that has 4,000 cabins on the single cruise line, and it's going to multiple deaths that destinations over its 30 year life cycle. Each one of those cabins is being priced individually for each individual destination. It's physically impossible for a human to calculate the dynamic pricing across all those destinations. You need a machine to actually do that pricing. And so really what a machine is leveraging is all that data to really calculate and assist the human, essentially with all these opportunities where you wouldn't have a human being able to scale up to that amount of data >>Alone. You know, it's interesting. One of the things we talked to Mick Halston about earlier was just the everybody's algorithms are out of whack. You know, you look at the airline pricing, you look at hotels it's as a consumer, you would be able to kind of game the system and predict a, they can't even predict these days. And I feel as though that the data and AI are actually going to bring us back into some kind of normalcy and predictability, uh, w what do you see in that regard? >>Yeah, I think it's, I mean, we're definitely not at a point where when I talk to, you know, the top AI engineers and data scientists, we're not at a point where we have what they call broad AI, right? Where you can get machines to solve general knowledge problems, where they can solve one problem, and then a distinctly different problem, right? That's still many years away, but narrow AI, there's still tons of use cases out there that can really drive tons of business performance challenges, tons of accuracy challenges. So, for example, in the insurance industry, commercial lines, where I work a lot of the time, the biggest leakage of loss experience and pricing for commercial insurers is, um, people will go in as an agent and they'll select an industry to say, you know what, I'm a restaurant business. Um, I'll select this industry code to quote out a policy, but there's, let's say, you know, 12 dozen permutations, you could be an outdoor restaurant. >>You could be a bar, you could be a caterer, and all of that leads to different loss experience. So what this does is they built a machine learning algorithm. We've helped them do this, that actually at the time that they're putting in their name and address, it's crawling across the web and predicting in real time, you know, is this address actually, you know, a business that's a restaurant with indoor dining, does it have a bar is an outdoor dining, and it's that that's able to accurately more price the policy and reduce the loss experience. So there's a lot of that you can do, even with narrow AI that can really drive top line of business results. >>Yeah. I like that term narrow AI because getting things done is important. Let's talk about cloud a little bit because people talk about cloud first public cloud first doesn't necessarily mean public cloud only, of course. So where do you see things like what's the right operating model, the right regime hybrid cloud. We talked earlier about hybrid data help us squint through the cloud landscape. Yeah. >>I mean, I think for most right, most fortune 500 companies, they can't just their fingers and say, let's move all of our data centers to the cloud. They've got to move, you know, gradually. And it's usually a journey that's taking more than two to three plus years, even more than that in some cases. So they're half they have to move their data, uh, incrementally to the cloud. And what that means is that, that they have to move to a hybrid perspective where some of their data is on premise and some of it is publicly on the cloud. And so that's the term hybrid cloud essentially. And so what they've had to think about is from an intelligence perspective, the privacy of that data, where is it being moved? Can they reduce the replication of that data? Because ultimately you like, uh, replicating the data from on-premise to, to the cloud that introduces, you know, errors and data quality issues. So thinking about how do you manage, uh, you know, uh, on-premise and public cloud as a transition is something that Accenture thinks, thinks, and helps our clients do quite a bit. And how do you move them in a manner that's well-organized and well thought about? >>Yeah. So I've been a big proponent of sort of line of business lines of business becoming much more involved in, in the data pipeline, if you will, the data process, if you think about our major operational systems, they all have sort of line of business context in them. Then the salespeople, they know the CRM data and, you know, logistics folks. There they're very much in tune with ERP. I almost feel like for the past decade, the lines of business have been somewhat removed from the, the data team, if you will. And that, that seems to be changing. What are you seeing in terms of the line of line of business being much more involved in sort of end to end ownership if you will, if I can use that term of, uh, of the data and sort of determining things like helping determine anyway, the data quality and things of that nature. Yeah. >>I mean, I think this is where thinking about your data operating model and thinking about ideas of a chief data officer and having data on the CEO agenda, that's really important to get the lines of business, to really think about data sharing and reuse, and really getting them to, you know, kind of unlock the data because they do think about their data as a fiefdom data has value, but you've got to really get organizations in their silos to open it up and bring that data together because that's where the value is. You know, data doesn't operate. When you think about a customer, they don't operate in their journey across the business in silo channels. They don't think about, you know, I use only the web and then I use the call center, right? They think about that as just one experience. And that data is a single journey. >>So we like to think about data as a product. You know, you should think about a data in the same way. You think about your products as, as products, you know, data as a product, you should have the idea of like every two weeks you have releases to it. You have an operational resiliency to it. So thinking about that, where you can have a very product mindset to delivering your data, I think is very important for the success. And that's where kind of, there's not just the things about critical data elements and having the right platform architecture, but there's a soft stuff as well, like a product mindset to data, having the right data, culture, and business adoption and having the right value set mindset for, for data, I think is really, >>I think data as a product is a very powerful concept. And I think it maybe is uncomfortable to some people sometimes. And I think in the early days of big data, if you will, people thought, okay, data is a product going to sell my data, and that's not necessarily what you mean. You mean thinking about products or data that can fuel products that you can then monetize maybe as a product or as a, as, as a service. And I like to think about a new metric in the industry, which is how long does it take me to get from idea of I'm a business person. I have an idea for a data product. How long does it take me to get from idea to monetization? And that's going to be something that ultimately as a business person, I'm going to use to determine the success of my data team and my, my data architecture is, is that kind of thinking starting to really hit the marketplace. >>I mean, I insurers now are working, partnering with, you know, auto manufacturers to monetize, um, driver usage data, you know, on telematics to see, you know, driver behavior on how, you know, how auto manufacturers are using that data. That's very important to insurers, you know, so how an auto manufacturer can monetize that data is very important and also an insurance, you know, cyber insurance, um, are there news new ways we can look at how companies are being attacked with viruses and malware, and is there a way we can somehow monetize that information? So companies that are able to agily, you know, think about how can we, you know, collect this data, bring it together, think about it as a product, and then potentially, you know, sell it as a service is something that, um, company, successful companies are doing >>Great examples of data products, and it might be revenue generating, or it might be in the case of, you know, cyber, maybe it reduces my expected loss. Exactly. And it drops right to my bottom line. What's the relationship between Accenture and cloud era? Do you, I presume you guys meet at the customer, but maybe you could give us some insight as to yeah. So, >>Um, I I'm in the executive sponsor for, um, the Accenture cloud era partnership on the Accenture side. Uh, we do quite a lot of business together and, um, you know, Cloudera has been a great partner for us. Um, and they've got a great product in terms of the Cloudera data platform where, you know, what we do is as a big systems integrator for them, we help, um, you know, configure and we have a number of engineers across the world that come in and help in terms of, um, engineer architects and install, uh, cloud errors, data platform, and think about what are some of those, you know, value cases where you can really think about organizing data and bringing it together for all these different types of use cases. And really just as the examples we thought about. So the telematics, you know, um, in order to realize something like that, you're bringing in petabytes and huge scales of data that, you know, you just couldn't bring on a normal, uh, platform. You need to think about cloud. You need to think about speed of, of data and real-time insights and cloud errors, the right data platform for that. So, um, >>That'd be Cloudera ushered in the modern big data era. We, we kind of all know that, and it was, which of course early on, it was very services intensive. You guys were right there helping people think through there weren't enough data scientists. We've sort of all, all been through that. And of course in your wheelhouse industries, you know, financial services and insurance, they were some of the early adopters, weren't they? Yeah, >>Absolutely. Um, so, you know, an insurance, you've got huge amounts of data with loss history and, um, a lot with IOT. So in insurance, there's a whole thing of like sensorized thing in, uh, you know, taking the physical world and digitizing it. So, um, there's a big thing in insurance where, um, it's not just about, um, pricing out the risk of a loss experience, but actual reducing the loss before it even happens. So it's called risk control or loss control, you know, can we actually put sensors on oil pipelines or on elevators and, you know, reduce, um, you know, accidents before they happen. So we're, you know, working with an insurer to actually, um, listen to elevators as they move up and down and are there signals in just listening to the audio of an elevator over time that says, you know what, this elevator is going to need maintenance, you know, before a critical accident could happen. So there's huge applications, not just in structured data, but in unstructured data like voice and audio and video where a partner like Cloudera has a huge role apply. >>Great example of it. So again, narrow sort of use case for machine intelligence, but, but real value. True. We'll leave it like that. Thanks so much for taking some time. Thank you.

Published Date : Aug 2 2021

SUMMARY :

So let's talk a little bit about, you know, you've been in this game But a lot of them are seeing that, you know, a lot of them don't even own their, you know, 10,000, 20,000 data elements individually, when you want to start out, It just ha you know, I think with COVID, you know, we were working with, um, a retailer where and an enabler, I mean, we saw, you know, decades of the, the AI winter, the big opportunity is, you know, you can apply AI in areas where You know, you look at the airline pricing, you look at hotels it's as a Yeah, I think it's, I mean, we're definitely not at a point where when I talk to, you know, you know, is this address actually, you know, a business that's a restaurant So where do you see things like They've got to move, you know, gradually. more involved in, in the data pipeline, if you will, the data process, and really getting them to, you know, kind of unlock the data because they do You know, you should think about a data in And I think in the early days of big data, if you will, people thought, okay, data is a product going to sell my data, that are able to agily, you know, think about how can we, you know, collect this data, Great examples of data products, and it might be revenue generating, or it might be in the case of, you know, So the telematics, you know, um, in order to realize something you know, financial services and insurance, they were some of the early adopters, weren't they? this elevator is going to need maintenance, you know, before a critical accident could happen. So again, narrow sort of use case for machine intelligence,

ENTITIES

Entity	Category	Confidence
Accenture	ORGANIZATION	0.99+
Mick Halston	PERSON	0.99+
90%	QUANTITY	0.99+
10,000	QUANTITY	0.99+
4,000 cabins	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
12 dozen	QUANTITY	0.99+
12,000 stores	QUANTITY	0.99+
Drew Allan	PERSON	0.99+
U S	LOCATION	0.99+
more than two	QUANTITY	0.98+
one experience	QUANTITY	0.98+
each individual	QUANTITY	0.98+
One	QUANTITY	0.98+
first	QUANTITY	0.97+
pandemic	EVENT	0.97+
Allen	PERSON	0.97+
one	QUANTITY	0.96+
one problem	QUANTITY	0.96+
about 5%	QUANTITY	0.95+
three plus years	QUANTITY	0.94+
Each one	QUANTITY	0.94+
30 year	QUANTITY	0.93+
single cruise line	QUANTITY	0.92+
COVID	ORGANIZATION	0.91+
500, 300 critical data elements	QUANTITY	0.9+
today	DATE	0.89+
20,000 data elements	QUANTITY	0.89+
companies	QUANTITY	0.89+
decades	QUANTITY	0.85+
Accenture drew	ORGANIZATION	0.84+
single journey	QUANTITY	0.83+
2021	DATE	0.83+
each individual destination	QUANTITY	0.8+
millions of data points	QUANTITY	0.77+
last decade	DATE	0.74+
two weeks	QUANTITY	0.73+
last 10 years	DATE	0.72+
fortune 500	ORGANIZATION	0.71+
tons	QUANTITY	0.69+
half	QUANTITY	0.68+
last few years	DATE	0.65+
fiefdom	QUANTITY	0.63+
Cloud First	ORGANIZATION	0.6+
past decade	DATE	0.58+
March	DATE	0.55+

Cindy Maike & Nasheb Ismaily | Cloudera

>>Hi, this is Cindy Mikey, vice president of industry solutions at Cloudera. Joining me today is chef is Molly, our solution engineer for the public sector. Today. We're going to talk about speed to insight. Why using machine learning in the public sector, specifically around fraud, waste and abuse. So topic for today, we'll discuss machine learning, why the public sector uses it to target fraud, waste, and abuse, the challenges. How do we enhance your data and analytical approaches the data landscape analytical methods and Shev we'll go over reference architecture and a case study. So by definition, fraud, waste and abuse per the government accountability office is fraud is an attempt to obtain something about a value through unwelcomed. Misrepresentation waste is about squandering money or resources and abuse is about behaving improperly or unreasonably to actually obtain something of value for your personal benefit. So as we look at fraud and across all industries, it's a top of mind, um, area within the public sector. >>Um, the types of fraud that we see is specifically around cyber crime, uh, looking at accounting fraud, whether it be from an individual perspective to also, uh, within organizations, looking at financial statement fraud, to also looking at bribery and corruption, as we look at fraud, it really hits us from all angles, whether it be from external perpetrators or internal perpetrators, and specifically from the research by PWC, the key focus area is we also see over half of fraud is actually through some form of internal or external are perpetrators again, key topics. So as we also look at a report recently by the association of certified fraud examiners, um, within the public sector, the us government, um, in 2017, it was identified roughly $148 billion was attributable to fraud, waste and abuse. Specifically of that 57 billion was focused on reported monetary losses and another 91 billion on areas where that opportunity or the monetary basis had not yet been measured. >>As we look at breaking those areas down again, we look at several different topics from an out payment perspective. So breaking it down within the health system, over $65 billion within social services, over $51 billion to procurement fraud to also, um, uh, fraud, waste and abuse that's happening in the grants and the loan process to payroll fraud, and then other aspects, again, quite a few different topical areas. So as we look at those areas, what are the areas that we see additional type of focus, there's broad stroke areas? What are the actual use cases that our agencies are using the data landscape? What data, what analytical methods can we use to actually help curtail and prevent some of the, uh, the fraud waste and abuse. So, as we look at some of the analytical processes and analytical use crate, uh, use cases in the public sector, whether it's from, uh, you know, the taxation areas to looking at social services, uh, to public safety, to also the, um, our, um, uh, additional agency methods, we're going to focus specifically on some of the use cases around, um, you know, fraud within the tax area. >>Uh, we'll briefly look at some of the aspects of unemployment insurance fraud, uh, benefit fraud, as well as payment and integrity. So fraud has its, um, uh, underpinnings in quite a few different on government agencies and difficult, different analytical methods and I usage of different data. So I think one of the key elements is, you know, you can look at your, your data landscape on specific data sources that you need, but it's really about bringing together different data sources across a different variety, a different velocity. So, uh, data has different dimensions. So we'll look at on structured types of data of semi-structured data, behavioral data, as well as when we look at, um, you know, predictive models, we're typically looking at historical type information, but if we're actually trying to lock at preventing fraud before it actually happens, or when a case may be in flight, which is specifically a use case, that shadow is going to talk about later it's how do I look at more of that? >>Real-time that streaming information? How do I take advantage of data, whether it be, uh, you know, uh, financial transactions we're looking at, um, asset verification, we're looking at tax records, we're looking at corporate filings. Um, and we can also look at more, uh, advanced data sources where as we're looking at, um, investigation type information. So we're maybe going out and we're looking at, uh, deep learning type models around, uh, you know, semi or that behavioral, uh, that's unstructured data, whether it be camera analysis and so forth. So quite a different variety of data and the, the breadth, um, and the opportunity really comes about when you can integrate and look at data across all different data sources. So in a sense, looking at a more extensive on data landscape. So specifically I want to focus on some of the methods, some of the data sources and some of the analytical techniques that we're seeing, uh, being used, um, in the government agencies, as well as opportunities, uh, to look at new methods. >>So as we're looking at, you know, from a, um, an audit planning or looking at, uh, the opportunity for the likelihood of non-compliance, um, specifically we'll see data sources where we're maybe looking at a constituents profile, we might actually be, um, investigating the forms that they've provided. We might be comparing that data, um, or leveraging internal data sources, possibly looking at net worth, comparing it against other financial data, and also comparison across other constituents groups. Some of the techniques that we use are some of the basic natural language processing, maybe we're going to do some text mining. We might be doing some probabilistic modeling, uh, where we're actually looking at, um, information within the agency to also comparing that against possibly tax forms. A lot of times it's information historically has been done on a batch perspective, both structured and semi-structured type information. And typically the data volumes can be low, but we're also seeing those data volumes increase exponentially based upon the types of events that we're dealing with, the number of transactions. >>Um, so getting the throughput, um, and chef's going to specifically talk about that in a moment. The other aspect is, as we look at other areas of opportunity is when we're building upon, how do I actually do compliance? How do I actually look at conducting audits, uh, or potential fraud to also looking at areas of under reported tax information? So there you might be pulling in some of our other types of data sources, whether it's being property records, it could be data that's being supplied by the actual constituents or by vendors to also pulling in social media information to geographical information, to leveraging photos on techniques that we're seeing used is possibly some sentiment analysis, link analysis. Um, how do we actually blend those data sources together from a natural language processing? But I think what's important here is also the method and the looking at the data velocity, whether it be batch, whether it be near real time, again, looking at all types of data, whether it's structured semi-structured or unstructured and the key and the value behind this is, um, how do we actually look at increasing the potential revenue or the, um, under reported revenue? >>Uh, how do we actually look at stopping fraudulent payments before they actually occur? Um, also looking at increasing the amount of, uh, the level of compliance, um, and also looking at the potential of prosecution of fraud cases. And additionally, other areas of opportunity could be looking at, um, economic planning. How do we actually perform some link analysis? How do we bring some more of those things that we saw in the data landscape on customer, or, you know, constituent interaction, bringing in social media, bringing in, uh, potentially police records, property records, um, other tax department, database information. Um, and then also looking at comparing one individual to other individuals, looking at people like a specific, like, uh, constituent, are there areas where we're seeing, uh, um, other aspects of, of fraud potentially being, uh, occurring. Um, and also as we move forward, some of the more advanced techniques that we're seeing around deep learning is looking at computer vision, um, leveraging geospatial information, looking at social network entity analysis, uh, also looking at, um, agent-based modeling techniques, where we're looking at simulation, Monte Carlo type techniques that we typically see in the financial services industry, actually applying that to fraud, waste, and abuse within the, the public sector. >>Um, and again, that really, uh, lends itself to a new opportunities. And on that, I'm going to turn it over to Chevy to talk about, uh, the reference architecture for doing these buckets. >>Sure. Yeah. Thanks, Cindy. Um, so I'm going to walk you through an example, reference architecture for fraud detection, using Cloudera as underlying technology. Um, and you know, before I get into the technical details, uh, I want to talk about how this would be implemented at a much higher level. So with fraud detection, what we're trying to do is identify anomalies or anomalous behavior within our datasets. Um, now in order to understand what aspects of our incoming data represents anomalous behavior, we first need to understand what normal behavior is. So in essence, once we understand normal behavior, anything that deviates from it can be thought of as an anomaly, right? So in order to understand what normal behavior is, we're going to need to be able to collect store and process a very large amount of historical data. And so incomes, clutters platform, and this reference architecture that needs to be for you. >>So, uh, let's start on the left-hand side of this reference architecture with the collect phase. So fraud detection will always begin with data collection. Uh, we need to collect large amounts of information from systems that could be in the cloud. It could be in the data center or even on edge devices, and this data needs to be collected so we can create from normal behavior profiles and these normal behavioral profiles would then in turn, be used to create our predictive models for fraudulent activity. Now, uh, uh, to the data collection side, one of the main challenges that many organizations face, uh, in this phase, uh, involves using a single technology that can handle, uh, data that's coming in all different types of formats and protocols and standards with different velocities and velocities. Um, let me give you an example. Uh, we could be collecting data from a database that gets updated daily, uh, and maybe that data is being collected in Agra format. >>At the same time, we can be collecting data from an edge device that's streaming in every second, and that data may be coming in Jace on or a binary format, right? So this is a data collection challenge that can be solved with cluttered data flow, which is a suite of technologies built on Apache NIFA and mini five, allowing us to ingest all of this data, do a drag and drop interface. So now we're collecting all of this data, that's required to map out normal behavior. The next thing that we need to do is enrich it, transform it and distribute it to know downstream systems for further process. Uh, so let's, let's walk through how that would work first. Let's taking Richmond for, uh, for enrichment, think of adding additional information to your incoming data, right? Let's take, uh, financial transactions, for example, uh, because Cindy mentioned it earlier, right? >>You can store known locations of an individual in an operational database, uh, with Cloudera that would be HBase. And as an individual makes a new transaction, their geo location that's in that transaction data, it can be enriched with previously known locations of that very same individual and all of that enriched data. It can be later used downstream for predictive analysis, predictable. So the data has been enrich. Uh, now it needs to be transformed. We want the data that's coming in, uh, you know, Avro and Jason and binary and whatever other format to be transformed into a single common format. So it can be used downstream for stream processing. Uh, again, this is going to be done through clutter and data flow, which is backed by NIFA, right? So the transformed semantic data is then going to be stimulated to Kafka and coffin is going to serve as that central repository of syndicated services or a buffer zone, right? >>So cough is, you know, pretty much provides you with, uh, extremely fast resilient and fault tolerance storage. And it's also going to give you the consumer API APIs that you need that are going to enable a wide variety of applications to leverage that enriched and transform data within your buffer zone. Uh, I'll add that, you know, 17, so you can store that data, uh, in a distributed file system, give you that historical context that you're going to need later on from machine learning, right? So the next step in the architecture is to leverage, uh, clutter SQL stream builder, which enables us to write, uh, streaming sequel jobs on top of Apache Flink. So we can, uh, filter, analyze and, uh, understand the data that's in the Kafka buffer zone in real-time. Uh, I'll, you know, I'll also add like, you know, if you have time series data, or if you need a lab type of cubing, you can leverage Q2, uh, while EDA or, you know, exploratory data analysis and visualization, uh, can all be enabled through clever visualization technology. >>All right, so we've filtered, we've analyzed, and we've our incoming data. We can now proceed to train our machine learning models, uh, which will detect anomalous behavior in our historically collected data set, uh, to do this, we can use a combination of supervised unsupervised, even deep learning techniques with neural networks. Uh, and these models can be tested on new incoming streaming data. And once we've gone ahead and obtain the accuracy of the performance, the X one, uh, scores that we want, we can then take these models and deploy them into production. And once the models are productionalized or operationalized, they can be leveraged within our streaming pipeline. So as new data is ingested in real time knife, I can query these models to detect if the activity is anomalous or fraudulent. And if it is, they can alert downstream users and systems, right? So this in essence is how fraudulent activity detection works. Uh, and this entire pipeline is powered by clutters technology. Uh, Cindy, next slide please. >>Right. And so, uh, the IRS is one of, uh, clutter as customers. That's leveraging our platform today and implementing a very similar architecture, uh, to detect fraud, waste, and abuse across a very large set of, uh, historical facts, data. Um, and one of the neat things with the IRS is that they've actually recently leveraged the partnership between Cloudera and Nvidia to accelerate their Spark-based analytics and their machine learning. Uh, and the results have been nothing short of amazing, right? And in fact, we have a quote here from Joe and salty who's, uh, you know, the technical branch chief for the research analytics and statistics division group within the IRS with zero changes to our fraud detection workflow, we're able to obtain eight times to performance simply by adding GPS to our mainstream big data servers. This improvement translates to half the cost of ownership for the same workloads, right? So embedding GPU's into the reference architecture I covered earlier has enabled the IRS to improve their time to insights by as much as eight X while simultaneously reducing their underlying infrastructure costs by half, uh, Cindy back to you >>Chef. Thank you. Um, and I hope that you found, uh, some of the, the analysis, the information that Sheva and I have provided, uh, to give you some insights on how cloud era is actually helping, uh, with the fraud waste and abuse challenges within the, uh, the public sector, um, specifically looking at any and all types of data, how the clutter a platform is bringing together and analyzing information, whether it be you're structured you're semi-structured to unstructured data, both in a fast or in a real-time perspective, looking at anomalies, being able to do some of those on detection methods, uh, looking at neural network analysis, time series information. So next steps we'd love to have an additional conversation with you. You can also find on some additional information around how called areas working in federal government, by going to cloudera.com solutions slash public sector. And we welcome scheduling a meeting with you again, thank you for joining us today. Uh, we greatly appreciate your time and look forward to future conversations. Thank you.

Published Date : Jul 22 2021

SUMMARY :

So as we look at fraud and across So as we also look at a report So as we look at those areas, what are the areas that we see additional So I think one of the key elements is, you know, you can look at your, Um, and we can also look at more, uh, advanced data sources So as we're looking at, you know, from a, um, an audit planning or looking and the value behind this is, um, how do we actually look at increasing Um, also looking at increasing the amount of, uh, the level of compliance, um, And on that, I'm going to turn it over to Chevy to talk about, uh, the reference architecture for doing Um, and you know, before I get into the technical details, uh, I want to talk about how this It could be in the data center or even on edge devices, and this data needs to be collected so At the same time, we can be collecting data from an edge device that's streaming in every second, So the data has been enrich. So the next step in the architecture is to leverage, uh, clutter SQL stream builder, obtain the accuracy of the performance, the X one, uh, scores that we want, And in fact, we have a quote here from Joe and salty who's, uh, you know, the technical branch chief for the the analysis, the information that Sheva and I have provided, uh, to give you some insights

ENTITIES

Entity	Category	Confidence
Cindy Mikey	PERSON	0.99+
Nvidia	ORGANIZATION	0.99+
Molly	PERSON	0.99+
Nasheb Ismaily	PERSON	0.99+
PWC	ORGANIZATION	0.99+
Joe	PERSON	0.99+
Cindy	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
2017	DATE	0.99+
Cindy Maike	PERSON	0.99+
Today	DATE	0.99+
over $65 billion	QUANTITY	0.99+
today	DATE	0.99+
NIFA	ORGANIZATION	0.99+
over $51 billion	QUANTITY	0.99+
57 billion	QUANTITY	0.99+
salty	PERSON	0.99+
single	QUANTITY	0.98+
first	QUANTITY	0.98+
Jason	PERSON	0.98+
one	QUANTITY	0.97+
91 billion	QUANTITY	0.97+
IRS	ORGANIZATION	0.96+
Shev	PERSON	0.95+
both	QUANTITY	0.95+
Avro	PERSON	0.94+
Apache	ORGANIZATION	0.93+
eight	QUANTITY	0.93+
$148 billion	QUANTITY	0.92+
zero changes	QUANTITY	0.91+
Richmond	LOCATION	0.91+
Sheva	PERSON	0.88+
single technology	QUANTITY	0.86+
Cloudera	TITLE	0.85+
Monte Carlo	TITLE	0.84+
eight times	QUANTITY	0.83+
cloudera.com	OTHER	0.79+
Kafka	TITLE	0.77+
second	QUANTITY	0.77+
one individual	QUANTITY	0.76+
coffin	PERSON	0.72+
Kafka	PERSON	0.69+
Jace	TITLE	0.69+
SQL	TITLE	0.68+
17	QUANTITY	0.68+
over half	QUANTITY	0.63+
Chevy	ORGANIZATION	0.57+
elements	QUANTITY	0.56+
half	QUANTITY	0.56+
mini five	COMMERCIAL_ITEM	0.54+
Apache Flink	ORGANIZATION	0.52+
HBase	TITLE	0.45+

Cloudera Transform Innovative Ideas Promo

>>Speed is everything in a hyper competitive climate. The faster we get insights from data and get data products to market. The faster we grow and the more competitive we become, this is Dave Volante from the cutie inviting you to join us on Thursday, August 5th, for cloud areas, industry insights. We'll look at the biggest challenges facing businesses today, especially the need to access and leverage data at an accelerated velocity. You'll hear from industry leaders like Nick Collison, whose cloud era's president, Rob Bearden, the CEO of Cloudera, Michelle Goetz from Forrester. You'll hear from Nvidia and industry experts in insurance, manufacturing, retail, and public sector. Who can address your biggest concerns? Like how do I remove constraints and put data at the core of my business, streaming begins at 9:00 AM Pacific on the Q3 65. You're a leader in global enterprise tech coverage.

Published Date : Jul 22 2021

SUMMARY :

You'll hear from industry leaders like Nick Collison,

ENTITIES

Entity	Category	Confidence
Michelle Goetz	PERSON	0.99+
Rob Bearden	PERSON	0.99+
Nick Collison	PERSON	0.99+
Nvidia	ORGANIZATION	0.99+
Dave Volante	PERSON	0.99+
Thursday, August 5th	DATE	0.99+
Cloudera	ORGANIZATION	0.99+
9:00 AM Pacific	DATE	0.97+
today	DATE	0.89+
Forrester	ORGANIZATION	0.75+
Q3	LOCATION	0.44+
65	EVENT	0.42+

Ram Venkatesh, Cloudera | AWS re:Invent 2020

>>from >>around the globe. It's the Cube with digital coverage of AWS reinvent 2020 sponsored by Intel, AWS and our community partners. >>Everyone welcome back to the cubes Coverage of AWS reinvent 2020 virtual. This is the Cube virtual. I'm John for your host this year. We're not in person. We're doing remote interviews because of the pandemic. The whole events virtual over three weeks for this week would be having a lot of coverage in and out of what's going on with the news. All that stuff here happening on the Cube Our next guest is a featured segment. Brown Venkatesh, VP of Engineering at Cloudera. Welcome back to the Cube Cube Alumni. Last time you were on was 2018 when we had physical events. Great to see you, >>like good to be here. Thank you. >>S O. You know, Cloudera obviously modernized up with Horton works. That comedy has been for a while, always pioneering this abstraction layer originally with a dupe. Now, with data, all those right calls were made. Data is hot is a big part of reinvent. That's a big part of the theme, you know, machine learning ai ai edge edge edge data lakes on steroids, higher level services in the cloud. This is the focus of reinvents. The big conversations Give us an update on cloud eras. Data platform. What's that? What's new? >>Absolutely. You are really speaking of languages. Read with the whole, uh, data lake architecture that you alluded to. It's uploaded. This mission has always been about, you know, we want to manage how the world's data that what this means for our customers is being ableto aggregate data from lots of different sources into central places that we call data lakes on. Then apply lots of different types of passing to it to direct business value that would cdp with Florida data platform. What we have essentially done is take those same three core tenants around data legs multifunctional takes on data stewardship of management to add on a bunch off cloud native capabilities to it. So this was fundamentally I'm talking about things like disaggregated storage and compute by being able to now not only take advantage of H d efs, but also had a pretty deep, fundamental level club storage. But this is the form factor that's really, really good for our customers. Toe or to operate that from a TCO perspective, if you're going to manage hundreds of terabytes of data like like a lot of a lot of customers do it. The second key piece that we've done with CDP has to do with us embracing containers and communities in a big way on primer heritages around which machines and clusters and things of that nature. But in the cloud context, especially in the context, off managed community services like Amazon CKs, this Lexus spin apart traditional workloads, Sequels, park machine learning and so on. In the context of these Cuban exiles containerized environments which lets customers spin these up in seconds. They're supposed to, you know, tens of minutes on as they're passing, needs grow and shrink. They can actually scale much, much faster up and down to, you know, to make sure that they have the right cost effective footprint for their compute e >>go ahead third piece. >>But the turkey piece of all of this right is to say, along with like cloud native orchestration and cloud NATO storage is that we've embraced this notion of making sure that you actually have a robust data discovery story around it. so increasingly the data sets that you create on top off a platform like CDP. There themselves have value in other use cases that you want to make sure that these data sets are properly replicated. They're probably secure the public government. So you can go and analyze where the data set came from. Capabilities of security and provenance are increasingly more important to our customers. So with CDP, we have a really good story around that data stewardship aspect, which is increasingly important as you as you get into the cloud. And you have these sophisticated sharing scenarios. The >>you know, Clotaire has always had and Horton works. Both companies had strong technical chops. It's well document. Certainly the queues been toe all the events and covered both companies since the inception of 10 years ago. A big data. But now we're in cloud. Big data, fast data, little data, all data. This is what the cloud brings. So I want to get your thoughts on the number one focus of problem solving around cloud. I gotta migrate. Or do I move to the cloud immediately and be born there? Now we know the hyper scale is born in the cloud companies like the Dropbox in the world. They were born in the cloud and all the benefits and goodness came with that. But I'm gonna be pivoting. I'm a company at a co vid with a growth strategy. Lift and shift. Okay, that was It's over. Now that's the low hanging fruit that's use cases kind of done. Been there, done that. Is it migration or born in the cloud? Take us through your thoughts on what does the company do right now? >>E thinks it's a really good question. If you think off, you know where our customers are in their own data journey, right? So increasingly. You know, a few years ago, I would say it was about operating infrastructure. That's where their head was at, right? Increasingly, I think for them it's about deriving value from the data assets that they already have on. This typically means in a combining data from different sources the structure data, some restructure data, transactional data, non transactional, data event oriented data messaging data. They wanna bring all of that and analyze that to make sure that they can actually identify ways toe monetize it in ways that they had not thought about when they actually stored the data originally, right? So I think it's this drive towards increasing monetization of data assets that's driving the new use cases on the platform. Traditionally, it used to be about, you know, sequel analysts who are, if you are like a data scientist using a party's park. So it was sort of this one function that you would focus on with the data. But increasingly, we're seeing these air about, you know, these air collaborative use cases where you wanna have a little bit of sequel, a little bit of machine learning, a little bit off, you know, potentially real time streaming or even things like Apache fling that you're gonna use to actually analyze the data eso when this kind of an environment. But we see that the data that's being generated on Prem is extremely relevant to the use case, but the speed at which they want to deploy the use case. They really want to make sure that they can take advantage of the clouds, agility and infinite capacity to go do that. So it's it's really the answer is it's complicated. It's not so much about you know I'm gonna move my data platform that I used to run the old way from here to there. But it's about I got this use case and I got to stand this up in six weeks, right in the middle of the pandemic on how do I go do that on the data that has to come from my existing line of business systems. I'm not gonna move those over, but I want to make sure that I can analyze the data from their in some cohesive Does that make sense? >>Totally makes sense. And I think just to kind of bring that back for the folks watching. And I remember when CDP was launching the thes data platforms, it really was to replace the data warehouse is the old antiquated way of doing things. But it was interesting. It wasn't just about competing at that old category. It was a new category. So, yeah, you had to have some tooling some sequel, you know, to wrangle data and have some prefabricated, you know, data fenced out somewhere in some warehouse. But the value was the new use cases of data where you never know. You don't know where it's going to come until it comes right, because if you make it addressable, that was the idea of the data platform and data Lakes and then having higher level services. So s so to me. That's, I think, one distinction kind of new category coexisting and disrupting an old category data warehousing. Always bought into that. You know, there's some technical things spark Do all these elements on mechanisms underneath. That's just evolution. But income in incomes cloud on. I want to get your thoughts on this because one of the things that's coming out of all my interviews is speed, speed, speed, deploying high, high, large scale at very large speed. This is the modern application thinking okay to make that work, you gotta have the data fabric underneath. This has always been kind of the dream scenario, So it's kind of playing out. So one Do you believe in that? And to what is the relationship between Cloudera and AWS? Because I think that kind of interestingly points to this one piece. >>Absolutely. So I think that yeah, from my perspective, this is what we call the shared data experience that's central to see PP like the idea is that, you know, data that is generated by the business in one use case is relevant and valid in another use case that is central to how we see companies leveraging data or the second order monetization that they're after, Right? So I think this is where getting out off a traditional data warehouse like data side of context, being able to analyze all of the data that you have, I think is really, really important for many of our customers. For example, many of them increasingly hold what they call this like data hackathons right where they're looking at can be answered. This new question from all the data that we have that is, that is a type of use case that's really hard to enable unless you have a very cohesive, very homogeneous view off all of your data. When it comes to the cloud partners, right, Increasingly, we see that the cloud native services, especially for the core storage, compute and security services are extremely robust that they give us, you know, the scale and that's really truly unparalled in terms of how much data we can address, how quickly we can actually get access to compute on demand when we need it. And we can do all of this with, like, a very, very mature security and governance fabric that you can fit into. So we see that, you know, technologies like s three, for example, have come a long way on along the journey with Amazon on this over the last 78 years. But we both learned how to operate our work clothes. When you're running a terabytes scale, right, you really have to pay attention to matters like scale out and consistency and parallelism and all of these things. These matters significantly right? And it's taken a certain maturity curve that you have to go through to get there. The last part of that is that because the TCO is so optimized with the customer to operate this without any ops on their side, they could just start consuming data, even if it's a terabyte of data. So this means that now we have to have the smarts in the processing engines to think about things like cashing, for example very, very differently because the way you cash data that Zinn hedge defense is very different from how you would do that in the context of his three are similarly, the way you think about consistency and metadata is very, very different at that layer. But we made sure that we can abstract these differences out at the platform layer so that as an as it is an application consumer, you really get the same experience, whether you're running these analytics on clam or whether you're running them in the cloud. And that's really central to how I see this space evolving is that we want to meet the customer where they are, rather than forcing them to change the way they work because off the platform that they're simple. >>So could you take them in to explain some of the integrations with AWS and some customer examples? Because, um, you know, first of all, cost is a big concern on everyone's mind because, you know, it's still lower costs and higher value with the cloud anyway. But it could get away from you. So you know, you're constantly petabytes of scale. There's a lot of data moving around. That's one thing to integration with higher level services. Can you give where does explain how Claudia integration with Amazon? What's the relation of customer wants to know. Hey, you guys, you know, partnering, explain the partnership. And what does it mean for me? >>Absolutely. So the way we look at the partnership hit that one person and ghetto. It's really a four layer cake because the lowest layer is the core infrastructure services. We talked about storage and computing on security, and I am so on and so forth. So that layer is a very robust integration that goes back a few years. The next layer up from that has to do with increasingly, you know, as our customers use analytic experiences from Florida on, they want to combine that with data that's actually in the AWS compute experiences like the red Ship, for example. That's what the analytics layer uploaded the data warehouse offering and how that interrupts would be other services in Amazon that could be relevant. This is common file formats that open source well form it really help us in this context to make sure that they have a very strong level of interest at the analytics there. The third layer up from that has to do with consumption. Like if you're gonna bring an analyst on board. You want to make sure that all of their sequel, like analyst experiences, notebooks, things of that nature that's really strong. And club out of the third layer on the highest layer is really around. Data sharing. That's as aws new and technologies like that become more prevalent. Now. Customers want to make sure that they can have these data states that they have in the different clouds, actually in a robbery. So we provide ways for them, toe browse and search data, regardless of whether that data is on AWS or on traffic. And so that's how the fourth layer in the stack, the vertical slice running through all of these, that we have a really strong business relationship with them both on the on the on the commercial market side as well as in AWS marketplace. Right? So we can actually by having cdp be a part of it of the US marketplace. This means that if you have an enterprise agreement with with Amazon, you can actually pay for CDP toe the credit sexuality purchased. This is a very, very tight relationship that's designed again for these large scale speeds and feeds. Can the customer >>so just to get this right. So if I love the four layer cake icings the success of CDP love that birthday candles can be on top to when you're successful. But you're saying that you're going to mark with Amazon two ways marketplace listing and then also jointly with their enterprise field programs. That right? You say because they have this program you can bundle into the blanket pos or Pio processes That right can explain that again. >>S so if you think this'll states, if you're talking about are significant. So we want to make sure that, you know, we're really aligned with them in terms off our cloud migration strategy in terms of how the customer actually execute to what is a fairly you know, it's a complex deployment to deploy a large multiple functions did and existed takes time, right, So we're gonna make sure that we navigate this together jointly with the U. S. To make sure that from a best practices standpoint, for example, were very well aligned from a cost standpoint, you know what we're telling the customer architecturally is very rather nine. That's that's where I think really the heart of the engineering relationship between the two companies without. >>So if you want Cloudera on Amazon, you just go in. You can click to buy. Or if you got to deal with Amazon in terms of global marketplace deal, which they have been rolling out, I could buy there too, Right? All right, well, run. Thanks for the update and insight. Um, love the four layer cake love gets. See the modernization of the data platform from Cloudera. And congratulations on all the hard work you guys been doing with AWS. >>Thank you so much. Appreciate. >>Okay, good to see you. Okay, I'm John for your hearing. The Cube for Cube virtual for eight of us. Reinvent 2020 virtual. Thanks for watching.

Published Date : Dec 8 2020

SUMMARY :

It's the Cube with digital coverage of AWS All that stuff here happening on the Cube Our next like good to be here. That's a big part of the theme, you know, machine learning ai ai edge you know, to make sure that they have the right cost effective footprint for their compute e so increasingly the data sets that you create on top off a platform you know, Clotaire has always had and Horton works. on how do I go do that on the data that has to come from my existing line of business systems. But the value was the new use cases of data where you never know. So we see that, you know, technologies like s three, So you know, you're constantly petabytes of scale. The next layer up from that has to do with increasingly, you know, as our customers use analytic So if I love the four layer cake icings the success of CDP love So we want to make sure that, you know, we're really aligned with them And congratulations on all the hard work you guys been Thank you so much. Okay, good to see you.

ENTITIES

Entity	Category	Confidence
Amazon	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Ram Venkatesh	PERSON	0.99+
2018	DATE	0.99+
Dropbox	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
John	PERSON	0.99+
Florida	LOCATION	0.99+
Horton	PERSON	0.99+
Brown Venkatesh	PERSON	0.99+
Both companies	QUANTITY	0.99+
Lexus	ORGANIZATION	0.99+
both companies	QUANTITY	0.99+
two companies	QUANTITY	0.99+
eight	QUANTITY	0.99+
tens of minutes	QUANTITY	0.99+
one thing	QUANTITY	0.99+
hundreds of terabytes	QUANTITY	0.98+
this week	DATE	0.98+
three	QUANTITY	0.98+
third layer	QUANTITY	0.98+
aws	ORGANIZATION	0.98+
two ways	QUANTITY	0.98+
this year	DATE	0.98+
US	LOCATION	0.98+
Intel	ORGANIZATION	0.97+
over three weeks	QUANTITY	0.97+
10 years ago	DATE	0.97+
third piece	QUANTITY	0.97+
fourth layer	QUANTITY	0.97+
both	QUANTITY	0.97+
one piece	QUANTITY	0.96+
Clotaire	ORGANIZATION	0.96+
pandemic	EVENT	0.94+
third laye	QUANTITY	0.94+
second key piece	QUANTITY	0.93+
Cube virtual	COMMERCIAL_ITEM	0.92+
TCO	ORGANIZATION	0.91+
second order	QUANTITY	0.9+
four layer	QUANTITY	0.89+
U. S.	LOCATION	0.89+
six weeks	QUANTITY	0.89+
one	QUANTITY	0.88+
Zinn	ORGANIZATION	0.86+
few years ago	DATE	0.86+
last 78 years	DATE	0.85+
one person	QUANTITY	0.84+
terabyte	QUANTITY	0.83+
Cube for	COMMERCIAL_ITEM	0.83+
one function	QUANTITY	0.81+
Apache	ORGANIZATION	0.79+
Cube	COMMERCIAL_ITEM	0.79+
2020	TITLE	0.79+
one distinction	QUANTITY	0.77+
CDP	ORGANIZATION	0.74+
three core tenants	QUANTITY	0.72+
Claudia	PERSON	0.72+
turkey	OTHER	0.71+
reinvent 2020	EVENT	0.67+
S O.	PERSON	0.64+
nine	QUANTITY	0.63+
data	QUANTITY	0.6+
NATO	ORGANIZATION	0.59+
clam	ORGANIZATION	0.59+
VP	PERSON	0.53+

Tom Deane, Cloudera and Abhinav Joshi, Red Hat | KubeCon + CloudNativeCon NA 2020

from around the globe it's thecube with coverage of kubecon and cloudnativecon north america 2020 virtual brought to you by red hat the cloud native computing foundation and ecosystem partners hello and welcome back to the cube's coverage of kubecon plus cloud nativecon 2020 the virtual edition abinav joshi is here he's the senior product marketing manager for openshift at red hat and tom dean is the senior director of pro product management at cloudera gentlemen thanks for coming on thecube good to see you thank you very much for having us here hey guys i know you would be here it was great to have you and guys i know you're excited about the partnership and i definitely want to get in and talk about that but before we do i wonder if we could just set the tone you know what are you seeing in the market tom let's let's start with you i had a great deep dive a couple of weeks back with anupam singh and he brought me up to speed on what's new with cloudera but but one of the things we discussed was the accelerated importance of data putting data at the core of your digital business tom what are you seeing in the marketplace right now yeah absolutely so um overall we're still seeing a growing demand for uh storing and and processing massive massive amounts of data even in the past few months um where perhaps we see a little bit more variety is on by industry sector is on the propensity to adopt some of the latest and greatest uh technologies that are out there or that we we deliver to the market um so whether perhaps in the retail hospitality sector you may see a little bit more risk aversion around some of the latest tools then you you go to the healthcare industry as an example and you see we see a strong demand for our latest technologies uh with with everything that is that is going on um so overall um still a lot lots of demand around this space so abnormal i mean we just saw in ibm's earnings though the momentum of red hat you know growing in the mid teens and the explosion that we're seeing around containers and and obviously openshift is at the heart of that how the last nine months affected your customers priorities and what are you seeing yeah we've been a lot more busier like in the last few months because there's like a lot of use cases and if you look at the like a lot of the research and so on and we are seeing that from our customers as well that now the customers are actually speeding up the digital transformation right people say that okay kovac 19 has actually uh speeded up the digital transformation for a lot of our customers for the right reasons to be able to help the customers and so on so we are seeing a lot of attraction on like number of verticals and number of use cases beyond the traditional lab dev data analytics aiml messaging streaming edge and so on like lots of use cases in like a lot of different like industry verticals so there's a lot of momentum going on on openshift and the broader that portfolio as well yeah it's ironic the the timing of the pandemic but it sure underscores that this next 10 years is going to be a lot different than the last 10 years okay let's talk about some of the things that are new around data tom cloudera you guys have made a number of moves since acquiring hortonworks a little over two years ago what's new with uh with the cloudera data platform cdp sure so yes our latest therap uh platform is called cbp clara data platform last year we announced the public cloud version of cdp running on aws and then azure and what's new is just two months ago we announced the release of the version of this platform targeted at the data center and that's called cvp private cloud and really the focus of this platform this new version has been around solving some of the pain points that we see around agility or time to value and the ease of use of the platform and to give you some specific examples with our previous technology it could take a customer three months to provision a data warehouse if you include everything from obtaining the infrastructure to provisioning the warehouse loading the data setting security policies uh and fine-tuning the the software now with cbp private cloud we've been able to take those uh three months and turn it into three minutes so significant uh speed up in in that onboarding time and in time to valley and a key piece of this uh that enabled this this speed up was a revamping of the entire stack specifically the infrastructure and service services management layer and this is where the containerization of the platform comes in specifically kubernetes and red hat open shift that is a key piece of the puzzle that enables this uh order of magnitude uh improvement in time right uh now abner you think about uh red hat you think about cloudera of course hortonworks the stalwarts of of of open source you got kind of like birds of a feather how are red hat and cloudera partnering with each other you know what are the critical aspects of that relationship that people should be aware of yeah absolutely that's a very good question yeah so on the openshift side we've had a lot of momentum in the market and we have well over 2000 customers in terms of a lot of different verticals and the use cases that i talked about at the beginning of our conversation in terms of traditional and cloud native app dev databases data analytics like ai messaging and so on right and the value that you have with openshift and the containers kubernetes and devops like part of the solution being able to provide the agility flexibility scalability the cross cloud consistency like so all that that you see in a typical app dev world is directly applicable to fast track the data analytics and the ai projects as well and we've seen like a lot of customers and some of the ones that we can talk about in a public way like iix rbc bank hca healthcare boston children's bmw exxon mobil so all these organizations are being are able to leverage openshift to kind of speed up the ai projects and and help with the needs of the data engineers data scientists and uh and the app dev folks now from our perspective providing the best in class uh you say like experience for the customers at the platform level is key and we have to make sure that the tooling that the customers run on top of it uh gets the best in class the experience in terms of the day zero to day two uh management right and it's uh and and it's an ecosystem play for us and and and that's the way cloudera is the top isv in the space right when it comes to data analytics and ai and that was our key motivation to partner with cloudera in terms of bringing this joint solution to market and making sure that our customers are successful so the partnership is at all the different levels in the organization say both up and down as well as in the the engineering level the product management level the marketing level the sales level and at the support and services level as well so that way if you look at the customer journey in terms of selecting a solution uh putting it in place and then getting the value out of it so the partnership it actually spans across the entire spectrum yeah and tom you know i wonder if you could add anything there i mean it's not just about the public cloud with containers you're seeing obviously the acceleration of of cloud native principles on-prem in a hybrid you know across clouds it's sort of the linchpin containers really and kubernetes specifically linchpin to enable that what would you add to that discussion yeah as part of the partnership when we were looking for a vendor who could provide us that kubernetes layer we looked at our customer base and if you think about who clara is focused on we really go after that global the global 2000 firms out there these customers have very strict uh security requirements and they're often in these highly regulated uh industries and so when we looked at a customer's base uh we saw a lot of overlap and there was a natural good fit for us there but beyond that just our own technical evaluation of the solutions and also talking to uh to our own customers about who they do they see as a trusted platform that can provide enterprise grade uh features on on a kubernetes layer red hat had a clear leadership in in that front and that combined with our own uh long-standing relationship with our parent company ibm uh it made this partnership a natural good thing for us right and cloudera's always had a good relationship with ibm tom i want to stay with you if i can for a minute and talk about the specific joint solutions that you're providing with with red hat what are you guys bringing to customers in in terms of those solutions what's the business impact where's the value absolutely so the solution is called cbd or color data platform private cloud on red hat openshift and i'll describe three uh the three pillars that make up cbp uh first what we have is the five data analytic experiences and that is meant to cover the end to end data lifecycle in the first release we just came out two months ago we announced the availability of two of those five experiences we have data warehousing for bi analytics as well as machine learning and ai where we offer a collaborative data science data science tools for data scientists to come together do exploratory data analytics but also develop predictive models and push them to production going forward we'll be adding the remaining three uh experiences they include data engineering or transformations on uh on your data uh data flow for streaming analytics and ingest uh as well as operational database for uh real-time surveying of both structure and unstructured data so these five experiences have been re-banked right compared to our prior platform to target these specific use cases and simplify uh these data disciplines the second pillar that i'll talk about is the sdx or uh what what we call the shared data experience and what this is is the ability for these five experiences to have one global data set that they can all access with shared metadata security including fine grain permissions and a suite of governance tools that provide lineage provide auditing and business metadata so by having these shared data experiences our developers our users can build these multi-disciplinary workflows in a very straightforward way without having to create all this custom code and i can stitch you can stitch them together and the last pillar that i'll mention uh is the containerization of of the platform and because of containers because of kubernetes we're now able to offer that next level of agility isolation uh and infrastructure efficiency on the platform so give you a little bit more specific examples on the agility i mentioned going from three months to three minutes in terms of the speed up with i uh with uh containers we can now also give our users the ability to bring their own versions of their libraries and engines without colliding with another user who's sharing the platform that has been a big ask from our customers and last i'll mention infrastructure efficiency by re-architecting our services to running a microservices architecture we can now impact those servers in a much more efficient way we can also auto scale auto suspend bring all this as you mentioned bring all these cloud native concepts on premises and the end result of that is better infrastructure efficiency now our customers can do more with the same amount of hard work which overall uh reduces their their total spend on the solution so that's what we call cbp private cloud great thanks for that i mean wow we've seen really the evolution from the the wild west days of you know the early days of so-called big data ungoverned a lot of shadow data science uh maybe maybe not as efficient as as we'd like and but certainly today taking advantage of some of those capabilities dealing with the noisy neighbor problem enough i wonder if you could comment another question that i have is you know one of the things that jim whitehurst talked about when ibm acquired red hat was the scale that ibm could bring and what i always looked at in that context was ibm's deep expertise in vertical industries so i wonder what are some of the key industry verticals that you guys are targeting and succeeding in i mean yes there's the pandemic has some effects we talked about hospitality obviously airlines have to have to be careful and conserving cash but what are some of the interesting uh tailwinds that you're seeing by industry and some of the the more interesting and popular use cases yeah that's a very good question now in terms of the industry vertical so we are seeing the traction in like a number of verticals right and the top ones being the financial services like healthcare telco the automotive industry as well as the federal government are some of the key ones right and at the end of the day what what all the customers are looking at doing is be able to improve the experience of their customers with the digital services that they roll out right as part of the pandemic and so on as well and then being able to gain competitive edge right if you can have the services in your platform and make them kind of fresh and relevant and be able to update them on a regular basis that's kind of that's your differentiator these days right and then the next one is yeah if you do all this so you should be able to increase your revenue be able to save cost as well that's kind of a key one that you mentioned right that that a lot of the industries like the hospitality the airlines and so on are kind of working on saving cash right so if you can help them save the cost that's kind of key and then the last one is is being able to automate the business processes right because there's not like a lot of the manual processes so yeah if you can add in like a lot of automation that's all uh good for your business and then now if you look at the individual use cases in these different industry verticals what we're seeing that the use cases cannot vary from the industry to industry like if you look at the financial services the use cases like fraud detection being able to do the risk analysis and compliance being able to improve the customer support and so on are some of the key use cases the cyber security is coming up a lot as well because uh yeah nobody wants to be hacked and so and and so on yeah especially like in these times right and then moving on to healthcare and the life sciences right what we're seeing the use cases on being able to do the data-driven diagnostics and care and being able to do the discovery of drugs being able to say track kobit 19 and be able to tell that okay uh which of my like hospital is going to be full when and what kind of ppe am i going to need at my uh the the sites and so on so that way i can yeah and mobilize like as needed are some of the key ones that we are seeing on the healthcare side uh and then in terms of the automotive industry right that's where being able to speed up the autonomous driving initiatives uh being able to do uh the auto warranty pricing based on the history of the drivers and so on and then being able to save on the insurance cost is a big one that we are seeing as well for the insurance industries and then but more like manufacturing right being able to do the quality assurance uh at the shop floor being able to do the predictive maintenance on machinery and also be able to do the robotics process automation so like lots of use cases that customers are prioritizing but it's very verticalized it kind of varies from the vertical to a vertical but at the end of the day yeah it's all about like improving the customer experience the revenue saving cost and and being able to automate the business processes yeah that's great thank you for that i mean we we heard a lot about automation we were covering ansible fest i mean just think about fraud how much you know fraud detection has changed in the last 10 years it used to be you know so slow you'd have to go go through your financial statements to find fraud and now it's instantaneous cyber security is critical because the adversaries are very capable healthcare is a space where you know it's ripe for change and now of course with the pandemic things are changing very rapidly automotive another one an industry that really hasn't hadn't seen much disruption and now you're seeing with a number of things autonomous vehicles and you know basically software on wheels and insurance great example even manufacturing you're seeing you know a real sea change there so thank you for that description you know very often in the cube we like to look at joint engineering solutions that's a gauge of the substance of a partnership you know sometimes you see these barney deals you know there's a press release i love you you love me okay see you but but so i wonder if you guys could talk about specific engineering that you're doing tom maybe you could start sure yeah so on the on the engineering and product side um we've um for cbp private cloud we've we've changed our uh internal development and testing to run all on uh openshift uh internally uh and as part of that we we have a direct line to red hat engineering to help us solve any issues that that uh we run into so in the initial release we start with support of openshift43 we're just wrapping up uh testing of and we'll begin with openshift46 very soon on another aspect of their partnership is on being able to update our images to account for any security vulnerabilities that are coming up so with the guidance and help from red hat we've been we've standardized our docker images on ubi or the universal based image and that allows us to automatically get many of these security fixes uh into our into our software um the last point that i mentioned here is that it's not just about providing kubernetes uh red hat helps us with the end to end uh solution so there is also the for example bringing a docker registry into the picture or providing a secure vault for storing uh all the secrets so all these uh all these pieces combined make up the uh a strong complete solution actually the last thing i'll mention is is a support aspect which is critical to our customers in this model our customers can bring support tickets to cluberra but as soon as we determine that it may be an issue that uh related to red hat or openshift where we can use their help we have that direct line of communication uh and automated systems in the back end to resolve those support tickets uh quickly for our customers so those are some of the examples of what we're doing on the technical side great thank you uh enough we're out of time but i wonder if we could just close here i mean when we look at our survey data with our data partner etr we see containers container orchestration container management generally and again kubernetes specifically is the the number one area of investment for companies that has the most momentum in terms of where they're putting their efforts it's it's it's right up there and even ahead of ai and machine learning and even ahead of cloud which is obviously larger maybe more mature but i wonder if you can add anything and bring us home with this segment yeah absolutely and i think uh so uh one thing i want to add is like in terms of the engineering level right we also have like between cloudera and red hat the partnership and the sales and the go to market levels as well because once you build the uh the integration it yeah it has to be built out in the customer environments as well right so that's where we have the alignment um at the marketing level as well as the sales level so that way we can like jointly go in and do the customer workshops and make sure the solutions are getting deployed the right way right uh and also we have a partnership at the professional services level as well right where um the experts from both the orgs are kind of hand in hand to help the customers right and then at the end of the day if you need help with support and that's what tom talked about that we have the experts on the support side as well yeah and then so to wrap things up right uh so all the industry research and the customer conversation that we are having are kind of indicating that the organizations are actually increasing the focus on digital uh transformation with the data and ai being a key part of it and that's where this strategic partnership between cloudera and and red hat is going to play a big role to help our mutual customers uh through that our transition and be able to achieve the key goals that they set for their business great well guys thanks so much for taking us through the partnership and the integration work that you guys are doing with customers a great discussion really appreciate your time yeah thanks a lot dave really appreciate it really enjoyed the conversation all right keep it right there everybody you're watching thecube's coverage of cubecon plus cloud nativecon north america the virtual edition keep it right there we'll be right back

Published Date : Nov 19 2020

**Summary and Sentiment Analysis are not been shown because of improper transcript**

ENTITIES

Entity	Category	Confidence
two	QUANTITY	0.99+
five experiences	QUANTITY	0.99+
three months	QUANTITY	0.99+
three minutes	QUANTITY	0.99+
Abhinav Joshi	PERSON	0.99+
last year	DATE	0.99+
ibm	ORGANIZATION	0.99+
KubeCon	EVENT	0.99+
Red Hat	ORGANIZATION	0.99+
cloudera	ORGANIZATION	0.99+
first release	QUANTITY	0.98+
second pillar	QUANTITY	0.98+
two months ago	DATE	0.98+
red hat	ORGANIZATION	0.98+
openshift46	TITLE	0.98+
jim whitehurst	PERSON	0.98+
tom	PERSON	0.98+
telco	ORGANIZATION	0.98+
pandemic	EVENT	0.98+
north america	LOCATION	0.98+
Cloudera	ORGANIZATION	0.97+
both	QUANTITY	0.97+
abinav joshi	PERSON	0.97+
one	QUANTITY	0.97+
today	DATE	0.97+
CloudNativeCon	EVENT	0.96+
a minute	QUANTITY	0.95+
tom dean	PERSON	0.95+
openshift	TITLE	0.95+
five data	QUANTITY	0.95+
kubecon	ORGANIZATION	0.95+
hortonworks	ORGANIZATION	0.94+
anupam singh	PERSON	0.94+
dave	PERSON	0.92+
last few months	DATE	0.9+
Tom Deane	PERSON	0.9+
over two years ago	DATE	0.9+
first	QUANTITY	0.9+
kobit 19	OTHER	0.89+
rbc bank	ORGANIZATION	0.88+
last 10 years	DATE	0.88+
north america	LOCATION	0.88+
openshift43	TITLE	0.87+
hca	ORGANIZATION	0.87+
three	QUANTITY	0.87+
over 2000 customers	QUANTITY	0.86+
2020	DATE	0.86+
next 10 years	DATE	0.85+
kovac 19	ORGANIZATION	0.82+
one thing	QUANTITY	0.81+
past few months	DATE	0.81+
three pillars	QUANTITY	0.8+
last nine months	DATE	0.78+
federal government	ORGANIZATION	0.76+
one global	QUANTITY	0.76+
hat	TITLE	0.75+
both structure	QUANTITY	0.74+
NA 2020	EVENT	0.72+
cloudnativecon	ORGANIZATION	0.7+
cubecon	ORGANIZATION	0.69+
cloud	ORGANIZATION	0.69+
three uh experiences	QUANTITY	0.68+
lot	QUANTITY	0.68+
day two	QUANTITY	0.67+
exxon mobil	ORGANIZATION	0.67+
a couple of weeks back	DATE	0.67+
iix	ORGANIZATION	0.66+
kubernetes	ORGANIZATION	0.66+
of a feather	TITLE	0.64+
2000 firms	QUANTITY	0.63+
lot of use cases	QUANTITY	0.61+

Anupam Singh, Cloudera & Manish Dasaur, Accenture

>> Well, thank you, Gary. Well, you know, reasonable people could debate when the so-called big data era started. But for me it was in the fall of 2010 when I was sleepwalking through this conference in Dallas. And the conference was focused on data being a liability. And the whole conversation was about, how do you mitigate the risks of things like work in process and smoking-gun emails. I got a call from my business partner, John Fard, he said to me, "get to New York and come and see the future of data. We're doing theCUBE at Hadoop World tomorrow." I subsequently I canceled about a dozen meetings that I had scheduled for the week. And with only one exception, every one of the folks I was scheduled to meet said, "what's a Hadoop?" Well, I flew through an ice storm across country. I got to the New York Hilton around 3:00 AM, and I met John in the Dark Bar. If any of you remember that little facility. And I caught a little shut eye. And then the next day I met some of the most interesting people in tech during that time. They were thinking a lot differently than we were used to. They looked at data through a prism of value. And they were finding new ways to do things like deal with fraud, they were building out social networks, they were finding novel marketing vectors and identifying new investment strategies. The other thing they were doing is, they were taking these little tiny bits of code and bring it to really large sets of data. And they were doing things that I hadn't really heard of like no schema-on-write. And they were transforming their organizations by looking at data not as a liability, but as a monetization opportunity. And that opened my eyes and theCUBE, like a lot of others bet its business on data. Now over the past decade, customers have built up infrastructure and have been accommodating a lot of different use cases. Things like offloading ETL, data protection, mining data, analyzing data, visualizing. And as you know, you no doubt realize this was at a time when the cloud was, you know, really kind of nascent. And it was really about startups and experimentation. But today, we've evolved from the wild west of 2010, and many of these customers they're leveraging the cloud for of course, ease of use and flexibility it brings, but also they're finding out it brings complexity and risk. I want to tell you a quick story. Recently it was interviewing a CIO in theCUBE and he said to me, "if you just shove all your workloads into the cloud, you might get some benefit, but you're also going to miss the forest to the trees. You have to change your operating model and expand your mind as to what is cloud and create a cloud light experience that spans your on premises, workloads, multiple public clouds, and even the edge. And you have to re-imagine your business and the possibilities that this new architecture this new platform can bring." So we're going to talk about some of this today in a little bit more detail and specifically how we can better navigate the data storm. And what's the role of hybrid cloud. I'm really excited to have two great guests. Manish Dasaur is the managing director in the North America lead for analytics and artificial intelligence at Accenture. Anupam Singh is the chief customer officer for Cloudera. Gentlemen, welcome to theCUBE, great to see you. >> Hi Dave good to see you again. >> All right, guys, Anupam and Manish, you heard my little monologue upfront Anupam we'll start with you. What would you? Anything you'd add, amend, emphasize? You know, share a quick story. >> Yeah, Dave thank you for that introduction. It takes me back to the days when I was an article employee and went to this 14 people meet up. Just a couple of pizza talking about this thing called Hadoop. And I'm just amazed to see that today we are now at 2000 customers, who are using petabytes of data to make extremely critical decisions. Reminds me of the fact that this week, a lot of our customers are busy thinking about elections and what effect it would have on their data pipeline. Will it be more data? Will it be more stressful? So, totally agree with you. And also agree that cloud, is almost still in early days in times of the culture of IT on how to use the cloud. And I'm sure we'll talk about that today in greater detail. >> Yeah most definitely Manish I wonder if we could get your perspective on this. I mean, back when Anupam was at Oracle you'd shove a bunch of, you know, data, maybe you could attach a big honking disc drive, you'd buy some Oracle licenses, you know, it was a Unix box. Everything went into this, you know, this God box and then things changed quite dramatically, which was awesome, but also complex. And you guys have been there from the beginning. What's your perspective on all this? >> Yeah, it's been fascinating just to watch the market and the technology evolve. And I think the urgency to innovate is really just getting started. We're seeing companies drive growth from 20% in cloud today, to 80% cloud in the next few years. And I think the emergence of capabilities like hybrid cloud, we really get upfront a lot of flexibility for companies who need the ability to keep some data in a private setting, but be able to share the rest of the data in a public setting. I think we're just starting to scratch the surface of it. >> So let's talk a little bit about what is a hybrid cloud Anupam I wonder if you could take this one let's start with you and then Manish we come back to you and to get the customer perspective as well. I mean, it is a lot of things to a lot of people, but what is it? Why do we need it? And you know, what's the value? >> Yeah, I could speak poetic about Kubernetes and containers et cetera. But given that, you know, we talk to customers a lot, all three of us from the customer's perspective, hybrid cloud is a lot about collaboration and ease of procurement. A lot of our customers, whether they're in healthcare, banking or telco, are being asked to make the data available to regulatory authority, to subsidiaries outside of their geography. When you need that data to be available in other settings, taking a from on-prem and making it available in public cloud, enables extreme collaboration, extreme shared data experience if you will, inside the company. So we think about hybrid like that. >> Manish anything you'd add? How are your customers thinking about it? >> I mean, in a very simple way, it's a structure that where we are allowing mixed computing storage and service environments that's made of on-prem structures, private cloud structures, and public cloud structures. We're often calling it multicloud or mixcloud. And I think the really big advantage is, this model of cloud computing is enabling our clients to gain the benefits of public cloud setting, while maintaining your own private cloud for sensitive and mission critical and highly regulated computing services. That's also allowing our clients and organizations to leverage the pay-as-you-go model, which is really quite impressive and attractive to them because then they can scale their investments accordingly. Clients can combine one or more public cloud providers together in a private cloud, multicloud platform. The cloud can operate independently of each other, communicate over an encrypted connection. This dynamic solution offers a lot of flexibility and scalability which I think is really important to our clients. >> So Manish I wonder if we would stay there. How do they, how do your customers decide? How do you help them decide, you know, what the right mix is? What the equilibrium is? How much should it be in on-prem? How much should be in public or across clouds? Or, you know, eventually, well the edge will I guess decide for us. But, how do you go through, what are the decision points there? >> Yeah, I think that's a great question Dave. I would say there's several factors to consider when developing a cloud strategy that's the right strategy for you. Some of the factors that come to my mind when contemplating it, one would be security. Are there data sets that are highly sensitive that you don't want leaving the premise, versus data sets that need to be in a more shareable solution. Another factor I'd consider is speed and flexibility. Is there a need to stand up and stand down capabilities based on the seasonality of the business or some short-term demands? Is there a need to add and remove scale from the infrastructure and that quick pivot and that quick reaction is another factor they should consider. The third one I'd probably put out there is cost. Large data sets and large computing capacities often much more scalable and cost effective than a cloud infrastructure so there's lots of advantages to think through there. And maybe lastly I'd share is the native services. A lot of the cloud providers enable a set of native services for ingestion, for processing, of modeling, for machine learning, that organizations can really take advantage of. I would say if you're contemplating your strategy right now, my coaching would be, get help. It's a team sport. So definitely leverage your partners and think through the pros and cons of the strategy. Establish a primary hyperscaler, I think that's going to be super key and maximize your value through optimizing the workload, the data placement and really scaling the running operations. And lastly, maybe Dave move quickly right? Each day that you wait, you're incurring technical debt in your legacy environment, that's going to increase the cost and barrier to entry when moving to the new cloud hybrid driver. >> Thank you for that. Anupam I wonder if we could talk a little bit about the business impact. I mean, in the early days of big data, yes, it was a heavy lift, but it was really transformative. When you go to hybrid cloud, is it really about governance and compliance and security and getting the right mix in terms of latency? Are there other, you know, business impacts that are potentially as transformative as we saw in the early days? What are your thoughts on that? >> Absolutely. It's the other business impacts that are interesting. And you know, Dave, let's say you're in the line of business and I come to you and say, oh, there's cost, there's all these other security governance benefits. It doesn't ring the bell for you. But if I say, Dave used to wait 32 weeks, 32 weeks to procure hardware and install software, but I can give you the same thing in 30 minutes. It's literally that transformative, right? Even on-prem, if I use cloud native technology, I can give something today within days versus weeks. So we have banks, we have a bank in Ohio that would take 32 weeks to rack up a 42 node server. Yes, it's very powerful, you have 42 nodes on it, 42 things stacked on it, but still it's taking too much time. So when you get cloud native technologies in your data center, you start behaving like the cloud and you're responsive to the business. The responsiveness is very important. >> That's a great point. I mean, in fact, you know, there's always this debate about is the cloud public cloud probably cost more expensive? Is it more expensive to rent than it is to own? And you get debates back and forth based on your perspective. But I think at the end of the day, what, Anupam you just talked about, it may oftentimes could dwarf, you know, any cost factors, if you can actually, you know, move that fast. Now cost is always a consideration. But I want to talk about the migration path if we can Manish. Where do, how should customers think about migrating to the cloud migration's a, an evil word. How should they think about migrating to the cloud? What's the strategy there? Where should they start? >> No I think you should start with kind of a use case in mind. I think you should start with a particular data set in mind as well. I think starting with what you're really seeking to achieve from a business value perspective is always the right lens in my mind. This is the decade of time technology and cloud to the fitness value, right? So if you start with, I'm seeking to make a dramatic upsell or dramatic change to my top line or bottom line, start with the use case in mind and migrate the data sets and elements that are relevant to that use case, relevant to that value, relevant to that unlock that you're trying to create, that I think is the way to prioritize it. Most of our clients are going to have tons and tons of data in their legacy environment. I don't think the right way to start is to start with a strategy that's going to be focused on migrating all of that. I think the strategy is start with the prioritized items that are tied to the specific value or the use case you're seeking to drive and focus your transformation and your migration on that. >> So guys I've been around a long time in this business and been an observer for awhile. And back in the mainframe days, we used to have a joke called CTAM. When we talk about moving data, it was called the Chevy truck access method. So I want to ask you Anupam, how do you move the data? Do you, it's like an Einstein saying, right? Move as much data as you need to, but no more. So what's going on in that front? what's happening with data movement, and, you know, do you have to make changes to the applications when you move data to the cloud? >> So there's two design patterns, but I love your service story because you know, when you have a 30 petabyte system and you tell the customer, hey, just make a copy of the data and everything will be fine. That will take you one and a half years to make the copies aligned with each other. Instead, what we are seeing is the biggest magic is workload analysis. You analyze the workload, you analyze the behavior of the users, and say so let's say Dave runs dashboards that are very complicated and Manish waits for compute when Dave is running his dashboard. If you're able to gather that information, you can actually take some of the noise out of the system. So you take selected sets of hot data, and you move it to public cloud, process it in public cloud maybe even bring it back. Sounds like science fiction, but the good news is you don't need a Chevy to take all that data into public cloud. It's a small amount of data. That's one reason the other pattern that we have seen is, let's say Manish needs something as a data scientist. And he needs some really specific type of GPUs that are only available in the cloud. So you pull the data sets out that Manish needs so that he can get the new silicone, the new library in the cloud. Those are the two patterns that if you have a new type of compute requirement, you go to public cloud, or if you have a really noisy tenant, you take the hot data out into public cloud and process it there. Does that make sense? >> Yeah it does and it sort of sets up this notion I was sort of describing upfront that the cloud is not just, you know, the public cloud, it's the spans on-prem and multicloud and even the edge. And it seems to me that you've got a metadata opportunity I'll call it and a challenge as well. I mean, there's got to be a lot of R and D going on right now. You hear people talking about cloud native and I wonder on Anupam if you could stay on that, I mean, what's your sense as to how, what the journey is going to look like? I mean, we're not going to get there overnight. People have laid out a vision of this sort of expanding cloud and I'm saying it's a metadata opportunity, but I, you know, how do you, the system has to know what workload to put where based on a lot of those factors that you guys were talking about. The governance, the laws of the land, the latency issues, the cost issues is, you know, how is the industry Anupam sort of approaching this problem and solving this problem? >> I think the biggest thing is to reflect all your security governance across every cloud, as well as on-prem. So let's say, you know, a particular user named Manish cannot access financial data, revenue data. It's important that that data as it goes around the cloud, if it gets copied from on-prem to the cloud, it should carry that quality with it. A big danger is you copy it into some optic storage, and you're absolutely right Dave metadata is the goal there. If you copy the data into an object storage and you lose all metadata, you lose all security, you lose all authorization. So we have invested heavily in something called shared data experience. Which reflects policies from on-prem all the way to the cloud and back. We've seen customers needing to invest in that, but some customers went all hog on the cloud and they realize that putting data just in these buckets of optic storage, you lose all the metadata, and then you're exposing yourself to some breach and security issues. >> Manish I wonder if we could talk about, thank you for that Anupam. Manish I wonder if we could talk about, you know, I've imagined a project, okay? Wherever I am in my journey, maybe you can pick your sort of sweet spot in the market today. You know, what's the fat middle if you will. What does a project look like when I'm migrating to the cloud? I mean, what are some of the, who are the stakeholders? What are some of the outer scope maybe expectations that I better be thinking about? What kind of timeframe? How should I tackle this and so it's not like a, you know, a big, giant expensive? Can I take it in pieces? What's the state-of-the-art of a project look like today? >> Yeah, lots of thoughts come to mind, Dave, when you ask that question. So there's lots to pack. As far as who the buyer is or what the project is for, this is out of migration is directly relevant to every officer in the C-suite in my mind. It's very relevant for the CIO and CTO obviously it's going to be their infrastructure of the future, and certainly something that they're going to need to migrate to. It's very important for the CFO as well. These things require a significant migration and a significant investment from enterprises, different kind of position there. And it's very relevant all the way up to the CEO. Because if you get it right, the truly the power it unlocks is illuminates parts of your business that allow you to capture more value, capture a higher share of wallet, allows you to pivot. A lot of our clients right now are making a pivot from going from a products organization to an as a service organization and really using the capabilities of the cloud to make that pivot happen. So it's really relevant kind of across the C-suite. As far as what a typical program looks like, I always coach my clients just like I said, to start with the value case in mind. So typically, what I'll ask them to do is kind of prioritize their top three or five use cases that they really want to drive, and then we'll land a project team that will help them make that migration and really scale out data and analytics on the cloud that are focused on those use cases. >> Great, thank you for that. I'm glad you mentioned the shift in the mindset from product to as a service. We're seeing that across the board now, even infrastructure players are jumping on the bandwagon and borrowing some sort of best practices from the SaaS vendors. And I wanted to ask you guys about, I mean, as you move to the cloud, one of the other things that strikes me is that you actually get greater scale, but there's a broader ecosystem as well. So we're kind of moving from a product centric world and with SaaS we've got this sort of platform centric, and now it seems like ecosystems are really where the innovation is coming from. I wonder if you guys could comment on that, maybe Anupam you could start. >> Yeah, many of our customers as I said right? Are all about sharing data with more and more lines of businesses. So whenever we talk to our CXO partners, our CRO partners, they are being asked to open up the big data system to more tenants. The fear is, of course, if you add more tenants to a system, it could get, you know, the operational actually might get violated. So I think that's a very important part as more and more collaboration across the company, more and more collaboration across industries. So we have customers who create sandboxes. These are healthcare customers who create sandbox environments for other pharma companies to come in and look at clinical trial data. In that case, you need to be able to create these fenced environments that can be run in public cloud, but with the same security that you expect up. >> Yeah thank you. So Manish this is your wheelhouse as Accenture. You guys are one of the top, you know, two or three or four organizations in the world in terms of dealing with complexity, you've got deep industry expertise, and it seems like some of these ecosystems as Anupam was just sort of describing it in a form are around industries, whether it's healthcare, government, financial services and the like. Maybe your thoughts on the power of ecosystems versus the, you know, the power of many versus the resources of one. >> Yeah, listen, I always talk about this is a team sport right? And it's not about doing it alone. It's about developing as ecosystem partners and really leveraging the power of that collective group. And I've been for as my clients to start thinking about, you know, the key thing you want to think about is how you migrate to becoming a data driven enterprise. And in order for you to get there, you're going to need ecosystem partners to go along the journey with you, to help you drive that innovation. You're going to need to adopt a pervasive mindset to data and democratization of that data everywhere in your enterprise. And you're going to need to refocus your decision-making based on that data, right? So I think partner ecosystem partnerships are here to stay. I think what we're going to see Dave is, you know, at the beginning of the maturity cycle, you're going to see the ecosystem expand with lots of different players and technologies kind of focused on industry. And then I think you'll get to a point where it starts to mature and starts to consolidate as ecosystem partners start to join together through acquisitions and mergers and things like that. So I think ecosystem is just starting to change. I think the key message that I would give to our clients is take advantage of that. There's too much complexity for any one person to kind of navigate through on your own. It's a team sport, so take advantage of all the partnerships you can create. >> Well, Manish one of the things you just said that it kind of reminds me, you said data data-driven, you know, organizations and, you know, if you look at the pre-COVID narrative around digital transformation, certainly there was a lot of digital transformation going on, but there was a lot of complacency too. I talked to a lot of folks, companies that say, "you know, we're doing pretty well, our banks kicking butt right now, we're making a ton of money." Or you know, all that stuff that's kind of not on my watch. I'll be retired before then. And then it was the old, "if it ain't broke, don't fix it." And then COVID breaks everything. And now if you're not digital, you're out of business. And so Anupam I'll start with you. I mean, to build a data-driven culture, what does that mean? That means putting data at the center of your organization, as opposed to around in stove pipes. And this, again, we talked about this, it sort of started in there before even the early parts of last decade. And so it seems that there's cultural aspects there's obviously technology, but there's skillsets, there's processes, you've got a data lifecycle and a data, what I sometimes call a data pipeline, meaning an end to end cycle. And organizations are having to rethink really putting data at the core. What are you seeing? And specifically as it relates to this notion of data-driven organization and data culture, what's working? >> Yeah three favorite stories, and you're a 100% right. Digital transformation has been hyperaccelerated with COVID right? So our telco customers for example, you know, Manish had some technical problems with bandwidth just 10 minutes ago. Most likely is going to call his ISP. The ISP will most likely load up a dashboard in his zip code and the reason it gives me stress, this entire story is because most likely it's starting on a big data system that has to collect data every 15 minutes, and make it available. Because you'll have a very angry Manish on the other end, if you can't explain when is the internet coming back, right? So, as you said this is accelerated. Our telco providers, our telco customers ability to ingest data, because they have to get it in 15 minute increments, not in 24 hour increments. So that's one. On the banking sector what we have seen is uncertainty has created more needs for data. So next week is going to be very uncertain all of us know elections are upcoming. We have customers who are preparing for that additional variable capacity, elastic capacity, so that if investment bankers start running hundreds and thousands of reports, they better be ready. So it's changing the culture at a very fundamental level, right? And my last story is healthcare. You're running clinical trials, but everybody wants access to the data. Your partners, the government wants access to the data, manufacturers wants access to the data. So again, you have to actualize digital transformation on how do you share very sensitive, private healthcare data without violating any policy. But you have to do it quick. That's what COVID has started. >> Thank you for that. So I want to come back to hybrid cloud. I know a lot of people in the audience are, want to learn more about that. And they have a mandate really to go to cloud generally but hybrid specifically. So Manish I wonder if you could share with us, maybe there's some challenges, I mean what's the dark side of hybrid. What should people be thinking about that they, you know, they don't want to venture into, you know, this way, they want to go that way. What are some of the challenges that you're seeing with customers? And how are they mitigating them? >> Yeah, Dave it's a great question. I think there's a few items that I would coach my clients to prioritize and really get right when thinking about making the migration happen. First of all, I would say integration. Between your private and public components that can be complex, it can be challenging. It can be complicated based on the data itself, the organizational structure of the company, the number of touches and authors we have on that data and several other factors. So I think it's really important to get this integration right, with some clear accountabilities build automation where you can and really establish some consistent governance that allows you to maintain these assets. The second one I would say is security. When it comes to hybrid cloud management, any transfers of data you need to handle the strict policies and procedures, especially in industries where that's really relevant like healthcare and financial services. So using these policies in a way that's consistent across your environment and really well understood with anyone who's touching your environment is really important. And the third I would say is cost management. All the executives that I talk about have to have a cost management angle to it. Cloud migration provides ample opportunities for cost reduction. However many migration projects can go over budget when all the costs aren't factored in, right? So your cloud vendors. You've got to be mindful of kind of the charges on accessing on premise applications and scaling costs that maybe need to be budgeted for and where if possible anticipated and really plan for. >> Excellent. So Anupam I wonder if we could go a little deeper on, we talked a little bit about this, but kind of what you put where, which workloads. What are you seeing? I mean, how are people making the choice? Are they saying, okay, this cloud is good for analytics. This cloud is good. Well, I'm a customer of their software so I'm going to use this cloud or this one is the best infrastructure and they got, you know, the most features. How are people deciding really what to put where? Or is it, "hey, I don't want to be locked in to one cloud. I want to spread my risk around. What are you seeing specifically? >> I think the biggest thing is just to echo what Manish said. Is business comes in and as a complaint. So most projects that we see on digital transformation and on public cloud adoption is because businesses complaining about something. It's not architectural goodness, it is not for just innovation for innovation's sake. So, the biggest thing that we see is what we call noisy neighbors. A lot of dashboards, you know, because business has become so intense, click, click, click, click, you're actually putting a lot of load on the system. So isolating noisy neighbors into a cloud is one of the biggest patterns that you've seen. It takes the noisiest tenant on your cluster, noisiest workload and you take them to public cloud. The other one is data scientists. They want new libraries, they want to work with GPU's. And to your point Dave, that's where you select a particular cloud. Let's say there's a particular type silicone that is available only in that cloud. That GPU is available only in that cloud or that particular artificial intelligence library is available only in a particular cloud. That's when customers say, Hey miss, they decided, why don't you go to this cloud while the main workload might still be running on them, right? That's the two patterns that we are seeing. >> Right thank you. And I wonder if we can end on a little bit of looking to the future. Maybe how this is all going to evolve over the next several years. I mean, I like to look at it at a spectrum at a journey. It's not going to all come at once. I do think the edge is part of that. But it feels like today we've got, you know, multi clouds are loosely coupled and hybrid is also loosely coupled, but we're moving very quickly to a much more integrated, I think we Manish you talked about integration. Where you've got state, you've got the control plane, you've got the data plane. And all this stuff is really becoming native to the respective clouds and even bring that on-prem and you've got now hybrid applications and much much tighter integration and build this, build out of this massively distributed, maybe going from it's a hyper-converged to hyper-distributed again including the edge. So I wonder Manish we could start with you. How are your customers thinking about the future? How are they thinking about, you know, making sure that they're not going down a path where that's going to, they're going to incur a lot of technical debt? I know there's sort of infrastructure is code and containers and that seems it seems necessary, but insufficient there's a lot of talk about, well maybe we start with a functions based or a serverless architecture. There's some bets that have to be made to make sure that you can future proof yourself. What are you recommending there Manish? >> Yeah, I, listen I think we're just getting started in this journey. And like I said, it's really exciting time and I think there's a lot of evolution in front of us that we're going to see. I, you know, I think for example, I think we're going to see hybrid technologies evolve from public and private thinking to dedicated and shared thinking instead. And I think we're going to see advances in capabilities around automation and computer federation and evolution of consumption models of that data. But I think we've got a lot of kind of technology modifications and enhancements ahead of us. As far as companies and how they future proof themselves. I would offer the following. First of all, I think it's a time for action, right? So I would encourage all my class to take action now. Every day spent in legacy adds to the technical debt that you're going to incur, and it increases your barrier to entry. The second one would be move with agility and flexibility. That's the underlying value of hybrid cloud structures. So organizations really need to learn how to operate in that way and take advantage of that agility and that flexibility. We've talked about creating partnerships in ecosystems I think that's going to be really important. Gathering partners and thought leaders to help you navigate through that complexity. And lastly I would say monetizing your data. Making a value led approach to how you viewed your data assets and force a function where each decision in your enterprise is tied to the value that it creates and is backed by the data that supports it. And I think if you get those things right, the technology and the infrastructure will serve. >> Excellent and Anupam why don't you bring us home, I mean you've got a unique combination of technical acumen and business knowledge. How do you see this evolving over the next three to five years? >> Oh, thank you Dave. So technically speaking, adoption of containers is going to steadily make sure that you're not aware even of what cloud you're running on that day. So the multicloud will not be a requirement even, it will just be obviated when you have that abstraction there. Contrarily, it's going to be a bigger challenge. I would echo what Manish said start today, especially on the cultural side. It is great that you don't have to procure hardware anymore, but that also means that many of us don't know what our cloud bill is going to be next month. It is a very scary feeling for your CIO and your CFO that you don't know how much you're going to to spend next month forget next year, right? So you have to be agile in your financial planning as much you have to be agile in your technical planning. And finally I think you hit on it. Ecosystems are what makes data great. And so you have to start from day one that if I am going on this cloud solution, is the data shareable? Am I able to create an ecosystem around that data? Because without that, it's just somebody running a report may or may not have value to the business. >> That's awesome, guys. Thanks so much for a great conversation. We're at a time and I want to wish everybody a terrific event. Let me now hand it back to Vanita. She's going to take you through the rest of the day. This is Dave Vellante for theCUBE, thanks. (smooth calm music)

Published Date : Oct 30 2020

SUMMARY :

And you have to re-imagine your business you heard my little monologue upfront And I'm just amazed to see that today And you guys have been and the technology evolve. and to get the customer But given that, you know, and attractive to them Or, you know, eventually, Some of the factors that come to my mind and getting the right and I come to you and I mean, in fact, you know, and cloud to the fitness value, right? So I want to ask you Anupam, and you move it to public cloud, the cost issues is, you know, and you lose all metadata, and so it's not like a, you that allow you to capture more value, I wonder if you guys In that case, you need to You guys are one of the top, you know, to see Dave is, you know, the things you just said So again, you have to actualize about that they, you know, that allows you to maintain these assets. and they got, you know, the most features. A lot of dashboards, you know, to make sure that you can to how you viewed your data assets over the next three to five years? It is great that you don't have She's going to take you

ENTITIES

Entity	Category	Confidence
Gary	PERSON	0.99+
Dave	PERSON	0.99+
Dave Vellante	PERSON	0.99+
John Fard	PERSON	0.99+
Manish Dasaur	PERSON	0.99+
Dallas	LOCATION	0.99+
Anupam Singh	PERSON	0.99+
Ohio	LOCATION	0.99+
100%	QUANTITY	0.99+
Vanita	PERSON	0.99+
32 weeks	QUANTITY	0.99+
20%	QUANTITY	0.99+
Anupam	ORGANIZATION	0.99+
24 hour	QUANTITY	0.99+
three	QUANTITY	0.99+
two	QUANTITY	0.99+
15 minute	QUANTITY	0.99+
80%	QUANTITY	0.99+
telco	ORGANIZATION	0.99+
New York	LOCATION	0.99+
14 people	QUANTITY	0.99+
hundreds	QUANTITY	0.99+
Einstein	PERSON	0.99+
John	PERSON	0.99+
next year	DATE	0.99+
Oracle	ORGANIZATION	0.99+
30 minutes	QUANTITY	0.99+
Manish	PERSON	0.99+
four	QUANTITY	0.99+
next week	DATE	0.99+
2000 customers	QUANTITY	0.99+
next month	DATE	0.99+
today	DATE	0.99+
second one	QUANTITY	0.99+
Accenture	ORGANIZATION	0.99+
this week	DATE	0.99+
third	QUANTITY	0.99+
fall of 2010	DATE	0.99+
two patterns	QUANTITY	0.99+
First	QUANTITY	0.99+
Anupam	PERSON	0.99+
one and a half years	QUANTITY	0.98+
10 minutes ago	DATE	0.98+
42 things	QUANTITY	0.98+
Cloudera	ORGANIZATION	0.98+
two design patterns	QUANTITY	0.98+
two great guests	QUANTITY	0.98+
one reason	QUANTITY	0.98+
tomorrow	DATE	0.98+
Manish	ORGANIZATION	0.98+
0 petabyte	QUANTITY	0.98+
one exception	QUANTITY	0.98+
COVID	ORGANIZATION	0.98+
North America	LOCATION	0.97+
five years	QUANTITY	0.97+
2010	DATE	0.97+
last decade	DATE	0.97+
five use cases	QUANTITY	0.97+
one person	QUANTITY	0.97+
42 node	QUANTITY	0.96+
third one	QUANTITY	0.96+

Ashley Tarver, Cloudera | ACG SV Grow! Awards 2019

(upbeat music) >> From Mountain View, California, it's theCUBE covering the 15th annual GROW! Awards. Brought to you by ACG SV. >> Hey, Lisa Martin with theCUBE on the ground at the Computer History Museum in Mountain View, California, for the 15th annual ACG SV GROW! Awards. Can you hear the energy and all the innovation happening behind me? Well, I'm here with one of the board members of ACG SV, Ashley Tarver, big data evangelist for Cloudera. Ashley, thank you so much for joining me on theCUBE tonight. >> My pleasure, I'm glad to be here. >> Lot of collaboration going on behind us, right? >> It's a great networking event. >> It is. >> 'Cause so many people have showed up. >> There's over 230 people. >> Oh, easily. >> Expected tonight, over 100 of those are C-levels. Before we get into your association with ACG SV, talk to us a little bit about what's going on at Cloudera, just the Hortonworks acquisition was just completed, the merger, a couple months ago, what's going on there? >> It's very exciting. As most people might know, we just did a major collaboration merger with a company called Hortonworks. And the two companies together, we're about twice the size as we were before and for the industry and for our customers, it's been really exciting because we've been able to really create what we call the enterprise data cloud that really enables our customers to bring all their data together into one single platform and we call it an edge-to-AI solution. We're really one of the only companies right now in the world who have the ability to do that in a comprehensive manner and we can do it on the premise, we can do it in the cloud, a hybrid cloud environment, so it gives you the ultimate flexibility and the merger has allowed us to really accomplish that for our customers. >> As we and every company that's succeeding today is living in this hybrid, multi-cloud environment where the edge is proliferating, the security perimeters are morphing dramatically, companies need to be able to transform digitally in a secure way, but also enable access to data from decades ago. >> Yeah, most anybody's who's listening to the media will hear IoT is really the big play and the ability to capture all that data from multiple in-points, edge devices, and bring it all into a single data repository is a major challenge. So, having the ability to do that in a. You can do it now with the way we're doing it, the way your company wants to do it. So if you're already in the cloud, you can stay there, if you wanted to keep it on the premise. So there's a lot of options that we now bring to the table. So hopefully, it becomes a little easier for our customers. >> So when you're talking with customers that maybe have a lot of workloads, enterprise workloads, maybe legacy still on prem, and you're talking to them in your role as the big data evangelist, where does the topic of AI come up? I mean, are you talking to them about here is a massive opportunity for you to actually leverage AI, you got to go to the cloud to do it? >> Absolutely. I mean, AI is kind of a marketing term that you hear a lot about. For us, it's really about machine learning and machine learning is taking large sets of data and putting logic on top of it and so you can tease out valuable insights that you might not otherwise get. So the ability to then apply that in an AI environment becomes extremely important and the ability to do that across a large data set is what's really complicated. But if you're a real data scientist, you want to have as much data as you can so your models can run more accurately. And as soon as you can do that, you'll have the ability to really improve your models, extract better insights out of the data you do own, and provide more value to your own company and your own customers. >> Absolutely, it's a fascinating topic, but since we're low on time here, we are at the 15th annual GROW! Awards. ACG SV recognizing Arista Networks for the Outstanding Growth Award and Adesto Technologies for the Emerging Growth Award. You've been involved as a board member of ACG SV for about a year now. What makes this organization worthy of your time? >> Well, it's really exciting 'cause in Silicon Valley, it's unique 'cause it's all about collaboration. The innovation that we create out of this location of the globe is through networking with our peers and ACG opens up that window, provides a door that allows you to meet with your peers, your competitors, your friends, and as a result, you can create insights and capabilities about your own company and technology directions that's really helpful. So, it's the networking, they also put on excellent C-circle events, which is really good because if your company is looking at growing as a startup, you might be able to get some valuable insights from peers who know how to do HR, merger acquisitions, finance. And so, the ability to do networking like at an event like this, the ability to come in and learn how to do business processes more effectively, it all plays a really important role at ACG. >> Well Ashley, thank you so much for carving out some time to join us on theCUBE tonight. >> My pleasure, thanks for having me. >> I'm Lisa Martin, you're watching theCUBE. (upbeat music)

Published Date : Apr 18 2019

SUMMARY :

Brought to you by ACG SV. and all the innovation happening behind me? It's a great the merger, a couple months ago, what's going on there? and for the industry and for our customers, the security perimeters are morphing dramatically, and the ability to capture all that data and the ability to do that across a large data set and Adesto Technologies for the Emerging Growth Award. And so, the ability to do networking Well Ashley, thank you so much for carving out some time I'm Lisa Martin, you're watching theCUBE.

ENTITIES

Entity	Category	Confidence
Hortonworks	ORGANIZATION	0.99+
Ashley Tarver	PERSON	0.99+
Ashley	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
ACG	ORGANIZATION	0.99+
ACG SV	ORGANIZATION	0.99+
two companies	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
Mountain View, California	LOCATION	0.99+
one	QUANTITY	0.99+
Arista Networks	ORGANIZATION	0.99+
over 230 people	QUANTITY	0.98+
tonight	DATE	0.98+
single	QUANTITY	0.95+
theCUBE	ORGANIZATION	0.94+
decades ago	DATE	0.93+
ACG SV GROW! Awards	EVENT	0.93+
over 100	QUANTITY	0.92+
about a year	QUANTITY	0.92+
ACG SV Grow! Awards 2019	EVENT	0.92+
one single platform	QUANTITY	0.91+
Computer History Museum	LOCATION	0.91+
15th annual GROW! Awards	EVENT	0.87+
couple months ago	DATE	0.84+
GROW! Awards	EVENT	0.81+
about twice	QUANTITY	0.71+
Emerging Growth Award	TITLE	0.69+
15th annual	EVENT	0.66+
Adesto	ORGANIZATION	0.65+
today	DATE	0.62+
Growth Award	TITLE	0.59+
Technologies	TITLE	0.55+
15th annual	QUANTITY	0.55+

Mick Hollison, Cloudera | theCUBE NYC 2018

(lively peaceful music) >> Live, from New York, it's The Cube. Covering "The Cube New York City 2018." Brought to you by SiliconANGLE Media and its ecosystem partners. >> Well, everyone, welcome back to The Cube special conversation here in New York City. We're live for Cube NYC. This is our ninth year covering the big data ecosystem, now evolved into AI, machine learning, cloud. All things data in conjunction with Strata Conference, which is going on right around the corner. This is the Cube studio. I'm John Furrier. Dave Vellante. Our next guest is Mick Hollison, who is the CMO, Chief Marketing Officer, of Cloudera. Welcome to The Cube, thanks for joining us. >> Thanks for having me. >> So Cloudera, obviously we love Cloudera. Cube started in Cloudera's office, (laughing) everyone in our community knows that. I keep, keep saying it all the time. But we're so proud to have the honor of working with Cloudera over the years. And, uh, the thing that's interesting though is that the new building in Palo Alto is right in front of the old building where the first Palo Alto office was. So, a lot of success. You have a billboard in the airport. Amr Awadallah is saying, hey, it's a milestone. You're in the airport. But your business is changing. You're reaching new audiences. You have, you're public. You guys are growing up fast. All the data is out there. Tom's doing a great job. But, the business side is changing. Data is everywhere, it's a big, hardcore enterprise conversation. Give us the update, what's new with Cloudera. >> Yeah. Thanks very much for having me again. It's, it's a delight. I've been with the company for about two years now, so I'm officially part of the problem now. (chuckling) It's been a, it's been a great journey thus far. And really the first order of business when I arrived at the company was, like, welcome aboard. We're going public. Time to dig into the S-1 and reimagine who Cloudera is going to be five, ten years out from now. And we spent a good deal of time, about three or four months, actually crafting what turned out to be just 38 total words and kind of a vision and mission statement. But the, the most central to those was what we were trying to build. And it was a modern platform for machine learning analytics in the cloud. And, each of those words, when you unpack them a little bit, are very, very important. And this week, at Strata, we're really happy on the modern platform side. We just released Cloudera Enterprise Six. It's the biggest release in the history of the company. There are now over 30 open-source projects embedded into this, something that Amr and Mike could have never imagined back in the day when it was just a couple of projects. So, a very very large and meaningful update to the platform. The next piece is machine learning, and Hilary Mason will be giving the kickoff tomorrow, and she's probably forgotten more about ML and AI than somebody like me will ever know. But she's going to give the audience an update on what we're doing in that space. But, the foundation of having that data management platform, is absolutely fundamental and necessary to do good machine learning. Without good data, without good data management, you can't do good ML or AI. Sounds sort of simple but very true. And then the last thing that we'll be announcing this week, is around the analytics space. So, on the analytic side, we announced Cloudera Data Warehouse and Altus Data Warehouse, which is a PaaS flavor of our new data warehouse offering. And last, but certainly not least, is just the "optimize for the cloud" bit. So, everything that we're doing is optimized not just around a single cloud but around multi-cloud, hybrid-cloud, and really trying to bridge that gap for enterprises and what they're doing today. So, it's a new Cloudera to say the very least, but it's all still based on that core foundation and platform that, you got to know it, with very early on. >> And you guys have operating history too, so it's not like it's a pivot for Cloudera. I know for a fact that you guys had very large-scale customers, both with three letter, letters in them, the government, as well as just commercial. So, that's cool. Question I want to ask you is, as the conversation changes from, how many clusters do I have, how am I storing the data, to what problems am I solving because of the enterprises. There's a lot of hard things that enterprises want. They want compliance, all these, you know things that have either legacy. You guys work on those technical products. But, at the end of the day, they want the outcomes, they want to solve some problems. And data is clearly an opportunity and a challenge for large enterprises. What problems are you guys going after, these large enterprises in this modern platform? What are the core problems that you guys knock down? >> Yeah, absolutely. It's a great question. And we sort of categorize the way we think about addressing business problems into three broad categories. We use the terms grow, connect, and protect. So, in the "grow" sense, we help companies build or find new revenue streams. And, this is an amazing part of our business. You see it in everything from doing analytics on clickstreams and helping people understand what's happening with their web visitors and the like, all the way through to people standing up entirely new businesses based simply on their data. One large insurance provider that is a customer of ours, as an example, has taken on the challenge and asked us to engage with them on building really, effectively, insurance as a service. So, think of it as data-driven insurance rates that are gauged based on your driving behaviors in real time. So no longer simply just using demographics as the way that you determine, you know, all 18-year old young men are poor drivers. As it turns out, with actual data you can find out there's some excellent 18 year olds. >> Telematic, not demographics! >> Yeah, yeah, yeah, exactly! >> That Tesla don't connect to the >> Exactly! And Parents will love this, love this as well, I think. So they can find out exactly how their kids are really behaving by the way. >> They're going to know I rolled through the stop signs in Palo Alto. (laughing) My rates just went up. >> Exactly, exactly. So, so helping people grow new businesses based on their data. The second piece is "Connect". This is not just simply connecting devices, but that's a big part of it, so the IOT world is a big engine for us there. One of our favorite customer stories is a company called Komatsu. It's a mining manufacturer. Think of it as the ones that make those, just massive mines that are, that are all over the world. They're particularly big in Australia. And, this is equipment that, when you leave it sit somewhere, because it doesn't work, it actually starts to sink into the earth. So, being able to do predictive maintenance on that level and type and expense of equipment is very valuable to a company like Komatsu. We're helping them do that. So that's the "Connect" piece. And last is "Protect". Since data is in fact the new oil, the most valuable resource on earth, you really need to be able to protect it. Whether that's from a cyber security threat or it's just meeting compliance and regulations that are put in place by governments. Certainly GDPR is got a lot of people thinking very differently about their data management strategies. So we're helping a number of companies in that space as well. So that's how we kind of categorize what we're doing. >> So Mick, I wonder if you could address how that's all affected the ecosystem. I mean, one of the misconceptions early on was that Hadoop, Big Data, is going to kill the enterprise data warehouse. NoSQL is going to knock out Oracle. And, Mike has always said, "No, we are incremental". And people are like, "Yeah, right". But that's really, what's happened here. >> Yes. >> EDW was a fundamental component of your big data strategies. As Amr used to say, you know, SQL is the killer app for, for big data. (chuckling) So all those data sources that have been integrated. So you kind of fast forward to today, you talked about IOT and The Edge. You guys have announced, you know, your own data warehouse and platform as a service. So you see this embracing in this hybrid world emerging. How has that affected the evolution of your ecosystem? >> Yeah, it's definitely evolved considerably. So, I think I'd give you a couple of specific areas. So, clearly we've been quite successful in large enterprises, so the big SI type of vendors want a, want a piece of that action these days. And they're, they're much more engaged than they were early days, when they weren't so sure all of this was real. >> I always say, they like to eat at the trough and then the trough is full, so they dive right in. (all laughing) They're definitely very engaged, and they built big data practices and distinctive analytics practices as well. Beyond that, sort of the developer community has also begun to shift. And it's shifted from simply people that could spell, you know, Hive or could spell Kafka and all of the various projects that are involved. And it is elevated, in particular into a data science community. So one of additional communities that we sort of brought on board with what we're doing, not just with the engine and SPARK, but also with tools for data scientists like Cloudera Data Science Workbench, has added that element to the community that really wasn't a part of it, historically. So that's been a nice add on. And then last, but certainly not least, are the cloud providers. And like everybody, they're, those are complicated relationships because on the one hand, they're incredibly valuable partners to it, certainly both Microsoft and Amazon are critical partners for Cloudera, at the same time, they've got competitive offerings. So, like most successful software companies there's a lot of coopetition to contend with that also wasn't there just a few years ago when we didn't have cloud offerings, and they didn't have, you know, data warehouse in the cloud offerings. But, those are things that have sort of impacted the ecosystem. >> So, I've got to ask you a marketing question, since you're the CMO. By the way, great message UL. I like the, the "grow, connect, protect." I think that's really easy to understand. >> Thank you. >> And the other one was modern. The phrase, say the phrase again. >> Yeah. It's the "Cloudera builds the modern platform for machine learning analytics optimized for the cloud." >> Very tight mission statement. Question on the name. Cloudera. >> Mmhmm. >> It's spelled, it's actually cloud with ERA in the letters, so "the cloud era." People use that term all the time. We're living in the cloud era. >> Yes. >> Cloud-native is the hottest market right now in the Linux foundation. The CNCF has over two hundred and forty members and growing. Cloud-native clearly has indicated that the new, modern developers here in the renaissance of software development, in general, enterprises want more developers. (laughs) Not that you want to be against developers, because, clearly, they're going to hire developers. >> Absolutely. >> And you're going to enable that. And then you've got the, obviously, cloud-native on-premise dynamic. Hybrid cloud and multi-cloud. So is there plans to think about that cloud era, is it a cloud positioning? You see cloud certainly important in what you guys do, because the cloud creates more compute, more capabilities to move data around. >> Sure. >> And (laughs) process it. And make it, make machine learning go faster, which gives more data, more AI capabilities, >> It's the flywheel you and I were discussing. >> It's the flywheel of, what's the innovation sandwich, Dave? You know? (laughs) >> A little bit of data, a little bit of machine itelligence, in the cloud. >> So, the innovation's in play. >> Yeah, Absolutely. >> Positioning around Cloud. How are you looking at that? >> Yeah. So, it's a fascinating story. You were with us in the earliest days, so you know that the original architecture of everything that we built was intended to be run in the public cloud. It turns out, in 2008, there were exactly zero customers that wanted all of their data in a public cloud environment. So the company actually pivoted and re-architected the original design of the offerings to work on-prim. And, no sooner did we do that, then it was time to re-architect it yet again. And we are right in the midst of doing that. So, we really have offerings that span the whole gamut. If you want to just pick up you whole current Cloudera environment in an infrastructure as a service model, we offer something called Altus Director that allows you to do that. Just pick up the entire environment, step it up onto AWUS, or Microsoft Azure, and off you go. If you want the convenience and the elasticity and the ease of use of a true platform as a service, just this past week we announced Altus Data Warehouse, which is a platform as a service kind of a model. For data warehousing, we have the data engineering module for Altus as well. Last, but not least, is everybody's not going to sign up for just one cloud vendor. So we're big believers in multi-cloud. And that's why we support the major cloud vendors that are out there. And, in addition to that, it's going to be a hybrid world for as far out as we can see it. People are going to have certain workloads that, either for economics or for security reasons, they're going to continue to want to run in-house. And they're going to have other workloads, certainly more transient workloads, and I think ML and data science will fall into this camp, that the public cloud's going to make a great deal of sense. And, allowing companies to bridge that gap while maintaining one security compliance and management model, something we call a Shared Data Experience, is really our core differentiator as a business. That's at the very core of what we do. >> Classic cloud workload experience that you're bringing, whether it's on-prim or whatever cloud. >> That's right. >> Cloud is an operating environment for you guys. You look at it just as >> The delivery mechanism. In effect. Awesome. All right, future for Cloudera. What can you share with us. I know you're a public company. Can't say any forward-looking statements. Got to do all those disclaimers. But for customers, what's the, what's the North Star for Cloudera? You mentioned going after a much more hardcore enterprise. >> Yes. >> That's clear. What's the North Star for you guys when you talk to customers? What's the big pitch? >> Yeah. I think there's a, there's a couple of really interesting things that we learned about our business over the course of the past six, nine months or so here. One, was that the greatest need for our offerings is in very, very large and complex enterprises. They have the most data, not surprisingly. And they have the most business gain to be had from leveraging that data. So we narrowed our focus. We have now identified approximately five thousand global customers, so think of it as kind of Fortune or Forbes 5000. That is our sole focus. So, we are entirely focused on that end of the market. Within that market, there are certain industries that we play particularly well in. We're incredibly well-positioned in financial services. Very well-positioned in healthcare and telecommunications. Any regulated industry, that really cares about how they govern and maintain their data, is really the great target audience for us. And so, that continues to be the focus for the business. And we're really excited about that narrowing of focus and what opportunities that's going to build for us. To not just land new customers, but more to expand our existing ones into a broader and broader set of use cases. >> And data is coming down faster. There's more data growth than ever seen before. It's never stopping.. It's only going to get worse. >> We love it. >> Bring it on. >> Any way you look at it, it's getting worse or better. Mick, thanks for spending the time. I know you're super busy with the event going on. Congratulations on the success, and the focus, and the positioning. Appreciate it. Thanks for coming on The Cube. >> Absolutely. Thank you gentlemen. It was a pleasure. >> We are Cube NYC. This is our ninth year doing all action. Everything that's going on in the data world now is horizontally scaling across all aspects of the company, the society, as we know. It's super important, and this is what we're talking about here in New York. This is The Cube, and John Furrier. Dave Vellante. Be back with more after this short break. Stay with us for more coverage from New York City. (upbeat music)

Published Date : Sep 13 2018

SUMMARY :

Brought to you by SiliconANGLE Media This is the Cube studio. is that the new building in Palo Alto is right So, on the analytic side, we announced What are the core problems that you guys knock down? So, in the "grow" sense, we help companies by the way. They're going to know I rolled Since data is in fact the new oil, address how that's all affected the ecosystem. How has that affected the evolution of your ecosystem? in large enterprises, so the big and all of the various projects that are involved. So, I've got to ask you a marketing question, And the other one was modern. optimized for the cloud." Question on the name. We're living in the cloud era. Cloud-native clearly has indicated that the new, because the cloud creates more compute, And (laughs) process it. machine itelligence, in the cloud. How are you looking at that? that the public cloud's going to make a great deal of sense. Classic cloud workload experience that you're bringing, Cloud is an operating environment for you guys. What can you share with us. What's the North Star for you guys is really the great target audience for us. And data is coming down faster. and the positioning. Thank you gentlemen. is horizontally scaling across all aspects of the

ENTITIES

Entity	Category	Confidence
Komatsu	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Mick Hollison	PERSON	0.99+
Mike	PERSON	0.99+
Australia	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
2008	DATE	0.99+
Palo Alto	LOCATION	0.99+
Tom	PERSON	0.99+
New York	LOCATION	0.99+
Mick	PERSON	0.99+
John Furrier	PERSON	0.99+
New York City	LOCATION	0.99+
Tesla	ORGANIZATION	0.99+
CNCF	ORGANIZATION	0.99+
Hilary Mason	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
second piece	QUANTITY	0.99+
three letter	QUANTITY	0.99+
North Star	ORGANIZATION	0.99+
Amr Awadallah	PERSON	0.99+
zero customers	QUANTITY	0.99+
five	QUANTITY	0.99+
18 year	QUANTITY	0.99+
ninth year	QUANTITY	0.99+
One	QUANTITY	0.99+
Dave	PERSON	0.99+
this week	DATE	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
both	QUANTITY	0.99+
ten years	QUANTITY	0.98+
four months	QUANTITY	0.98+
over two hundred and forty members	QUANTITY	0.98+
Oracle	ORGANIZATION	0.98+
NYC	LOCATION	0.98+
first	QUANTITY	0.98+
NoSQL	TITLE	0.98+
The Cube	ORGANIZATION	0.98+
over 30 open-source projects	QUANTITY	0.98+
Amr	PERSON	0.98+
today	DATE	0.98+
SQL	TITLE	0.98+
each	QUANTITY	0.98+
GDPR	TITLE	0.98+
tomorrow	DATE	0.98+
Cube	ORGANIZATION	0.97+
approximately five thousand global customers	QUANTITY	0.97+
Strata	ORGANIZATION	0.96+
about two years	QUANTITY	0.96+
Altus	ORGANIZATION	0.96+
earth	LOCATION	0.96+
EDW	TITLE	0.95+
18-year old	QUANTITY	0.95+
Strata Conference	EVENT	0.94+
few years ago	DATE	0.94+
one	QUANTITY	0.94+
AWUS	TITLE	0.93+
Altus Data Warehouse	ORGANIZATION	0.93+
first order	QUANTITY	0.93+
single cloud	QUANTITY	0.93+
Cloudera Enterprise Six	TITLE	0.92+
about three	QUANTITY	0.92+
Cloudera	TITLE	0.84+
three broad categories	QUANTITY	0.84+
past six	DATE	0.82+

Alison Yu, Cloudera - SXSW 2017 - #IntelAI - #theCUBE

(electronic music) >> Announcer: Live from Austin, Texas, it's The Cube. Covering South By Southwest 2017. Brought to you by Intel. Now, here's John Furrier. >> Hey, welcome back, everyone, we're here live in Austin, Texas, for South By Southwest Cube coverage at the Intel AI Lounge, #IntelAI if you're watching, put it out on Twitter. I'm John Furrier of Silicon Angle for the Cube. Our next guest is Alison Yu who's with Cloudera. And in the news today, although they won't comment on it. It's great to see you, social media manager at Cloudera. >> Yes, it's nice to see you as well. >> Great to see you. So, Cloudera has a strategic relationship with Intel. You guys have a strategic investment, Intel, and you guys partner up, so it's well-known in the industry. But what's going on here is interesting, AI for social good is our theme. >> Alison: Yes. >> Cloudera has always been a pay-it-forward company. And I've known the founders, Mike Olson and Amr Awadallah. >> Really all about the community and paying it forward. So Alison, talk about what you guys are working on. Because you're involved in a panel, but also Cloudera Cares. And you guys have teamed up with Thorn, doing some interesting things. >> Alison: Yeah (laughing). >> Take it away! >> Sure, thanks. Thanks for the great intro. So I'll give you a little bit of a brief introduction to Cloudera Cares. Cloudera Cares was founded roughly about three years ago. It was really an employee-driven and -led effort. I kind of stepped into the role and ended up being a little bit more of the leader just by the way it worked out. So we've really gone from, going from, you know, we're just doing soup kitchens and everything else, to strategic partnerships, donating software, professional service hours, things along those lines. >> Which has been very exciting to see our nonprofit partnerships grow in that way. So it really went from almost grass-root efforts to an organized organization now. And we start stepping up our strategic partnerships about a year and a half ago. We started with DataKind, is our initial one. About two years ago, we initiated that. Then we a year ago, about in September, we finalized our donation of an enterprise data hub to Thorn, which if you're not aware of they're all about using technology and innovation to stop child-trafficking. So last year, around September or so, we announced the partnership and we donated professional service hours. And then in October, we went with them to Grace Hopper, which is obviously the largest Women in Tech Conference in North America. And we hosted a hackathon and we helped mentor women entering into the tech workforce, and trying to come up with some really cool innovative solutions for them to track and see what's going on with the dark web, so we had quite a few interesting ideas coming out of that. >> Okay, awesome. We had Frederico Gomez Suarez on, who was the technical advisor. >> Alison: Yeah. >> A Microsoft employee, but he's volunteering at Thorn, and this is interesting because this is not just donating to the soup kitchens and what not. >> Alison: Yeah. >> You're starting to see a community approach to philanthropy that's coding RENN. >> Yeah. >> Hackathons turning into community galvanizing communities, and actually taking it to the next level. >> Yeah. So, I think one of the things we realize is tech, while it's so great, we have actually introduced a lot of new problems. So, I don't know if everyone's aware, but in the '80s and '90s, child exploitation had almost completely died. They had almost resolved the issue. With the introduction of technology and the Internet, it opened up a lot more ways for people to go ahead and exploit children, arrange things, in the dark web. So we're trying to figure out a way to use technology to combat a problem that technology kind of created as well, but not only solving it, but rescuing people. >> It's a classic security problem, the surface area has increased for this kind of thing. But big data, which is where you guys were founded on in the cloud era that we live in. >> Alison: Yeah. >> Pun intended. (laughing) Using the machine learning now you start with some scale now involved. >> Yes, exactly, and that's what we're really hoping, so we're partnering with Intel in the National Center of Missing Exploited Children. We're actually kicking off a virtual hackathon tomorrow, and our hope is we can figure out some different innovative ways that AI can be applied to scraping data and finding children. A lot of times we'll see there's not a lot of clues, but for example, if we can upload, if there can be a tool that can upload three or four different angles of a child's face when they go missing, maybe what happens is someone posts a picture on Instagram or Twitter that has a geo tag and this kid is in the background. That would be an amazing way of using AI and machine learning-- >> Yeah. >> Alison: To find a child, right. >> Well, I'll give you guy a plug for Cloudera. And I'll reference Dr. Naveen Rao, who's the GM of Intel's AI group, was on earlier. And he was talking about how there's a lot of storage available, not a lot of compute. Now, Cloudera, you guys have really pioneered the data lake, data hub concept where storage is critical. >> Yeah. >> Now, you got this compute power and machine learning, that's kind of where it comes together. Did I get that right? >> Yeah, and I think it's great that with the partnership with Intel we're able to integrate our technology directly into the hardware, which makes it so much more efficient. You're able to compute massive amounts of data in a very short amount of time, and really come up with real results. And with this partnership, specifically with Thorn and NCMEC, we're seeing that it's real impact for thousands of people last year, I think. In the 2016 impact report, Thorn said they identified over 6,000 trafficking victims, of which over 2,000 were children. Right, so that tool that they use is actually built on Cloudera. So, it's great seeing our technology put into place. >> Yeah, that's awesome. I was talking to an Intel person the other day, they have 72 cores now on a processor, on the high-end Xeons. Let's get down to some other things that you're working on. What are you doing here at the show? Do you have things that you're doing? You have a panel? >> Yeah, so at the show, at South by Southwest, we're kicking off a virtual hackathon tomorrow at our Austin offices for South by Southwest. Everyone's welcome to come. I just did the liquor order, so yes, everyone please come. (laughing) >> You just came from Austin's office, you're just coming there. >> Yeah, exactly. So we've-- >> Unlimited Red Bull, pizza, food. (laughing) >> Well, we'll be doing lots and lots tomorrow, but we're kicking that off, we have representatives from Thorn, NCMEC, Google, Intel, all on site to answer questions. That's kind of our kickoff of this month-long virtual hackathon. You don't need to be in Austin to participate, but that is one of the things that we are kicking off. >> And then on Sunday, actually here at the Intel AI Lounge we're doing a panel on AI for Good, and using artificial intelligence to solve problems. >> And we'll be broadcasting that live here on The Cube. So, folks, SiliconAngle.tv will carry that. Alison, talk about the trend that, you weren't here when we were talking about how there's now a new counterculture developing in a good way around community and social change. How real is the trend that you're starting to see these hackathons evolve from what used to be recruiting sessions to people just jamming together to meet each other. Now, you're starting to see the next level of formation where people are organizing collectively-- >> Yeah. >> To impact real issues. >> Yeah. >> Is this a real trend or where is that trend, can you speak to that? >> Sure, so from what I've seen from the hackathons what we've been seeing before was it's very company-specific. Only one company wanted to do it, and they would kind of silo themselves, right? Now, we're kind of seeing this coming together of companies that are generally competitors, but they see a great social cause and they decide that they want to band together, regardless of their differences in technology, product, et cetera, for a common good. And, so. >> Like a Thorn. >> For Thorn, you'll see a lot of competitors, so you'll see Facebook and Twitter or Google and Amazon, right? >> John: Yeah. >> And we'll see all these different competitors come together, lend their workforce to us, and have them code for one great project. >> So, you see it as a real trend. >> I do see it as a trend. I saw Thorn last year did a great one with Facebook and on-site with Facebook. This year as we started to introduce this hackathon, we decided that we wanted to do a hackathon series versus just a one-off hackathon. So we're seeing people being able to share code, contribute, work on top of other code, right, and it's very much a sharing community, so we're very excited for that. >> All right, so I got to ask you what's they culture like at Cloudera these days, as you guys prepare to go public? What's the vibe internally of the company, obviously Mike Olson, the founder, is still around, Amr's around. You guys have been growing really fast. Got your new space. What's the vibe like in Cloudera now? >> Honestly, the culture at Cloudera hasn't really changed. So, when I joined three years ago we were much smaller than we are now. But I think one thing that we're really excited about is everyone's still so collaborative, and everyone makes sure to help one another out. So, I think our common goal is really more along the lines of we're one team, and let's put out the best product we can. >> Awesome. So, what's South by Southwest mean to you this year? If you had to kind of zoom out and say, okay. What's the theme? We heard Robert Scoble earlier say it's a VR theme. We hear at Intel it's AI. So, there's a plethora of different touchpoints here. What do you see? >> Yeah, so I actually went to the opening keynote this morning, which was great. There was an introduction, and then I don't know if you realized, but Cory Booker was on as well, which is great. >> John: Yep. >> But I think a lot of what we had seen was they called out on stage that artificial intelligence is something that will be a trend for the next year. And I think that's very exciting that Intel really hit the nail on the head with the AI Lounge, right? >> Cory Booker, I'm a big fan. He's from my neighborhood, went to the same school I went to, that my family. So in Northern Valley, Old Tappan. Cory, if you're watching, retweet us, hashtag #IntelAI. So AI's there. >> AI is definitely there. >> No doubt, it's on stage. >> Yes, but I think we're also seeing a very large, just community around how can we make our community better versus let's try to go in these different silos, and just be hyper-aware of what's only in front of us, right? So, we're seeing a lot more from the community as well, just being interested in things that are not immediately in front of us, the wider, either nation, global, et cetera. So, I think that's very exciting people are stepping out of just their own little bubbles, right? And looking and having more compassion for other people, and figuring out how they can give back. >> And, of course, open source at the center of all the innovation as always. (laughing) >> I would like to think so, right? >> It is! I would testify. Machine learning is just a great example, how that's now going up into the cloud. We started to see that really being part of all the apps coming out, which is great because you guys are in the big data business. >> Alison: Yeah. >> Okay, Alison, thanks so much for taking the time. Real quick plug for your panel on Sunday here. >> Yeah. >> What are you going to talk about? >> So we're going to be talking a lot about AI for good. We're really going to be talking about the NCMEC, Thorn, Google, Intel, Cloudera partnership. How we've been able to do that, and a lot of what we're going to also concentrate on is how the everyday tech worker can really get involved and give back and contribute. I think there is generally a misconception of if there's not a program at my company, how do I give back? >> John: Yeah. >> And I think Cloudera's a shining example of how a few employees can really enact a lot of change. We went from grassroots, just a few employees, to a global program pretty quickly, so. >> And it's organically grown, which is the formula for success versus some sort of structured company program (laughing). >> Exactly, so we definitely gone from soup kitchen to strategic partnerships, and being able to donate our own time, our engineers' times, and obviously our software, so. >> Thanks for taking the time to come on our Cube. It's getting crowded in here. It's rocking the house, the house is rocking here at the Intel AI Lounge. If you're watching, check out the hashtag #IntelAI or South by Southwest. I'm John Furrie. I'll be back with more after this short break. (electronic music)

Published Date : Mar 10 2017

SUMMARY :

Brought to you by Intel. And in the news today, although they won't comment on it. and you guys partner up, And I've known the founders, Mike Olson and Amr Awadallah. So Alison, talk about what you guys are working on. I kind of stepped into the role for them to track and see what's going on with the dark web, We had Frederico Gomez Suarez on, donating to the soup kitchens and what not. You're starting to see a community approach and actually taking it to the next level. but in the '80s and '90s, child exploitation in the cloud era that we live in. Using the machine learning now and our hope is we can figure out some different the data lake, data hub concept Now, you got this compute power and machine learning, into the hardware, which makes it so much more efficient. on the high-end Xeons. I just did the liquor order, so yes, everyone please come. You just came from Austin's office, So we've-- (laughing) but that is one of the things that we are kicking off. actually here at the Intel AI Lounge Alison, talk about the trend that, you weren't here and they would kind of silo themselves, right? and have them code for one great project. and on-site with Facebook. All right, so I got to ask you the best product we can. What's the theme? and then I don't know if you realized, that Intel really hit the nail on the head I went to, that my family. and just be hyper-aware of And, of course, open source at the center which is great because you guys are in the Okay, Alison, thanks so much for taking the time. and a lot of what we're going to also concentrate on is And I think Cloudera's a shining example of And it's organically grown, and being able to donate our own time, Thanks for taking the time to come on our Cube.

ENTITIES

Entity	Category	Confidence
Mike Olson	PERSON	0.99+
Alison	PERSON	0.99+
Robert Scoble	PERSON	0.99+
NCMEC	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
John Furrie	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
Austin	LOCATION	0.99+
John Furrier	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
John	PERSON	0.99+
October	DATE	0.99+
Naveen Rao	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Cory Booker	PERSON	0.99+
Alison Yu	PERSON	0.99+
Sunday	DATE	0.99+
Intel	ORGANIZATION	0.99+
Cloudera Cares	ORGANIZATION	0.99+
72 cores	QUANTITY	0.99+
Thorn	ORGANIZATION	0.99+
last year	DATE	0.99+
This year	DATE	0.99+
Amr Awadallah	PERSON	0.99+
a year ago	DATE	0.99+
Facebook	ORGANIZATION	0.99+
Cory	PERSON	0.99+
tomorrow	DATE	0.99+
Austin, Texas	LOCATION	0.99+
Twitter	ORGANIZATION	0.99+
Northern Valley	LOCATION	0.99+
September	DATE	0.99+
2016	DATE	0.99+
DataKind	ORGANIZATION	0.99+
over 6,000 trafficking victims	QUANTITY	0.99+
Frederico Gomez Suarez	PERSON	0.99+
next year	DATE	0.99+
today	DATE	0.99+
over 2,000	QUANTITY	0.99+
three years ago	DATE	0.99+
National Center of Missing Exploited Children	ORGANIZATION	0.98+
SXSW 2017	EVENT	0.98+
one	QUANTITY	0.98+
About two years ago	DATE	0.98+
Amr	ORGANIZATION	0.98+
thousands of people	QUANTITY	0.97+
North America	LOCATION	0.95+
about a year and a half ago	DATE	0.95+
this year	DATE	0.95+
one team	QUANTITY	0.95+

Josie Gillan, Cloudera - Women Transforming Technology 2017 - #WT2SV - #theCUBE

>> Commentator: Live from Palo Alto, it's theCUBE, covering Women Transforming Technology 2017, brought to you by VMware. >> Welcome back to theCUBE's coverage of Women Transforming Technology here in sunny Palo Alto at the VMware conference. I am Rebecca Knight, your host. I'm joined by Josie Gillan. She is the Senior Director of Engineering at Cloudera and a passionate advocate for getting more women into technology. Josie, thanks so much for joining us. >> Thank you very much for inviting me. Pleasure to be here. >> So I want to start out by asking a question that should be obvious but it may not be. Why do we need more women in technology? >> Right, so that's the classic question and I think I probably would have the classic answer which is just so many studies have shown that diversity results in much better products, much better ideas and we've found numerous stories where products were developed by mostly white males and they just have actually alienated many, many of their customers, right? So it's definitely that we need to have that diversity and I think 50%, 51% I think actually, of the population is women, right? So let's not disregard half of them. I just think women have a lot to offer and a lot to add. It's a generalization, but women generally are more collaborative and supportive so it's the right thing to do and obviously the numbers in tech are just so far skewed off what the actual numbers and population are that it's time to continue to do something about it, but it's hard. >> I want to talk to you about what you just said about women in their approach to work, their approach to being on a team. You said they're more collaborative. You were talking a little bit earlier about EQ and the importance of EQ. Can you comment on the perspective that women bring and the approach that they take to being on a team that is different in your experience? >> It's just that women are generally probably, again I'm really generalizing here, but the way that women network with each other and support each other and generally want to touch and connect, I think that's a lot of what it is about networking. So for example, again this is not all women, but in 101s and meeting with your fellow peers, I think connection is really important and building the relationships and probably being a little more vulnerable I think is really important rather than the stoic I'm here to get what I need. I think women generally tend to say, "Okay, what can we get together?" And I think that's a natural trait that women have, but again purely generalizing. >> In terms of Silicon Valley, you've been around at a lot of different companies. You built your career here. Is it better? Now also particularly now at a time where we are hearing so many horrible stories about overt sexism, everything from subtle biases to overt sexism and sexual harassment. What's it like? Tell us the tales from the trenches. Do you have? >> Well, first of all, I think you were going to start to say, "Is it getting better?" >> Rebecca: Yeah. (laughs) >> Unfortunately it's not and there's a lot of studies to show that. What I think is changing though is that we are talking about it more and more starting with I guess it was two years ago and there was this grassroots effort after one of the Grace Hopper Conferences to get companies to actually publicize their diversity data so I think that's number one, right? That we're actually getting companies to say what their numbers are, both for gender and people of color, right? >> Rebecca: So the first step is really awareness that there could be a problem. >> Exactly. And then that there's a lot of companies investing in and obviously hiring a diversity inclusion leader. I've been at Atlassian before I came to Cloudera and Atlassian is a great company, got some really good two CEOs who really believed in diversity but again like other companies, the numbers were pretty, pretty bad. And it was in Australia too, probably you knew that. I actually moved to Australia for a year and I think it was very young. It was not only not so diverse on the gender but also very young which is again very common in tech companies, but they've gone and hired a diversity inclusion leader and she's doing an amazing job at bringing in more programs, getting awareness out there and trying to make a difference, but it's not an easy job. I think she's doing amazing. I think our folks at Cloudera are doing amazing. Salesforce is doing amazing. There's awareness but it's a very difficult issue. >> So that's the hiring part of it, it's bringing more women in. What about the culture too? We were talking earlier too about the supportive environment and supportive leadership. What will it take for a big cultural shift in the technology industry? >> So when I came back, basically this is my story, is I'm from New Zealand originally but I've lived over here, I moved to America in '98 and worked for several different companies, Oracle, Salesforce and thought always hey I wouldn't mind going back home and being closer to my family so we actually moved to Sydney for a year and that's where I worked for Atlassian which was a really interesting experience, but it made me realize that the bay area was home and I think the culture of Silicon Valley is something that you can't get outside of Silicon Valley. >> For better or for worse. >> For better or for worse but again, back to that collaboration, in Sydney there's not that many tech companies, right? So I didn't find that collaboration. These kinds of events were very, very rare and especially in engineering, right? I could meet people who worked for the Google office in Sydney, but they're more in nontechnical roles. I mean, there were some. So when I came back, it was really important for me to find a company that again, as you mentioned, had that high EQ and a really good culture and what I mean by that is not it's got a free lunch. Cloudera has free lunches, but that's not what attracted me to Cloudera. What attracted me to Cloudera was talking to my manager is the SVP of Engineering and my peers are all VPs of Engineering and it was the conversation in the interviews that really were conversations and just very, very respectful and it wasn't all about this is what I do and this is what you must do. It was about a collaborative conversation. And one thing I really got from talking with both my manager and my peers was that they really were out to support each other. And one thing I think is amazing about the culture we have at Cloudera is that what will happen is I'm leading quality, performance, build and infrastructure and quality is at the top of our list at the moment. We can always improve on quality and we had an extraordinary developer in one of my peer's teams who wanted to come and help with quality problems. Now normally what would happen is the development VP might say, "I don't want to leave him." >> Yes, there are silos. >> But he was like, the development VP was, "Well, really sad to lose him, "but this is a much bigger problem and I'm going to help him. "I'm going to help him move." And I think that is a really interesting leaderships style that isn't prevalent throughout Silicon Valley which is I'm going to do what's good for the company and the overall good of the company and just what's right rather than particularly my own. >> Rebecca: My department, my unit. >> My own turf, yeah. And what we want to do at Cloudera is bring that further through the chains because as a company, as it's growing, we've got many different product teams and we want to make sure that that collaboration goes across the development managers, the quality engineering managers to really learn from each other and support each other. Your question is how do we, that to me is very, very important and I think we need to start talking about it and we need to showcase companies that do it well. We've actually gone through one of those personality tests or it wasn't actually a personality test, what drives you whether it's more strategic or problem solving, people are into the process, and I think those are really good things to do so that you can all work to communicate with each other and work with each other. >> You mentioned earlier that one of the things about working in Sydney that struck you is that conferences like this one, the Women Transforming Technology, are rare. Why are they so important do you think? >> Oh right. I've been to the Grace Hopper Conference four times. You're so used to being the minority. You're so used to being the minority and it's fantastic to come to a conference like that where you're not the minority anymore. And I think one thing that's extraordinary, have you been to the Grace Hopper Conference? >> Rebecca: I have, I was there in Houston in October. >> One thing that I find extraordinary about the Grace Hopper is the camaraderie. And you'll be lining up to get a coffee and just the people that you'll start a conversation and I've actually made some really, really great friends from Grace Hopper that I still keep in contact with and it's the networking and oh hang on a minute, she's having the same problem I'm having. >> Are these professional problems that you're facing or are these strategic? >> A bit of both. It could be technical problems. A lot of it's how do I get a team to collaborate on something. It's how do I overcome my imposter syndrome? How do I be a good leader? And the connections you make. I really feel that you can truly be yourself and I love what Cara was just saying before about being authentic and being genuine. I think something like Grace Hopper is somewhere where you can truly, truly feel authentic and genuine. The thing for me is it always gives me a great big confidence. I just feel great after these conferences and I'm inspired to just go back and really continue to move the needle. >> This is a women's conference. It's mostly women attending. If you could send a message to the men of Silicon Valley, what would it be? If you could just gather all of them in a room and say give them some advice about either helping a young woman in her career or just hey fellas know this. >> I think the big advice is listen, right? Were you at the Grace Hopper Conference two years ago? >> Rebecca: No I wasn't, I missed that one. >> I'm not sure if you heard about the male allies panel, but it was interesting because basically there's a male allies panel which was done with all good intention, but it got a lot of flak because why the hell am I flaking about the space and what the people who were on the panel did which was really interesting is they actually created a second panel the next day and said, "Okay, we're going to shut up. "We're going to listen." And it's really quite hard. For all of us in technology, we're all used to solving problems and we want to have our say and to get them to be quiet and listen is so important and not try and solve the problem, just try and understand and Cara was just saying that before, right, about some of the stuff that's going on with Uber and everything is some of the males she talks to say, "But I don't see it." Well of course you don't see it because you're not experiencing it, right? So listen, talk to women and make it very clear that it's a safe space and that you're just here to listen and you're not going to try and solve the problem, but try and get an understanding because they're in a very, very different space than we are. >> The story that's going on with Uber, it is depressing as a woman, as a woman in technology in Silicon Valley particularly just a couple of years after the Ellen Pao lawsuit. Are you hopeful that things will get better? >> I'm hopeful things will get better. It's brave women like Susan who are they telling their stories. We need to support each other and really support people like Susan who were brave enough to say that and obviously now because she's done it, a lot of other people are coming forward and Uber has to take some responsibility and has to do something so I'm hopeful it's getting better because we're talking about it a lot more, but it's a very, very difficult situation and the more we talk about it and there's people who are a lot smarter than me and a lot different, who are very experienced in this kind of social issue to be able to figure out how the hell we address this, but a lot of it is to get the conversation going and as I said to listen. >> If you could give a piece of advice to the younger version of you, that young girl in New Zealand dreaming of a career in technology, you mentioned imposter syndrome, what would you say? >> Getting back to Cara's talk, she talked about don't worry so much about what people think of you. >> Oh that's so hard though, it's so hard. >> And I remember gosh in my early days in my career, I was sitting there and I can't say anything. I really want to say something but I'm going to look stupid and it's like be curious. I think that's my best advice. What I love when I'm interviewing, I've done a lot interviewing of college grads and what I'll do is see what questions they ask so I think you don't have to have all the answers and you don't have to show I'm the best Java programmer there is, but oh tell me about this and I really love that your company does this and how do you approach this kind of problem? And just their thirst for knowledge and that curiosity and their eagerness to learn, I think it's really important to ask questions. And I think that's a good way to get over the imposter syndrome because you're not necessarily coming up as like I'm trying to be an expert on something, it's like I'm trying to contribute to conversation and help me understand and I think it's a really good way to get people out there and getting people talking. >> So be curious, don't care so much what people think of you. >> Josie: Right, right. >> You don't have to be the smartest person at the table. >> And build your network and especially if you see somebody in a meeting that handled a particular situation very well, I think it's really great to be able to go up to them afterwards and say, "Look, I loved how you said that. "Can you maybe chat to me about how you came up with that? "'Cause I'd love to learn from you." There's a lot of this talk about mentorship and I think it's really true that Sheryl Sandberg says it's not really the best way to say, "Could you be my mentor please?" But to actually just say, "I love this." >> Ask for advice. >> Ask for advice and very few women would say, "I don't want to talk about that." Most women are like, "Wow that's great," and want to be able to help out the younger generation. >> Josie Gillan, thank you so much for joining us. It's been a pleasure talking to you. >> Thank you so much. >> I'm Rebecca Knight for theCUBE in our coverage of Women Transforming Technology. We'll be right back. (modern techno music)

Published Date : Feb 28 2017

SUMMARY :

brought to you by VMware. and a passionate advocate Pleasure to be here. that should be obvious but it may not be. and obviously the numbers in and the approach that they and building the relationships and sexual harassment. and there's a lot of studies to show that. Rebecca: So the first and I think it was very young. and supportive leadership. and being closer to my family and this is what you must do. and the overall good of the and I think we need to that one of the things and it's fantastic to come Rebecca: I have, I was and it's the networking and really continue to move the needle. to the men of Silicon Valley, I missed that one. and to get them to be quiet after the Ellen Pao lawsuit. and has to do something so I'm Getting back to Cara's talk, and their eagerness to learn, don't care so much what You don't have to be the and I think it's really true and want to be able to help It's been a pleasure talking to you. in our coverage of Women

ENTITIES

Entity	Category	Confidence
Rebecca	PERSON	0.99+
Josie Gillan	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Josie	PERSON	0.99+
Atlassian	ORGANIZATION	0.99+
Australia	LOCATION	0.99+
Sydney	LOCATION	0.99+
Susan	PERSON	0.99+
Sheryl Sandberg	PERSON	0.99+
Uber	ORGANIZATION	0.99+
New Zealand	LOCATION	0.99+
America	LOCATION	0.99+
Oracle	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
Houston	LOCATION	0.99+
Silicon Valley	LOCATION	0.99+
Salesforce	ORGANIZATION	0.99+
50%	QUANTITY	0.99+
October	DATE	0.99+
Cara	PERSON	0.99+
Google	ORGANIZATION	0.99+
VMware	ORGANIZATION	0.99+
'98	DATE	0.99+
51%	QUANTITY	0.99+
second panel	QUANTITY	0.99+
two years ago	DATE	0.99+
Palo Alto	LOCATION	0.98+
a year	QUANTITY	0.98+
both	QUANTITY	0.98+
Java	TITLE	0.98+
first step	QUANTITY	0.98+
Grace Hopper Conference	EVENT	0.97+
Ellen Pao	PERSON	0.96+
one	QUANTITY	0.96+
Grace Hopper Conference	EVENT	0.96+
Grace Hopper Conference	EVENT	0.95+
theCUBE	ORGANIZATION	0.93+
first	QUANTITY	0.92+
next day	DATE	0.91+
Grace Hopper	PERSON	0.91+
One	QUANTITY	0.9+
two CEOs	QUANTITY	0.9+
one thing	QUANTITY	0.87+
Grace Hopper	EVENT	0.86+
Grace Hopper	ORGANIZATION	0.85+
half	QUANTITY	0.78+
times	QUANTITY	0.74+
101s	QUANTITY	0.68+
2017	DATE	0.66+
#WT2SV	EVENT	0.65+
peer	QUANTITY	0.56+
Women	ORGANIZATION	0.54+
years	QUANTITY	0.52+

Aaron T. Myers Cloudera Software Engineer Talking Cloudera & Hadooop

>>so erin you're a technique for a Cloudera, you're a whiz kid from Brown, you have, how many Brown people are engineers here at Cloudera >>as of monday, we have five full timers and two interns at the moment and we're trying to hire more all the time. >>Mhm. So how many interns? >>Uh two interns from Brown this this summer? A few more from other schools? Cool, >>I'm john furry with silicon angle dot com. Silicon angle dot tv. We're here in the cloud era office in my little mini studio hasn't been built out yet, It was studio, we had to break it down for a doctor, ralph kimball, not richard Kimble from uh I called him on twitter but coupon um but uh the data warehouse guru was in here um and you guys are attracting a lot of talent erin so tell us a little bit about, you know, how Claudia is making it happen and what's the big deal here, people smart here, it's mature, it's not the first time around this company, this company has some some senior execs and there's been a lot, a lot of people uh in the market who have been talking about uh you know, a lot of first time entrepreneurs doing their startups and I've been hearing for some folks in in the, in the trenches that there's been a frustration and start ups out there, that there's a lot of first time entrepreneurs and everyone wants to be the next twitter and there's some kind of companies that are straddling failure out there? And and I was having that conversation with someone just today and I said, they said, what's it like Cloudera and I said, uh, this is not the first time crew here in Cloudera. So, uh, share with the folks out there, what you're seeing for Cloudera and the management team. >>Sure. Well, one of the most attractive parts about working Cloudera for me, one of the reasons I, I really came here was have been incredibly experienced management team, Mike Charles, they've all there at the top of this Oregon, they have all done this before they founded startups, Growing startups, old startups and uh, especially in contrast with my, the place where I worked previously. Uh, the amount of experience here is just tremendous. You see them not making mistakes where I'm sure others would. >>And I mean, Mike Olson is veteran. I mean he's been, he's an adviser to start ups. I know he's been in some investors. Amer was obviously PhD candidates bolted out the startup, sold it to yahoo, worked at, yahoo, came back finish his PhD at stanford under Mendel over there in the PhD program over this, we banged in a speech. He came back entrepreneur residents, Excel partners. Now it does Cloudera. Um, when did you join the company and just take us through who you are and when you join Cloudera, I want your background. >>Sure. So I, I joined a little over a year ago is about 30 people at the time. Uh, I came from a small start up of the music online music store in new york city um uh, which doesn't really exist all that much anymore. Um but you know, I I sort of followed my other colleagues from Brown who worked here um was really sold by the management team and also by the tremendous market opportunity that that Hadoop has right now. Uh Cloudera was very much the first commercial player there um which is really a unique experience and I think you've covered this pretty well before. I think we all around here believe that uh the markets only growing. Um and we're going to see the market and the big data market in general get bigger and bigger in the next few years. >>So, so obviously computer science is all the rage and and I'm particularly proud of hangout, we've had conversations in the hallway while you're tweeting about this and that. Um, but you know, silicon angles home is here, we've had, I've had a chance to watch you and the other guys here grow from, you know, from your other office was a san mateo or san Bruno somewhere in there. Like >>uh it was originally in burlingame, then we relocate the headquarters Palo Alto and now we have a satellite up in san Francisco. >>So you guys bolted out. You know, you have a full on blow in san Francisco office. So um there was a big busting at the seams here in Palo Alto people commuting down uh even building their burning man. Uh >>Oh yeah sure >>skits here and they're constructing their their homes here, but burning man, so we're doing that in san Francisco, what's the vibe like in san Francisco, tell us what's going on >>in san Francisco, san Francisco is great. It's, I'm I live in san Francisco as do a lot of us. About half the engineering team works up there now. Um you know we're running out of space there certainly. Um and you're already, oh yeah, oh yeah, we're hiring as fast as we absolutely can. Um so definitely not space to build the burning man huts there like like there is down, down in Palo Alto but it's great up there. >>What are you working on right now for project insurance? The computer science is one of the hot topics we've been covering on silicon angle, taking more of a social angle, social media has uh you know, moves from this pr kind of, you know, check in facebook fan page to hype to kind of a real deal social marketplace where you know data, social data, gestural data, mobile data geo data data is the center of the value proposition. So you live that every day. So talk about your view on the computer science landscape around data and why it's such a big deal. >>Oh sure. Uh I think data is sort of one of those uh fundamental uh things that can be uh mind for value across every industry, there's there's no industry out there that can't benefit from better understanding what their customers are doing, what their competitors are doing etcetera. And that's sort of the the unique value proposition of, you know, stuff like Hadoop. Um truly we we see interest from every sector that exists, which is great as for what the project that I'm specifically working on right now, I primarily work on H. D. F. S, which is the Hadoop distributed file system underlies pretty much all the other um projects in the Hadoop ecosystem. Uh and I'm particularly working with uh other colleagues at Cloudera and at other companies, yahoo and facebook on high availability for H. D. F. S, which has been um in some deployments is a serious concern. Hadoop is primarily a batch processing system, so it's less of a concern than in others. Um but when you start talking about running H base, which needs to be up all the time serving live traffic than having highly available H DFS is uh necessity and we're looking forward to delivering that >>talk about the criticism that H. D. F. S has been having. Um Well, I wouldn't say criticism. I mean, it's been a great, great product that produced the HDs, a core parts of how do you guys been contributing to the standard of Apache, that's no secret to the folks out there, that cloud area leads that effort. Um but there's new companies out there kind of trying a new approach and they're saying they're doing it better, what are they saying in terms and what's really happening? So, you know, there's some argument like, oh, we can do it better. And what's the what, why are they doing it, that was just to make money do a new venture, or is that, what's your opinion on that? Yeah, >>sure. I mean, I think it's natural to to want to go after uh parts of the core Hadoop system and say, you know, Hadoop is a great ecosystem, but what if we just swapped out this part or swapped out that part, couldn't couldn't we get some some really easy gains. Um and you know, sometimes that will be true. I have confidence that that that just will not simply not be true in in the very near future. One of the great benefits about Apache, Hadoop being open source is that we have a huge worldwide network of developers working at some of the best engineering organizations in the world who are all collaborating on this stuff. Um and, you know, I firmly believe that the collaborative open source process produces the best software and that's that's what Hadoop is at its very core. >>What about the arguments are saying that, oh, I need to commercialize it differently for my installed base bolt on a little proprietary extensions? Um That's legitimate argument. TMC might take that approach or um you know, map are I was trying to trying to rewrite uh H. T. F. >>S. To me, is >>it legitimate? I mean is there fighting going on in the standards? Maybe that's a political question you might want to answer. But give me a shot. >>I mean the Hadoop uh isn't there's no open standard for Hadoop. You can't say like this is uh this is like do compatible or anything like that. But you know what you can say is like this is Apache Hadoop. Uh And so in that sense there's no there's no fighting to be had there. Um Yeah, >>so yeah. Who um struggling as a company. But you know, there's a strong head Duke D. N. A. At yahoo, certainly, I talked with the the founder of the startup. Horton works just announced today that they have a new board member. He's the guy who's the Ceo of Horton works and now on bluster, I'm sorry, cluster announced they have um rob from benchmark on the board. Uh He's the Ceo of Horton works and and one of my not criticisms but points about Horton was this guy's an engineer, never run a company before. He's no Mike Olson. Okay, so you know, Michaelson has a long experience. So this guy comes into running and he's obviously in in open source, is that good for Yahoo and open sources. He they say they're going to continue to invest in Hadoop? They clearly are are still using a lot of Hadoop certainly. Um how is that changing Apache, is that causing more um consolidation, is that causing more energy? What's your view on the whole Horton works? Think >>um you know, yahoo is uh has been and will continue to be a huge contributor. Hadoop, they uh I can't say for sure, but I feel pretty confident that they have more data under management under Hadoop than anyone else in the world and there's no question in my mind that they'll continue to invest huge amounts of both key way effort and engineering effort and uh all of the things that Hadoop needs to to advance. Um I'm sure that Horton works will continue to work very closely with with yahoo. Um And you know, we're excited to see um more and more contributors to to Hadoop um both from Horton works and from yahoo proper. >>Cool, Well, I just want to clarify for the folks out there who don't understand what this whole yahoo thing is, It was not a spin out, these were key Hadoop core guys who left the company to form a startup of which yahoo financed with benchmark capital. So, yahoo is clearly and told me and reaffirm that with me that they are clearly investing more in Hadoop internally as well. So there's more people inside, yahoo that work on Hadoop than they are in the entire Horton's work company. So that's very clear. So just to clear that up out there. Um erin. so you're you're a young gun, right? You're a young whiz like Todd madam on here, explain to the folks out there um a little bit older maybe guys in their thirties or C IOS a lot of people are doing, you know, they're kicking the tires on big data, they're hearing about real time analytics, they're hearing about benefits have never heard before. Uh Dave a lot and I on the cube talk about, you know, the transformations that are going on, you're seeing AMC getting into big data, everyone's transforming at the enterprise level and service provider. What explains the folks why Hadoop is so important. Why is that? Do if not the fastest or one of the fastest growing projects in Apache ever? Sure. Even faster than the web server project, which is one of the better, >>better bigger ones. >>Why is the dupes and explain to them what it is? Well, you know, >>it's been it's pretty well covered that there's been an explosion of data that more data is produced every every year over and over. We talk about exabytes which is a quantity of data that is so large that pretty much no one can really theoretically comprehend it. Um and more and more uh organizations want to store and process and learn from, you know, get insights from that data um in addition to just the explosion of data um you know that there is simply more data, organizations are less willing to discard data. One of the beauties of Hadoop is truly that it's so very inexpensive per terabyte to store data that you don't have to think up front about what you want to store, what you want to discard, store it all and figure out later what is the most useful bits we call that sort of schema on read. Um as opposed to, you know, figuring out the schema a priority. Um and that is a very powerful shift in dynamics of data storage in general. And I think that's very attractive to all sorts of organizations. >>Your, I'll see a Brown graduate and you have some interns from Brown to Brown um, Premier computer science program almost as good as when I went to school at Northeastern University. >>Um >>you know, the unsung heroes of computer science only kidding Brown's great program, but you know, cutting edge computer science areas known as obviously leading in a lot of the computer science areas do in general is known that you gotta be pretty savvy to be either masters level PhD to kind of play in this area? Not a lot of adoption, what I call the grassroots developers. What's your vision and how do you see the computer science, younger generation, even younger than you kind of growing up into this because those tools aren't yet developed. You still got to be, you're pretty strong from a computer science perspective and also explained to the folks who aren't necessarily at the browns of the world or getting into computer science, what about, what is that this revolution about and where is it going? What are some of the things you see happening around the corner that that might not be obvious. >>Sure there's a few questions there. Um part of it is how do people coming out of college get into this thing, It's not uh taught all that much in school, How do how do you sort of make the leap from uh the standard computer science curriculum into this sort of thing? And um you know, part of it is that really we're seeing more and more schools offering distributed computing classes or they have grids available um to to do this stuff there there is some research coming out of Brown actually and lots of other schools about Hadoop proper in the behavior of Hadoop under failure scenarios, that sort of stuff, which is very interesting. Google uh actually has classes that they teach, I believe in conjunction with the University of Washington um where they teach undergraduates and your master's level, graduate students about mass produced and distributed computing and they actually use Hadoop to do it because it is the architecture of Hadoop is modeled after um >>uh >>google's internal infrastructure. Um So you know that that's that's one way we're seeing more and more people who are just coming out of college who have distributed systems uh knowledge like this? Um Another question? the other part of the question you asked is how does um how does the ordinary developer get into this stuff? And the answer is we're working hard, you know, we and others in the hindu community are working hard on making it, making her do just much easier to consume. We released, you cover this fair bit, the ECM Express project that lets you install Hadoop with just minimal effort as close to 11 click as possible. Um and there's lots of um sort of layers built on top of Hadoop to make it more easily consumed by developers Hive uh sort of sequel like interface on top of mass produce. And Pig has its own DSL for programming against mass produce. Um so you don't have to write heart, you don't have to write straight map produced code, anything like that. Uh and it's getting easier for operators every day. >>Well, I mean, evolution was, I mean, you guys actually working on that cloud era. Um what about what about some of the abstractions? You're seeing those big the Rage is, you know, look back a year ago VM World coming up and uh little plugs looking angle dot tv will be broadcasting live and at VM World. Um you know, he has been on the Q XV m where um Spring Source was a big announcement that they made. Um, Haruka brought by Salesforce Cloud Software frameworks are big, what does that look like and how does it relate to do and the ecosystem around Hadoop where, you know, the rage is the software frameworks and networks kind of collide and you got the you got the kind of the intersection of, you know, software frameworks and networks obviously, you know, in the big players, we talk about E M C. And these guys, it's clear that they realize that software is going to be their key differentiator. So it's got to get to a framework stand, what is Hadoop and Apache talking about this kind of uh, evolution for for Hadoop. >>Sure. Well, you know, I think we're seeing very much the commoditization of hardware. Um, you just can't buy bigger and bigger computers anymore. They just don't exist. So you're going to need something that can take a lot of little computers and make it look like one big computer. And that's what Hadoop is especially good at. Um we talk about scaling out instead of scaling up, you can just buy more relatively inexpensive computers. Uh and that's great. And sort of the beauty of Hadoop, um, is that it will grow linearly as your data set as your um, your your scale, your traffic, whatever grows. Um and you don't have to have this exponential price increase of buying bigger and bigger computers, You can just buy more. Um and that that's sort of the beauty of it is a software framework that if you write against it. Um you don't have to think about the scaling anymore. It will do that for you. >>Okay. The question for you, it's gonna kind of a weird question but try to tackle it. You're at a party having a few cocktails, having a few beers with your buddies and your buddies who works at a big enterprise says man we've got all this legacy structured data systems, I need to implement some big data strategy, all this stuff. What do I do? >>Sure, sure. Um Not the question I thought you were going to ask me that you >>were a g rated program here. >>Okay. I thought you were gonna ask me, how do I explain what I do to you know people that we'll get to that next. Okay. Um Yeah, I mean I would say that the first thing to do is to implement a start, start small, implement a proof of concept, get a subset of the data that you would like to analyze, put it, put Hadoop on a few machines, four or five, something like that and start writing some hive queries, start writing some some pig scripts and I think you'll you know pretty quickly and easily see the value that you can get out of it and you can do so with the knowledge that when you do want to operate over your entire data set, you will absolutely be able to trivially scale to that size. >>Okay. So now the question that I want to ask is that you're at a party and I want to say, what do you >>do? You usually tell people in my hedge fund manager? No but seriously um I I tell people I work on distributed supercomputers. Software for distributed supercomputers and that people have some idea what distributed means and supercomputers and they figure that out. >>So final question for I know you gotta go get back to programming uh some code here. Um what's the future of Hadoop in the sense of from a developer standpoint? I was having a conversation with a developer who's a big data jockey and talking about Miss kelly gets anything and get his hands on G. O. Data, text data because the data data junkie and he says I just don't know what to build. Um What are some of the enabling apps that you may see out there and or you have just conceiving just brainstorming out there, what's possible with with data, can you envision the next five years, what are you gonna see evolve and what some of the coolest things you've seen that might that are happening right now. >>Sure. Sure. I mean I think you're going to see uh just the front ends to these things getting just easier and easier and easier to interact with and at some point you won't even know that you're interacting with a Hadoop cluster that will be the engine underneath the hood but you know, you'll you'll be uh from your perspective you'll be driving a Ferrari and by that I mean you know, standard B. I tool, standard sequel query language. Um we'll all be implemented on top of this stuff and you know from that perspective you could implement, you know, really anything you want. Um We're seeing a lot of great work coming out of just identifying trends amongst masses of data that you know, if you tried to analyze it with any other tool, you'd either have to distill it down so far that you would you would question your results or that you could only run the very simplest sort of queries over um and not really get those like powerful deep insights, those sort of correlative insights um that we're seeing people do. So I think you'll see, you'll continue to see uh great recommendations systems coming out of this stuff. You'll see um root cause analysis, you'll see great work coming out of the advertising industry um to you know to really say which ad was responsible for this purchase. Was it really the last ad they clicked on or was it the ad they saw five weeks ago they put the thought in mind that sort of correlative analysis is being empowered by big data systems like a dupe. >>Well I'm bullish on big data, I think people I think it's gonna be even bigger than I think you're gonna have some kids come out of college and say I could use big data to create a differentiation and build an airline based on one differentiation. These are cool new ways and, and uh, data we've never seen before. So Aaron, uh, thanks for coming >>on the issue >>um, your inside Palo Alto Studio and we're going to.

Published Date : Sep 28 2011

SUMMARY :

the market who have been talking about uh you know, a lot of first time entrepreneurs doing their startups and I've been Uh, the amount of experience take us through who you are and when you join Cloudera, I want your background. Um but you know, I I sort of followed my other colleagues you know, from your other office was a san mateo or san Bruno somewhere in there. So you guys bolted out. Um you know we're running out of space there certainly. on silicon angle, taking more of a social angle, social media has uh you know, Um but when you start talking about running H base, which needs to be up all the time serving live traffic So, you know, there's some argument like, oh, we can do it better. Um and you know, sometimes that will be true. TMC might take that approach or um you know, map are I was trying to trying to rewrite Maybe that's a political question you might want to answer. But you know what you can say is like this is Apache Hadoop. so you know, Michaelson has a long experience. Um And you know, we're excited to see um more and more contributors to Uh Dave a lot and I on the cube talk about, you know, per terabyte to store data that you don't have to think up front about what Your, I'll see a Brown graduate and you have some interns from Brown to Brown What are some of the things you see happening around the corner that And um you know, part of it is that really we're seeing more and more schools offering And the answer is we're working hard, you know, we and others in the hindu community are working do and the ecosystem around Hadoop where, you know, the rage is the software frameworks and Um and that that's sort of the beauty of it is a software framework I need to implement some big data strategy, all this stuff. Um Not the question I thought you were going to ask me that you the value that you can get out of it and you can do so with the knowledge that when you do and that people have some idea what distributed means and supercomputers and they figure that out. apps that you may see out there and or you have just conceiving just brainstorming out out of just identifying trends amongst masses of data that you know, if you tried Well I'm bullish on big data, I think people I think it's gonna be even bigger than I think you're gonna have some kids come out of college

ENTITIES

Entity	Category	Confidence
Mike Olson	PERSON	0.99+
yahoo	ORGANIZATION	0.99+
Mike Charles	PERSON	0.99+
san Francisco	LOCATION	0.99+
Palo Alto	LOCATION	0.99+
Yahoo	ORGANIZATION	0.99+
Aaron	PERSON	0.99+
Aaron T. Myers	PERSON	0.99+
University of Washington	ORGANIZATION	0.99+
Hadoop	TITLE	0.99+
facebook	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
richard Kimble	PERSON	0.99+
Michaelson	PERSON	0.99+
two interns	QUANTITY	0.99+
Oregon	LOCATION	0.99+
Google	ORGANIZATION	0.99+
Todd	PERSON	0.99+
Claudia	PERSON	0.99+
AMC	ORGANIZATION	0.99+
five weeks ago	DATE	0.99+
Northeastern University	ORGANIZATION	0.99+
monday	DATE	0.99+
first time	QUANTITY	0.99+
both	QUANTITY	0.99+
Dave	PERSON	0.99+
TMC	ORGANIZATION	0.99+
ralph kimball	PERSON	0.99+
burlingame	LOCATION	0.99+
Ferrari	ORGANIZATION	0.98+
today	DATE	0.98+
five	QUANTITY	0.98+
Brown	ORGANIZATION	0.98+
thirties	QUANTITY	0.98+
one	QUANTITY	0.98+
Horton	ORGANIZATION	0.98+
Apache	ORGANIZATION	0.98+
Hadoop	ORGANIZATION	0.98+
erin	PERSON	0.98+
google	ORGANIZATION	0.97+
One	QUANTITY	0.97+
twitter	ORGANIZATION	0.97+
Brown	PERSON	0.97+
a year ago	DATE	0.97+
Salesforce	ORGANIZATION	0.97+
john furry	PERSON	0.96+
one big computer	QUANTITY	0.95+
new york city	LOCATION	0.95+
Mendel	PERSON	0.94+

Breaking Analysis: Databricks faces critical strategic decisions…here’s why

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> Spark became a top level Apache project in 2014, and then shortly thereafter, burst onto the big data scene. Spark, along with the cloud, transformed and in many ways, disrupted the big data market. Databricks optimized its tech stack for Spark and took advantage of the cloud to really cleverly deliver a managed service that has become a leading AI and data platform among data scientists and data engineers. However, emerging customer data requirements are shifting into a direction that will cause modern data platform players generally and Databricks, specifically, we think, to make some key directional decisions and perhaps even reinvent themselves. Hello and welcome to this week's wikibon theCUBE Insights, powered by ETR. In this Breaking Analysis, we're going to do a deep dive into Databricks. We'll explore its current impressive market momentum. We're going to use some ETR survey data to show that, and then we'll lay out how customer data requirements are changing and what the ideal data platform will look like in the midterm future. We'll then evaluate core elements of the Databricks portfolio against that vision, and then we'll close with some strategic decisions that we think the company faces. And to do so, we welcome in our good friend, George Gilbert, former equities analyst, market analyst, and current Principal at TechAlpha Partners. George, good to see you. Thanks for coming on. >> Good to see you, Dave. >> All right, let me set this up. We're going to start by taking a look at where Databricks sits in the market in terms of how customers perceive the company and what it's momentum looks like. And this chart that we're showing here is data from ETS, the emerging technology survey of private companies. The N is 1,421. What we did is we cut the data on three sectors, analytics, database-data warehouse, and AI/ML. The vertical axis is a measure of customer sentiment, which evaluates an IT decision maker's awareness of the firm and the likelihood of engaging and/or purchase intent. The horizontal axis shows mindshare in the dataset, and we've highlighted Databricks, which has been a consistent high performer in this survey over the last several quarters. And as we, by the way, just as aside as we previously reported, OpenAI, which burst onto the scene this past quarter, leads all names, but Databricks is still prominent. You can see that the ETR shows some open source tools for reference, but as far as firms go, Databricks is very impressively positioned. Now, let's see how they stack up to some mainstream cohorts in the data space, against some bigger companies and sometimes public companies. This chart shows net score on the vertical axis, which is a measure of spending momentum and pervasiveness in the data set is on the horizontal axis. You can see that chart insert in the upper right, that informs how the dots are plotted, and net score against shared N. And that red dotted line at 40% indicates a highly elevated net score, anything above that we think is really, really impressive. And here we're just comparing Databricks with Snowflake, Cloudera, and Oracle. And that squiggly line leading to Databricks shows their path since 2021 by quarter. And you can see it's performing extremely well, maintaining an elevated net score and net range. Now it's comparable in the vertical axis to Snowflake, and it consistently is moving to the right and gaining share. Now, why did we choose to show Cloudera and Oracle? The reason is that Cloudera got the whole big data era started and was disrupted by Spark. And of course the cloud, Spark and Databricks and Oracle in many ways, was the target of early big data players like Cloudera. Take a listen to Cloudera CEO at the time, Mike Olson. This is back in 2010, first year of theCUBE, play the clip. >> Look, back in the day, if you had a data problem, if you needed to run business analytics, you wrote the biggest check you could to Sun Microsystems, and you bought a great big, single box, central server, and any money that was left over, you handed to Oracle for a database licenses and you installed that database on that box, and that was where you went for data. That was your temple of information. >> Okay? So Mike Olson implied that monolithic model was too expensive and inflexible, and Cloudera set out to fix that. But the best laid plans, as they say, George, what do you make of the data that we just shared? >> So where Databricks has really come up out of sort of Cloudera's tailpipe was they took big data processing, made it coherent, made it a managed service so it could run in the cloud. So it relieved customers of the operational burden. Where they're really strong and where their traditional meat and potatoes or bread and butter is the predictive and prescriptive analytics that building and training and serving machine learning models. They've tried to move into traditional business intelligence, the more traditional descriptive and diagnostic analytics, but they're less mature there. So what that means is, the reason you see Databricks and Snowflake kind of side by side is there are many, many accounts that have both Snowflake for business intelligence, Databricks for AI machine learning, where Snowflake, I'm sorry, where Databricks also did really well was in core data engineering, refining the data, the old ETL process, which kind of turned into ELT, where you loaded into the analytic repository in raw form and refine it. And so people have really used both, and each is trying to get into the other. >> Yeah, absolutely. We've reported on this quite a bit. Snowflake, kind of moving into the domain of Databricks and vice versa. And the last bit of ETR evidence that we want to share in terms of the company's momentum comes from ETR's Round Tables. They're run by Erik Bradley, and now former Gartner analyst and George, your colleague back at Gartner, Daren Brabham. And what we're going to show here is some direct quotes of IT pros in those Round Tables. There's a data science head and a CIO as well. Just make a few call outs here, we won't spend too much time on it, but starting at the top, like all of us, we can't talk about Databricks without mentioning Snowflake. Those two get us excited. Second comment zeros in on the flexibility and the robustness of Databricks from a data warehouse perspective. And then the last point is, despite competition from cloud players, Databricks has reinvented itself a couple of times over the year. And George, we're going to lay out today a scenario that perhaps calls for Databricks to do that once again. >> Their big opportunity and their big challenge for every tech company, it's managing a technology transition. The transition that we're talking about is something that's been bubbling up, but it's really epical. First time in 60 years, we're moving from an application-centric view of the world to a data-centric view, because decisions are becoming more important than automating processes. So let me let you sort of develop. >> Yeah, so let's talk about that here. We going to put up some bullets on precisely that point and the changing sort of customer environment. So you got IT stacks are shifting is George just said, from application centric silos to data centric stacks where the priority is shifting from automating processes to automating decision. You know how look at RPA and there's still a lot of automation going on, but from the focus of that application centricity and the data locked into those apps, that's changing. Data has historically been on the outskirts in silos, but organizations, you think of Amazon, think Uber, Airbnb, they're putting data at the core, and logic is increasingly being embedded in the data instead of the reverse. In other words, today, the data's locked inside the app, which is why you need to extract that data is sticking it to a data warehouse. The point, George, is we're putting forth this new vision for how data is going to be used. And you've used this Uber example to underscore the future state. Please explain? >> Okay, so this is hopefully an example everyone can relate to. The idea is first, you're automating things that are happening in the real world and decisions that make those things happen autonomously without humans in the loop all the time. So to use the Uber example on your phone, you call a car, you call a driver. Automatically, the Uber app then looks at what drivers are in the vicinity, what drivers are free, matches one, calculates an ETA to you, calculates a price, calculates an ETA to your destination, and then directs the driver once they're there. The point of this is that that cannot happen in an application-centric world very easily because all these little apps, the drivers, the riders, the routes, the fares, those call on data locked up in many different apps, but they have to sit on a layer that makes it all coherent. >> But George, so if Uber's doing this, doesn't this tech already exist? Isn't there a tech platform that does this already? >> Yes, and the mission of the entire tech industry is to build services that make it possible to compose and operate similar platforms and tools, but with the skills of mainstream developers in mainstream corporations, not the rocket scientists at Uber and Amazon. >> Okay, so we're talking about horizontally scaling across the industry, and actually giving a lot more organizations access to this technology. So by way of review, let's summarize the trend that's going on today in terms of the modern data stack that is propelling the likes of Databricks and Snowflake, which we just showed you in the ETR data and is really is a tailwind form. So the trend is toward this common repository for analytic data, that could be multiple virtual data warehouses inside of Snowflake, but you're in that Snowflake environment or Lakehouses from Databricks or multiple data lakes. And we've talked about what JP Morgan Chase is doing with the data mesh and gluing data lakes together, you've got various public clouds playing in this game, and then the data is annotated to have a common meaning. In other words, there's a semantic layer that enables applications to talk to the data elements and know that they have common and coherent meaning. So George, the good news is this approach is more effective than the legacy monolithic models that Mike Olson was talking about, so what's the problem with this in your view? >> So today's data platforms added immense value 'cause they connected the data that was previously locked up in these monolithic apps or on all these different microservices, and that supported traditional BI and AI/ML use cases. But now if we want to build apps like Uber or Amazon.com, where they've got essentially an autonomously running supply chain and e-commerce app where humans only care and feed it. But the thing is figuring out what to buy, when to buy, where to deploy it, when to ship it. We needed a semantic layer on top of the data. So that, as you were saying, the data that's coming from all those apps, the different apps that's integrated, not just connected, but it means the same. And the issue is whenever you add a new layer to a stack to support new applications, there are implications for the already existing layers, like can they support the new layer and its use cases? So for instance, if you add a semantic layer that embeds app logic with the data rather than vice versa, which we been talking about and that's been the case for 60 years, then the new data layer faces challenges that the way you manage that data, the way you analyze that data, is not supported by today's tools. >> Okay, so actually Alex, bring me up that last slide if you would, I mean, you're basically saying at the bottom here, today's repositories don't really do joins at scale. The future is you're talking about hundreds or thousands or millions of data connections, and today's systems, we're talking about, I don't know, 6, 8, 10 joins and that is the fundamental problem you're saying, is a new data error coming and existing systems won't be able to handle it? >> Yeah, one way of thinking about it is that even though we call them relational databases, when we actually want to do lots of joins or when we want to analyze data from lots of different tables, we created a whole new industry for analytic databases where you sort of mung the data together into fewer tables. So you didn't have to do as many joins because the joins are difficult and slow. And when you're going to arbitrarily join thousands, hundreds of thousands or across millions of elements, you need a new type of database. We have them, they're called graph databases, but to query them, you go back to the prerelational era in terms of their usability. >> Okay, so we're going to come back to that and talk about how you get around that problem. But let's first lay out what the ideal data platform of the future we think looks like. And again, we're going to come back to use this Uber example. In this graphic that George put together, awesome. We got three layers. The application layer is where the data products reside. The example here is drivers, rides, maps, routes, ETA, et cetera. The digital version of what we were talking about in the previous slide, people, places and things. The next layer is the data layer, that breaks down the silos and connects the data elements through semantics and everything is coherent. And then the bottom layers, the legacy operational systems feed that data layer. George, explain what's different here, the graph database element, you talk about the relational query capabilities, and why can't I just throw memory at solving this problem? >> Some of the graph databases do throw memory at the problem and maybe without naming names, some of them live entirely in memory. And what you're dealing with is a prerelational in-memory database system where you navigate between elements, and the issue with that is we've had SQL for 50 years, so we don't have to navigate, we can say what we want without how to get it. That's the core of the problem. >> Okay. So if I may, I just want to drill into this a little bit. So you're talking about the expressiveness of a graph. Alex, if you'd bring that back out, the fourth bullet, expressiveness of a graph database with the relational ease of query. Can you explain what you mean by that? >> Yeah, so graphs are great because when you can describe anything with a graph, that's why they're becoming so popular. Expressive means you can represent anything easily. They're conducive to, you might say, in a world where we now want like the metaverse, like with a 3D world, and I don't mean the Facebook metaverse, I mean like the business metaverse when we want to capture data about everything, but we want it in context, we want to build a set of digital twins that represent everything going on in the world. And Uber is a tiny example of that. Uber built a graph to represent all the drivers and riders and maps and routes. But what you need out of a database isn't just a way to store stuff and update stuff. You need to be able to ask questions of it, you need to be able to query it. And if you go back to prerelational days, you had to know how to find your way to the data. It's sort of like when you give directions to someone and they didn't have a GPS system and a mapping system, you had to give them turn by turn directions. Whereas when you have a GPS and a mapping system, which is like the relational thing, you just say where you want to go, and it spits out the turn by turn directions, which let's say, the car might follow or whoever you're directing would follow. But the point is, it's much easier in a relational database to say, "I just want to get these results. You figure out how to get it." The graph database, they have not taken over the world because in some ways, it's taking a 50 year leap backwards. >> Alright, got it. Okay. Let's take a look at how the current Databricks offerings map to that ideal state that we just laid out. So to do that, we put together this chart that looks at the key elements of the Databricks portfolio, the core capability, the weakness, and the threat that may loom. Start with the Delta Lake, that's the storage layer, which is great for files and tables. It's got true separation of compute and storage, I want you to double click on that George, as independent elements, but it's weaker for the type of low latency ingest that we see coming in the future. And some of the threats highlighted here. AWS could add transactional tables to S3, Iceberg adoption is picking up and could accelerate, that could disrupt Databricks. George, add some color here please? >> Okay, so this is the sort of a classic competitive forces where you want to look at, so what are customers demanding? What's competitive pressure? What are substitutes? Even what your suppliers might be pushing. Here, Delta Lake is at its core, a set of transactional tables that sit on an object store. So think of it in a database system, this is the storage engine. So since S3 has been getting stronger for 15 years, you could see a scenario where they add transactional tables. We have an open source alternative in Iceberg, which Snowflake and others support. But at the same time, Databricks has built an ecosystem out of tools, their own and others, that read and write to Delta tables, that's what makes the Delta Lake and ecosystem. So they have a catalog, the whole machine learning tool chain talks directly to the data here. That was their great advantage because in the past with Snowflake, you had to pull all the data out of the database before the machine learning tools could work with it, that was a major shortcoming. They fixed that. But the point here is that even before we get to the semantic layer, the core foundation is under threat. >> Yep. Got it. Okay. We got a lot of ground to cover. So we're going to take a look at the Spark Execution Engine next. Think of that as the refinery that runs really efficient batch processing. That's kind of what disrupted the DOOp in a large way, but it's not Python friendly and that's an issue because the data science and the data engineering crowd are moving in that direction, and/or they're using DBT. George, we had Tristan Handy on at Supercloud, really interesting discussion that you and I did. Explain why this is an issue for Databricks? >> So once the data lake was in place, what people did was they refined their data batch, and Spark has always had streaming support and it's gotten better. The underlying storage as we've talked about is an issue. But basically they took raw data, then they refined it into tables that were like customers and products and partners. And then they refined that again into what was like gold artifacts, which might be business intelligence metrics or dashboards, which were collections of metrics. But they were running it on the Spark Execution Engine, which it's a Java-based engine or it's running on a Java-based virtual machine, which means all the data scientists and the data engineers who want to work with Python are really working in sort of oil and water. Like if you get an error in Python, you can't tell whether the problems in Python or where it's in Spark. There's just an impedance mismatch between the two. And then at the same time, the whole world is now gravitating towards DBT because it's a very nice and simple way to compose these data processing pipelines, and people are using either SQL in DBT or Python in DBT, and that kind of is a substitute for doing it all in Spark. So it's under threat even before we get to that semantic layer, it so happens that DBT itself is becoming the authoring environment for the semantic layer with business intelligent metrics. But that's again, this is the second element that's under direct substitution and competitive threat. >> Okay, let's now move down to the third element, which is the Photon. Photon is Databricks' BI Lakehouse, which has integration with the Databricks tooling, which is very rich, it's newer. And it's also not well suited for high concurrency and low latency use cases, which we think are going to increasingly become the norm over time. George, the call out threat here is customers want to connect everything to a semantic layer. Explain your thinking here and why this is a potential threat to Databricks? >> Okay, so two issues here. What you were touching on, which is the high concurrency, low latency, when people are running like thousands of dashboards and data is streaming in, that's a problem because SQL data warehouse, the query engine, something like that matures over five to 10 years. It's one of these things, the joke that Andy Jassy makes just in general, he's really talking about Azure, but there's no compression algorithm for experience. The Snowflake guy started more than five years earlier, and for a bunch of reasons, that lead is not something that Databricks can shrink. They'll always be behind. So that's why Snowflake has transactional tables now and we can get into that in another show. But the key point is, so near term, it's struggling to keep up with the use cases that are core to business intelligence, which is highly concurrent, lots of users doing interactive query. But then when you get to a semantic layer, that's when you need to be able to query data that might have thousands or tens of thousands or hundreds of thousands of joins. And that's a SQL query engine, traditional SQL query engine is just not built for that. That's the core problem of traditional relational databases. >> Now this is a quick aside. We always talk about Snowflake and Databricks in sort of the same context. We're not necessarily saying that Snowflake is in a position to tackle all these problems. We'll deal with that separately. So we don't mean to imply that, but we're just sort of laying out some of the things that Snowflake or rather Databricks customers we think, need to be thinking about and having conversations with Databricks about and we hope to have them as well. We'll come back to that in terms of sort of strategic options. But finally, when come back to the table, we have Databricks' AI/ML Tool Chain, which has been an awesome capability for the data science crowd. It's comprehensive, it's a one-stop shop solution, but the kicker here is that it's optimized for supervised model building. And the concern is that foundational models like GPT could cannibalize the current Databricks tooling, but George, can't Databricks, like other software companies, integrate foundation model capabilities into its platform? >> Okay, so the sound bite answer to that is sure, IBM 3270 terminals could call out to a graphical user interface when they're running on the XT terminal, but they're not exactly good citizens in that world. The core issue is Databricks has this wonderful end-to-end tool chain for training, deploying, monitoring, running inference on supervised models. But the paradigm there is the customer builds and trains and deploys each model for each feature or application. In a world of foundation models which are pre-trained and unsupervised, the entire tool chain is different. So it's not like Databricks can junk everything they've done and start over with all their engineers. They have to keep maintaining what they've done in the old world, but they have to build something new that's optimized for the new world. It's a classic technology transition and their mentality appears to be, "Oh, we'll support the new stuff from our old stuff." Which is suboptimal, and as we'll talk about, their biggest patron and the company that put them on the map, Microsoft, really stopped working on their old stuff three years ago so that they could build a new tool chain optimized for this new world. >> Yeah, and so let's sort of close with what we think the options are and decisions that Databricks has for its future architecture. They're smart people. I mean we've had Ali Ghodsi on many times, super impressive. I think they've got to be keenly aware of the limitations, what's going on with foundation models. But at any rate, here in this chart, we lay out sort of three scenarios. One is re-architect the platform by incrementally adopting new technologies. And example might be to layer a graph query engine on top of its stack. They could license key technologies like graph database, they could get aggressive on M&A and buy-in, relational knowledge graphs, semantic technologies, vector database technologies. George, as David Floyer always says, "A lot of ways to skin a cat." We've seen companies like, even think about EMC maintained its relevance through M&A for many, many years. George, give us your thought on each of these strategic options? >> Okay, I find this question the most challenging 'cause remember, I used to be an equity research analyst. I worked for Frank Quattrone, we were one of the top tech shops in the banking industry, although this is 20 years ago. But the M&A team was the top team in the industry and everyone wanted them on their side. And I remember going to meetings with these CEOs, where Frank and the bankers would say, "You want us for your M&A work because we can do better." And they really could do better. But in software, it's not like with EMC in hardware because with hardware, it's easier to connect different boxes. With software, the whole point of a software company is to integrate and architect the components so they fit together and reinforce each other, and that makes M&A harder. You can do it, but it takes a long time to fit the pieces together. Let me give you examples. If they put a graph query engine, let's say something like TinkerPop, on top of, I don't even know if it's possible, but let's say they put it on top of Delta Lake, then you have this graph query engine talking to their storage layer, Delta Lake. But if you want to do analysis, you got to put the data in Photon, which is not really ideal for highly connected data. If you license a graph database, then most of your data is in the Delta Lake and how do you sync it with the graph database? If you do sync it, you've got data in two places, which kind of defeats the purpose of having a unified repository. I find this semantic layer option in number three actually more promising, because that's something that you can layer on top of the storage layer that you have already. You just have to figure out then how to have your query engines talk to that. What I'm trying to highlight is, it's easy as an analyst to say, "You can buy this company or license that technology." But the really hard work is making it all work together and that is where the challenge is. >> Yeah, and well look, I thank you for laying that out. We've seen it, certainly Microsoft and Oracle. I guess you might argue that well, Microsoft had a monopoly in its desktop software and was able to throw off cash for a decade plus while it's stock was going sideways. Oracle had won the database wars and had amazing margins and cash flow to be able to do that. Databricks isn't even gone public yet, but I want to close with some of the players to watch. Alex, if you'd bring that back up, number four here. AWS, we talked about some of their options with S3 and it's not just AWS, it's blob storage, object storage. Microsoft, as you sort of alluded to, was an early go-to market channel for Databricks. We didn't address that really. So maybe in the closing comments we can. Google obviously, Snowflake of course, we're going to dissect their options in future Breaking Analysis. Dbt labs, where do they fit? Bob Muglia's company, Relational.ai, why are these players to watch George, in your opinion? >> So everyone is trying to assemble and integrate the pieces that would make building data applications, data products easy. And the critical part isn't just assembling a bunch of pieces, which is traditionally what AWS did. It's a Unix ethos, which is we give you the tools, you put 'em together, 'cause you then have the maximum choice and maximum power. So what the hyperscalers are doing is they're taking their key value stores, in the case of ASW it's DynamoDB, in the case of Azure it's Cosmos DB, and each are putting a graph query engine on top of those. So they have a unified storage and graph database engine, like all the data would be collected in the key value store. Then you have a graph database, that's how they're going to be presenting a foundation for building these data apps. Dbt labs is putting a semantic layer on top of data lakes and data warehouses and as we'll talk about, I'm sure in the future, that makes it easier to swap out the underlying data platform or swap in new ones for specialized use cases. Snowflake, what they're doing, they're so strong in data management and with their transactional tables, what they're trying to do is take in the operational data that used to be in the province of many state stores like MongoDB and say, "If you manage that data with us, it'll be connected to your analytic data without having to send it through a pipeline." And that's hugely valuable. Relational.ai is the wildcard, 'cause what they're trying to do, it's almost like a holy grail where you're trying to take the expressiveness of connecting all your data in a graph but making it as easy to query as you've always had it in a SQL database or I should say, in a relational database. And if they do that, it's sort of like, it'll be as easy to program these data apps as a spreadsheet was compared to procedural languages, like BASIC or Pascal. That's the implications of Relational.ai. >> Yeah, and again, we talked before, why can't you just throw this all in memory? We're talking in that example of really getting down to differences in how you lay the data out on disk in really, new database architecture, correct? >> Yes. And that's why it's not clear that you could take a data lake or even a Snowflake and why you can't put a relational knowledge graph on those. You could potentially put a graph database, but it'll be compromised because to really do what Relational.ai has done, which is the ease of Relational on top of the power of graph, you actually need to change how you're storing your data on disk or even in memory. So you can't, in other words, it's not like, oh we can add graph support to Snowflake, 'cause if you did that, you'd have to change, or in your data lake, you'd have to change how the data is physically laid out. And then that would break all the tools that talk to that currently. >> What in your estimation, is the timeframe where this becomes critical for a Databricks and potentially Snowflake and others? I mentioned earlier midterm, are we talking three to five years here? Are we talking end of decade? What's your radar say? >> I think something surprising is going on that's going to sort of come up the tailpipe and take everyone by storm. All the hype around business intelligence metrics, which is what we used to put in our dashboards where bookings, billings, revenue, customer, those things, those were the key artifacts that used to live in definitions in your BI tools, and DBT has basically created a standard for defining those so they live in your data pipeline or they're defined in their data pipeline and executed in the data warehouse or data lake in a shared way, so that all tools can use them. This sounds like a digression, it's not. All this stuff about data mesh, data fabric, all that's going on is we need a semantic layer and the business intelligence metrics are defining common semantics for your data. And I think we're going to find by the end of this year, that metrics are how we annotate all our analytic data to start adding common semantics to it. And we're going to find this semantic layer, it's not three to five years off, it's going to be staring us in the face by the end of this year. >> Interesting. And of course SVB today was shut down. We're seeing serious tech headwinds, and oftentimes in these sort of downturns or flat turns, which feels like this could be going on for a while, we emerge with a lot of new players and a lot of new technology. George, we got to leave it there. Thank you to George Gilbert for excellent insights and input for today's episode. I want to thank Alex Myerson who's on production and manages the podcast, of course Ken Schiffman as well. Kristin Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our EIC over at Siliconangle.com, he does some great editing. Remember all these episodes, they're available as podcasts. Wherever you listen, all you got to do is search Breaking Analysis Podcast, we publish each week on wikibon.com and siliconangle.com, or you can email me at David.Vellante@siliconangle.com, or DM me @DVellante. Comment on our LinkedIn post, and please do check out ETR.ai, great survey data, enterprise tech focus, phenomenal. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis.

Published Date : Mar 10 2023

SUMMARY :

bringing you data-driven core elements of the Databricks portfolio and pervasiveness in the data and that was where you went for data. and Cloudera set out to fix that. the reason you see and the robustness of Databricks and their big challenge and the data locked into in the real world and decisions Yes, and the mission of that is propelling the likes that the way you manage that data, is the fundamental problem because the joins are difficult and slow. and connects the data and the issue with that is the fourth bullet, expressiveness and it spits out the and the threat that may loom. because in the past with Snowflake, Think of that as the refinery So once the data lake was in place, George, the call out threat here But the key point is, in sort of the same context. and the company that put One is re-architect the platform and architect the components some of the players to watch. in the case of ASW it's DynamoDB, and why you can't put a relational and executed in the data and manages the podcast, of

ENTITIES

Entity	Category	Confidence
Alex Myerson	PERSON	0.99+
David Floyer	PERSON	0.99+
Mike Olson	PERSON	0.99+
2014	DATE	0.99+
George Gilbert	PERSON	0.99+
Dave Vellante	PERSON	0.99+
George	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
Ken Schiffman	PERSON	0.99+
Andy Jassy	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Erik Bradley	PERSON	0.99+
Dave	PERSON	0.99+
Uber	ORGANIZATION	0.99+
thousands	QUANTITY	0.99+
Sun Microsystems	ORGANIZATION	0.99+
50 years	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Bob Muglia	PERSON	0.99+
Gartner	ORGANIZATION	0.99+
Airbnb	ORGANIZATION	0.99+
60 years	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
Ali Ghodsi	PERSON	0.99+
2010	DATE	0.99+
Databricks	ORGANIZATION	0.99+
Kristin Martin	PERSON	0.99+
Rob Hof	PERSON	0.99+
three	QUANTITY	0.99+
15 years	QUANTITY	0.99+
Databricks'	ORGANIZATION	0.99+
two places	QUANTITY	0.99+
Boston	LOCATION	0.99+
Tristan Handy	PERSON	0.99+
M&A	ORGANIZATION	0.99+
Frank Quattrone	PERSON	0.99+
second element	QUANTITY	0.99+
Daren Brabham	PERSON	0.99+
TechAlpha Partners	ORGANIZATION	0.99+
third element	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
50 year	QUANTITY	0.99+
40%	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
five years	QUANTITY	0.99+

SiliconANGLE News | Beyond the Buzz: A deep dive into the impact of AI

(upbeat music) >> Hello, everyone, welcome to theCUBE. I'm John Furrier, the host of theCUBE in Palo Alto, California. Also it's SiliconANGLE News. Got two great guests here to talk about AI, the impact of the future of the internet, the applications, the people. Amr Awadallah, the founder and CEO, Ed Alban is the CEO of Vectara, a new startup that emerged out of the original Cloudera, I would say, 'cause Amr's known, famous for the Cloudera founding, which was really the beginning of the big data movement. And now as AI goes mainstream, there's so much to talk about, so much to go on. And plus the new company is one of the, now what I call the wave, this next big wave, I call it the fifth wave in the industry. You know, you had PCs, you had the internet, you had mobile. This generative AI thing is real. And you're starting to see startups come out in droves. Amr obviously was founder of Cloudera, Big Data, and now Vectara. And Ed Albanese, you guys have a new company. Welcome to the show. >> Thank you. It's great to be here. >> So great to see you. Now the story is theCUBE started in the Cloudera office. Thanks to you, and your friendly entrepreneurship views that you have. We got to know each other over the years. But Cloudera had Hadoop, which was the beginning of what I call the big data wave, which then became what we now call data lakes, data oceans, and data infrastructure that's developed from that. It's almost interesting to look back 12 plus years, and see that what AI is doing now, right now, is opening up the eyes to the mainstream, and the application's almost mind blowing. You know, Sati Natel called it the Mosaic Moment, didn't say Netscape, he built Netscape (laughing) but called it the Mosaic Moment. You're seeing companies in startups, kind of the alpha geeks running here, because this is the new frontier, and there's real meat on the bone, in terms of like things to do. Why? Why is this happening now? What's is the confluence of the forces happening, that are making this happen? >> Yeah, I mean if you go back to the Cloudera days, with big data, and so on, that was more about data processing. Like how can we process data, so we can extract numbers from it, and do reporting, and maybe take some actions, like this is a fraud transaction, or this is not. And in the meanwhile, many of the researchers working in the neural network, and deep neural network space, were trying to focus on data understanding, like how can I understand the data, and learn from it, so I can take actual actions, based on the data directly, just like a human does. And we were only good at doing that at the level of somebody who was five years old, or seven years old, all the way until about 2013. And starting in 2013, which is only 10 years ago, a number of key innovations started taking place, and each one added on. It was no major innovation that just took place. It was a couple of really incremental ones, but they added on top of each other, in a very exponentially additive way, that led to, by the end of 2019, we now have models, deep neural network models, that can read and understand human text just like we do. Right? And they can reason about it, and argue with you, and explain it to you. And I think that's what is unlocking this whole new wave of innovation that we're seeing right now. So data understanding would be the essence of it. >> So it's not a Big Bang kind of theory, it's been evolving over time, and I think that the tipping point has been the advancements and other things. I mean look at cloud computing, and look how fast it just crept up on AWS. I mean AWS you back three, five years ago, I was talking to Swami yesterday, and their big news about AI, expanding the Hugging Face's relationship with AWS. And just three, five years ago, there wasn't a model training models out there. But as compute comes out, and you got more horsepower,, these large language models, these foundational models, they're flexible, they're not monolithic silos, they're interacting. There's a whole new, almost fusion of data happening. Do you see that? I mean is that part of this? >> Of course, of course. I mean this wave is building on all the previous waves. We wouldn't be at this point if we did not have hardware that can scale, in a very efficient way. We wouldn't be at this point, if we don't have data that we're collecting about everything we do, that we're able to process in this way. So this, this movement, this motion, this phase we're in, absolutely builds on the shoulders of all the previous phases. For some of the observers from the outside, when they see chatGPT for the first time, for them was like, "Oh my god, this just happened overnight." Like it didn't happen overnight. (laughing) GPT itself, like GPT3, which is what chatGPT is based on, was released a year ahead of chatGPT, and many of us were seeing the power it can provide, and what it can do. I don't know if Ed agrees with that. >> Yeah, Ed? >> I do. Although I would acknowledge that the possibilities now, because of what we've hit from a maturity standpoint, have just opened up in an incredible way, that just wasn't tenable even three years ago. And that's what makes it, it's true that it developed incrementally, in the same way that, you know, the possibilities of a mobile handheld device, you know, in 2006 were there, but when the iPhone came out, the possibilities just exploded. And that's the moment we're in. >> Well, I've had many conversations over the past couple months around this area with chatGPT. John Markoff told me the other day, that he calls it, "The five dollar toy," because it's not that big of a deal, in context to what AI's doing behind the scenes, and all the work that's done on ethics, that's happened over the years, but it has woken up the mainstream, so everyone immediately jumps to ethics. "Does it work? "It's not factual," And everyone who's inside the industry is like, "This is amazing." 'Cause you have two schools of thought there. One's like, people that think this is now the beginning of next gen, this is now we're here, this ain't your grandfather's chatbot, okay?" With NLP, it's got reasoning, it's got other things. >> I'm in that camp for sure. >> Yeah. Well I mean, everyone who knows what's going on is in that camp. And as the naysayers start to get through this, and they go, "Wow, it's not just plagiarizing homework, "it's helping me be better. "Like it could rewrite my memo, "bring the lead to the top." It's so the format of the user interface is interesting, but it's still a data-driven app. >> Absolutely. >> So where does it go from here? 'Cause I'm not even calling this the first ending. This is like pregame, in my opinion. What do you guys see this going, in terms of scratching the surface to what happens next? >> I mean, I'll start with, I just don't see how an application is going to look the same in the next three years. Who's going to want to input data manually, in a form field? Who is going to want, or expect, to have to put in some text in a search box, and then read through 15 different possibilities, and try to figure out which one of them actually most closely resembles the question they asked? You know, I don't see that happening. Who's going to start with an absolute blank sheet of paper, and expect no help? That is not how an application will work in the next three years, and it's going to fundamentally change how people interact and spend time with opening any element on their mobile phone, or on their computer, to get something done. >> Yes. I agree with that. Like every single application, over the next five years, will be rewritten, to fit within this model. So imagine an HR application, I don't want to name companies, but imagine an HR application, and you go into application and you clicking on buttons, because you want to take two weeks of vacation, and menus, and clicking here and there, reasons and managers, versus just telling the system, "I'm taking two weeks of vacation, going to Las Vegas," book it, done. >> Yeah. >> And the system just does it for you. If you weren't completing in your input, in your description, for what you want, then the system asks you back, "Did you mean this? "Did you mean that? "Were you trying to also do this as well?" >> Yeah. >> "What was the reason?" And that will fit it for you, and just do it for you. So I think the user interface that we have with apps, is going to change to be very similar to the user interface that we have with each other. And that's why all these apps will need to evolve. >> I know we don't have a lot of time, 'cause you guys are very busy, but I want to definitely have multiple segments with you guys, on this topic, because there's so much to talk about. There's a lot of parallels going on here. I was talking again with Swami who runs all the AI database at AWS, and I asked him, I go, "This feels a lot like the original AWS. "You don't have to provision a data center." A lot of this heavy lifting on the back end, is these large language models, with these foundational models. So the bottleneck in the past, was the energy, and cost to actually do it. Now you're seeing it being stood up faster. So there's definitely going to be a tsunami of apps. I would see that clearly. What is it? We don't know yet. But also people who are going to leverage the fact that I can get started building value. So I see a startup boom coming, and I see an application tsunami of refactoring things. >> Yes. >> So the replatforming is already kind of happening. >> Yes, >> OpenAI, chatGPT, whatever. So that's going to be a developer environment. I mean if Amazon turns this into an API, or a Microsoft, what you guys are doing. >> We're turning it into API as well. That's part of what we're doing as well, yes. >> This is why this is exciting. Amr, you've lived the big data dream, and and we used to talk, if you didn't have a big data problem, if you weren't full of data, you weren't really getting it. Now people have all the data, and they got to stand this up. >> Yeah. >> So the analogy is again, the mobile, I like the mobile movement, and using mobile as an analogy, most companies were not building for a mobile environment, right? They were just building for the web, and legacy way of doing apps. And as soon as the user expectations shifted, that my expectation now, I need to be able to do my job on this small screen, on the mobile device with a touchscreen. Everybody had to invest in re-architecting, and re-implementing every single app, to fit within that model, and that model of interaction. And we are seeing the exact same thing happen now. And one of the core things we're focused on at Vectara, is how to simplify that for organizations, because a lot of them are overwhelmed by large language models, and ML. >> They don't have the staff. >> Yeah, yeah, yeah. They're understaffed, they don't have the skills. >> But they got developers, they've got DevOps, right? >> Yes. >> So they have the DevSecOps going on. >> Exactly, yes. >> So our goal is to simplify it enough for them that they can start leveraging this technology effectively, within their applications. >> Ed, you're the COO of the company, obviously a startup. You guys are growing. You got great backup, and good team. You've also done a lot of business development, and technical business development in this area. If you look at the landscape right now, and I agree the apps are coming, every company I talk to, that has that jet chatGPT of, you know, epiphany, "Oh my God, look how cool this is. "Like magic." Like okay, it's code, settle down. >> Mm hmm. >> But everyone I talk to is using it in a very horizontal way. I talk to a very senior person, very tech alpha geek, very senior person in the industry, technically. they're using it for log data, they're using it for configuration of routers. And in other areas, they're using it for, every vertical has a use case. So this is horizontally scalable from a use case standpoint. When you hear horizontally scalable, first thing I chose in my mind is cloud, right? >> Mm hmm. >> So cloud, and scalability that way. And the data is very specialized. So now you have this vertical specialization, horizontally scalable, everyone will be refactoring. What do you see, and what are you seeing from customers, that you talk to, and prospects? >> Yeah, I mean put yourself in the shoes of an application developer, who is actually trying to make their application a bit more like magic. And to have that soon-to-be, honestly, expected experience. They've got to think about things like performance, and how efficiently that they can actually execute a query, or a question. They've got to think about cost. Generative isn't cheap, like the inference of it. And so you've got to be thoughtful about how and when you take advantage of it, you can't use it as a, you know, everything looks like a nail, and I've got a hammer, and I'm going to hit everything with it, because that will be wasteful. Developers also need to think about how they're going to take advantage of, but not lose their own data. So there has to be some controls around what they feed into the large language model, if anything. Like, should they fine tune a large language model with their own data? Can they keep it logically separated, but still take advantage of the powers of a large language model? And they've also got to take advantage, and be aware of the fact that when data is generated, that it is a different class of data. It might not fully be their own. >> Yeah. >> And it may not even be fully verified. And so when the logical cycle starts, of someone making a request, the relationship between that request, and the output, those things have to be stored safely, logically, and identified as such. >> Yeah. >> And taken advantage of in an ongoing fashion. So these are mega problems, each one of them independently, that, you know, you can think of it as middleware companies need to take advantage of, and think about, to help the next wave of application development be logical, sensible, and effective. It's not just calling some raw API on the cloud, like openAI, and then just, you know, you get your answer and you're done, because that is a very brute force approach. >> Well also I will point, first of all, I agree with your statement about the apps experience, that's going to be expected, form filling. Great point. The interesting about chatGPT. >> Sorry, it's not just form filling, it's any action you would like to take. >> Yeah. >> Instead of clicking, and dragging, and dropping, and doing it on a menu, or on a touch screen, you just say it, and it's and it happens perfectly. >> Yeah. It's a different interface. And that's why I love that UIUX experiences, that's the people falling out of their chair moment with chatGPT, right? But a lot of the things with chatGPT, if you feed it right, it works great. If you feed it wrong and it goes off the rails, it goes off the rails big. >> Yes, yes. >> So the the Bing catastrophes. >> Yeah. >> And that's an example of garbage in, garbage out, classic old school kind of comp-side phrase that we all use. >> Yep. >> Yes. >> This is about data in injection, right? It reminds me the old SQL days, if you had to, if you can sling some SQL, you were a magician, you know, to get the right answer, it's pretty much there. So you got to feed the AI. >> You do, Some people call this, the early word to describe this as prompt engineering. You know, old school, you know, search, or, you know, engagement with data would be, I'm going to, I have a question or I have a query. New school is, I have, I have to issue it a prompt, because I'm trying to get, you know, an action or a reaction, from the system. And the active engineering, there are a lot of different ways you could do it, all the way from, you know, raw, just I'm going to send you whatever I'm thinking. >> Yeah. >> And you get the unintended outcomes, to more constrained, where I'm going to just use my own data, and I'm going to constrain the initial inputs, the data I already know that's first party, and I trust, to, you know, hyper constrain, where the application is actually, it's looking for certain elements to respond to. >> It's interesting Amr, this is why I love this, because one we are in the media, we're recording this video now, we'll stream it. But we got all your linguistics, we're talking. >> Yes. >> This is data. >> Yep. >> So the data quality becomes now the new intellectual property, because, if you have that prompt source data, it makes data or content, in our case, the original content, intellectual property. >> Absolutely. >> Because that's the value. And that's where you see chatGPT fall down, is because they're trying to scroll the web, and people think it's search. It's not necessarily search, it's giving you something that you wanted. It is a lot of that, I remember in Cloudera, you said, "Ask the right questions." Remember that phrase you guys had, that slogan? >> Mm hmm. And that's prompt engineering. So that's exactly, that's the reinvention of "Ask the right question," is prompt engineering is, if you don't give these models the question in the right way, and very few people know how to frame it in the right way with the right context, then you will get garbage out. Right? That is the garbage in, garbage out. But if you specify the question correctly, and you provide with it the metadata that constrain what that question is going to be acted upon or answered upon, then you'll get much better answers. And that's exactly what we solved Vectara. >> Okay. So before we get into the last couple minutes we have left, I want to make sure we get a plug in for the opportunity, and the profile of Vectara, your new company. Can you guys both share with me what you think the current situation is? So for the folks who are now having those moments of, "Ah, AI's bullshit," or, "It's not real, it's a lot of stuff," from, "Oh my god, this is magic," to, "Okay, this is the future." >> Yes. >> What would you say to that person, if you're at a cocktail party, or in the elevator say, "Calm down, this is the first inning." How do you explain the dynamics going on right now, to someone who's either in the industry, but not in the ropes? How would you explain like, what this wave's about? How would you describe it, and how would you prepare them for how to change their life around this? >> Yeah, so I'll go first and then I'll let Ed go. Efficiency, efficiency is the description. So we figured that a way to be a lot more efficient, a way where you can write a lot more emails, create way more content, create way more presentations. Developers can develop 10 times faster than they normally would. And that is very similar to what happened during the Industrial Revolution. I always like to look at examples from the past, to read what will happen now, and what will happen in the future. So during the Industrial Revolution, it was about efficiency with our hands, right? So I had to make a piece of cloth, like this piece of cloth for this shirt I'm wearing. Our ancestors, they had to spend month taking the cotton, making it into threads, taking the threads, making them into pieces of cloth, and then cutting it. And now a machine makes it just like that, right? And the ancestors now turned from the people that do the thing, to manage the machines that do the thing. And I think the same thing is going to happen now, is our efficiency will be multiplied extremely, as human beings, and we'll be able to do a lot more. And many of us will be able to do things they couldn't do before. So another great example I always like to use is the example of Google Maps, and GPS. Very few of us knew how to drive a car from one location to another, and read a map, and get there correctly. But once that efficiency of an AI, by the way, behind these things is very, very complex AI, that figures out how to do that for us. All of us now became amazing navigators that can go from any point to any point. So that's kind of how I look at the future. >> And that's a great real example of impact. Ed, your take on how you would talk to a friend, or colleague, or anyone who asks like, "How do I make sense of the current situation? "Is it real? "What's in it for me, and what do I do?" I mean every company's rethinking their business right now, around this. What would you say to them? >> You know, I usually like to show, rather than describe. And so, you know, the other day I just got access, I've been using an application for a long time, called Notion, and it's super popular. There's like 30 or 40 million users. And the new version of Notion came out, which has AI embedded within it. And it's AI that allows you primarily to create. So if you could break down the world of AI into find and create, for a minute, just kind of logically separate those two things, find is certainly going to be massively impacted in our experiences as consumers on, you know, Google and Bing, and I can't believe I just said the word Bing in the same sentence as Google, but that's what's happening now (all laughing), because it's a good example of change. >> Yes. >> But also inside the business. But on the crate side, you know, Notion is a wiki product, where you try to, you know, note down things that you are thinking about, or you want to share and memorialize. But sometimes you do need help to get it down fast. And just in the first day of using this new product, like my experience has really fundamentally changed. And I think that anybody who would, you know, anybody say for example, that is using an existing app, I would show them, open up the app. Now imagine the possibility of getting a starting point right off the bat, in five seconds of, instead of having to whole cloth draft this thing, imagine getting a starting point then you can modify and edit, or just dispose of and retry again. And that's the potential for me. I can't imagine a scenario where, in a few years from now, I'm going to be satisfied if I don't have a little bit of help, in the same way that I don't manually spell check every email that I send. I automatically spell check it. I love when I'm getting type ahead support inside of Google, or anything. Doesn't mean I always take it, or when texting. >> That's efficiency too. I mean the cloud was about developers getting stuff up quick. >> Exactly. >> All that heavy lifting is there for you, so you don't have to do it. >> Right? >> And you get to the value faster. >> Exactly. I mean, if history taught us one thing, it's, you have to always embrace efficiency, and if you don't fast enough, you will fall behind. Again, looking at the industrial revolution, the companies that embraced the industrial revolution, they became the leaders in the world, and the ones who did not, they all like. >> Well the AI thing that we got to watch out for, is watching how it goes off the rails. If it doesn't have the right prompt engineering, or data architecture, infrastructure. >> Yes. >> It's a big part. So this comes back down to your startup, real quick, I know we got a couple minutes left. Talk about the company, the motivation, and we'll do a deeper dive on on the company. But what's the motivation? What are you targeting for the market, business model? The tech, let's go. >> Actually, I would like Ed to go first. Go ahead. >> Sure, I mean, we're a developer-first, API-first platform. So the product is oriented around allowing developers who may not be superstars, in being able to either leverage, or choose, or select their own large language models for appropriate use cases. But they that want to be able to instantly add the power of large language models into their application set. We started with search, because we think it's going to be one of the first places that people try to take advantage of large language models, to help find information within an application context. And we've built our own large language models, focused on making it very efficient, and elegant, to find information more quickly. So what a developer can do is, within minutes, go up, register for an account, and get access to a set of APIs, that allow them to send data, to be converted into a format that's easy to understand for large language models, vectors. And then secondarily, they can issue queries, ask questions. And they can ask them very, the questions that can be asked, are very natural language questions. So we're talking about long form sentences, you know, drill down types of questions, and they can get answers that either come back in depending upon the form factor of the user interface, in list form, or summarized form, where summarized equals the opportunity to kind of see a condensed, singular answer. >> All right. I have a. >> Oh okay, go ahead, you go. >> I was just going to say, I'm going to be a customer for you, because I want, my dream was to have a hologram of theCUBE host, me and Dave, and have questions be generated in the metaverse. So you know. (all laughing) >> There'll be no longer any guests here. They'll all be talking to you guys. >> Give a couple bullets, I'll spit out 10 good questions. Publish a story. This brings the automation, I'm sorry to interrupt you. >> No, no. No, no, I was just going to follow on on the same. So another way to look at exactly what Ed described is, we want to offer you chatGPT for your own data, right? So imagine taking all of the recordings of all of the interviews you have done, and having all of the content of that being ingested by a system, where you can now have a conversation with your own data and say, "Oh, last time when I met Amr, "which video games did we talk about? "Which movie or book did we use as an analogy "for how we should be embracing data science, "and big data, which is moneyball," I know you use moneyball all the time. And you start having that conversation. So, now the data doesn't become a passive asset that you just have in your organization. No. It's an active participant that's sitting with you, on the table, helping you make decisions. >> One of my favorite things to do with customers, is to go to their site or application, and show them me using it. So for example, one of the customers I talked to was one of the biggest property management companies in the world, that lets people go and rent homes, and houses, and things like that. And you know, I went and I showed them me searching through reviews, looking for information, and trying different words, and trying to find out like, you know, is this place quiet? Is it comfortable? And then I put all the same data into our platform, and I showed them the world of difference you can have when you start asking that question wholeheartedly, and getting real information that doesn't have anything to do with the words you asked, but is really focused on the meaning. You know, when I asked like, "Is it quiet?" You know, answers would come back like, "The wind whispered through the trees peacefully," and you know, it's like nothing to do with quiet in the literal word sense, but in the meaning sense, everything to do with it. And that that was magical even for them, to see that. >> Well you guys are the front end of this big wave. Congratulations on the startup, Amr. I know you guys got great pedigree in big data, and you've got a great team, and congratulations. Vectara is the name of the company, check 'em out. Again, the startup boom is coming. This will be one of the major waves, generative AI is here. I think we'll look back, and it will be pointed out as a major inflection point in the industry. >> Absolutely. >> There's not a lot of hype behind that. People are are seeing it, experts are. So it's going to be fun, thanks for watching. >> Thanks John. (soft music)

Published Date : Feb 23 2023

SUMMARY :

I call it the fifth wave in the industry. It's great to be here. and the application's almost mind blowing. And in the meanwhile, and you got more horsepower,, of all the previous phases. in the same way that, you know, and all the work that's done on ethics, "bring the lead to the top." in terms of scratching the surface and it's going to fundamentally change and you go into application And the system just does it for you. is going to change to be very So the bottleneck in the past, So the replatforming is So that's going to be a That's part of what and they got to stand this up. And one of the core things don't have the skills. So our goal is to simplify it and I agree the apps are coming, I talk to a very senior And the data is very specialized. and be aware of the fact that request, and the output, some raw API on the cloud, about the apps experience, it's any action you would like to take. you just say it, and it's But a lot of the things with chatGPT, comp-side phrase that we all use. It reminds me the old all the way from, you know, raw, and I'm going to constrain But we got all your So the data quality And that's where you That is the garbage in, garbage out. So for the folks who are and how would you prepare them that do the thing, to manage the current situation? And the new version of Notion came out, But on the crate side, you I mean the cloud was about developers so you don't have to do it. and the ones who did not, they all like. If it doesn't have the So this comes back down to Actually, I would like Ed to go first. factor of the user interface, I have a. generated in the metaverse. They'll all be talking to you guys. This brings the automation, of all of the interviews you have done, one of the customers I talked to Vectara is the name of the So it's going to be fun, Thanks John.

ENTITIES

Entity	Category	Confidence
John Markoff	PERSON	0.99+
2013	DATE	0.99+
AWS	ORGANIZATION	0.99+
Ed Alban	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
30	QUANTITY	0.99+
10 times	QUANTITY	0.99+
2006	DATE	0.99+
John Furrier	PERSON	0.99+
two weeks	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Ed Albanese	PERSON	0.99+
John	PERSON	0.99+
five seconds	QUANTITY	0.99+
Las Vegas	LOCATION	0.99+
Ed	PERSON	0.99+
iPhone	COMMERCIAL_ITEM	0.99+
10 good questions	QUANTITY	0.99+
Swami	PERSON	0.99+
15 different possibilities	QUANTITY	0.99+
Palo Alto, California	LOCATION	0.99+
Vectara	ORGANIZATION	0.99+
Amr Awadallah	PERSON	0.99+
Google	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
first time	QUANTITY	0.99+
both	QUANTITY	0.99+
end of 2019	DATE	0.99+
yesterday	DATE	0.98+
Big Data	ORGANIZATION	0.98+
40 million users	QUANTITY	0.98+
two things	QUANTITY	0.98+
two great guests	QUANTITY	0.98+
12 plus years	QUANTITY	0.98+
one	QUANTITY	0.98+
five dollar	QUANTITY	0.98+
Netscape	ORGANIZATION	0.98+
five years ago	DATE	0.98+
SQL	TITLE	0.98+
first inning	QUANTITY	0.98+
Amr	PERSON	0.97+
two schools	QUANTITY	0.97+
first	QUANTITY	0.97+
10 years ago	DATE	0.97+
One	QUANTITY	0.96+
first day	QUANTITY	0.96+
three	DATE	0.96+
chatGPT	TITLE	0.96+
first places	QUANTITY	0.95+
Bing	ORGANIZATION	0.95+
Notion	TITLE	0.95+
first thing	QUANTITY	0.94+
theCUBE	ORGANIZATION	0.94+
Beyond the Buzz	TITLE	0.94+
Sati Natel	PERSON	0.94+
Industrial Revolution	EVENT	0.93+
one location	QUANTITY	0.93+
three years ago	DATE	0.93+
single application	QUANTITY	0.92+
one thing	QUANTITY	0.91+
first platform	QUANTITY	0.91+
five years old	QUANTITY	0.91+

Opher Kahane, Sonoma Ventures | CloudNativeSecurityCon 23

(uplifting music) >> Hello, welcome back to theCUBE's coverage of CloudNativeSecurityCon, the inaugural event, in Seattle. I'm John Furrier, host of theCUBE, here in the Palo Alto Studios. We're calling it theCUBE Center. It's kind of like our Sports Center for tech. It's kind of remote coverage. We've been doing this now for a few years. We're going to amp it up this year as more events are remote, and happening all around the world. So, we're going to continue the coverage with this segment focusing on the data stack, entrepreneurial opportunities around all things security, and as, obviously, data's involved. And our next guest is a friend of theCUBE, and CUBE alumni from 2013, entrepreneur himself, turned, now, venture capitalist angel investor, with his own firm, Opher Kahane, Managing Director, Sonoma Ventures. Formerly the founder of Origami, sold to Intuit a few years back. Focusing now on having a lot of fun, angel investing on boards, focusing on data-driven applications, and stacks around that, and all the stuff going on in, really, in the wheelhouse for what's going on around security data. Opher, great to see you. Thanks for coming on. >> My pleasure. Great to be back. It's been a while. >> So you're kind of on Easy Street now. You did the entrepreneurial venture, you've worked hard. We were on together in 2013 when theCUBE just started. XCEL Partners had an event in Stanford, XCEL, and they had all the features there. We interviewed Satya Nadella, who was just a manager at Microsoft at that time, he was there. He's now the CEO of Microsoft. >> Yeah, he was. >> A lot's changed in nine years. But congratulations on your venture you sold, and you got an exit there, and now you're doing a lot of investments. I'd love to get your take, because this is really the biggest change I've seen in the past 12 years, around an inflection point around a lot of converging forces. Data, which, big data, 10 years ago, was a big part of your career, but now it's accelerated, with cloud scale. You're seeing people building scale on top of other clouds, and becoming their own cloud. You're seeing data being a big part of it. Cybersecurity kind of has not really changed much, but it's the most important thing everyone's talking about. So, developers are involved, data's involved, a lot of entrepreneurial opportunities. So I'd love to get your take on how you see the current situation, as it relates to what's gone on in the past five years or so. What's the big story? >> So, a lot of big stories, but I think a lot of it has to do with a promise of making value from data, whether it's for cybersecurity, for Fintech, for DevOps, for RevTech startups and companies. There's a lot of challenges in actually driving and monetizing the value from data with velocity. Historically, the challenge has been more around, "How do I store data at massive scale?" And then you had the big data infrastructure company, like Cloudera, and MapR, and others, deal with it from a scale perspective, from a storage perspective. Then you had a whole layer of companies that evolved to deal with, "How do I index massive scales of data, for quick querying, and federated access, et cetera?" But now that a lot of those underlying problems, if you will, have been solved, to a certain extent, although they're always being stretched, given the scale of data, and its utility is becoming more and more massive, in particular with AI use cases being very prominent right now, the next level is how to actually make value from the data. How do I manage the full lifecycle of data in complex environments, with complex organizations, complex use cases? And having seen this from the inside, with Origami Logic, as we dealt with a lot of large corporations, and post-acquisition by Intuit, and a lot of the startups I'm involved with, it's clear that we're now onto that next step. And you have fundamental new paradigms, such as data mesh, that attempt to address that complexity, and responsibly scaling access, and democratizing access in the value monetization from data, across large organizations. You have a slew of startups that are evolving to help the entire lifecycle of data, from the data engineering side of it, to the data analytics side of it, to the AI use cases side of it. And it feels like the early days, to a certain extent, of the revolution that we've seen in transition from traditional databases, to data warehouses, to cloud-based data processing, and big data. It feels like we're at the genesis of that next wave. And it's super, super exciting, for me at least, as someone who's sitting more in the coach seat, rather than being on the pitch, and building startups, helping folks as they go through those motions. >> So that's awesome. I want to get into some of these data infrastructure dynamics you mentioned, but before that, talk to the audience around what you're working on now. You've been a successful entrepreneur, you're focused on angel investing, so, super-early seed stage. What kind of deals are you looking at? What's interesting to you? What is Sonoma Ventures looking for, and what are some of the entrepreneurial dynamics that you're seeing right now, from a startup standpoint? >> Cool, so, at a macro level, this is a little bit of background of my history, because it shapes very heavily what it is that I'm looking at. So, I've been very fortunate with entrepreneurial career. I founded three startups. All three of them are successful. Final two were sold, the first one merged and went public. And my third career has been about data, moving data, passing data, processing data, generating insights from it. And, at this phase, I wanted to really evolve from just going and building startup number four, from going through the same motions again. A 10 year adventure, I'm a little bit too old for that, I guess. But the next best thing is to sit from a point whereby I can be more elevated in where I'm dealing with, and broaden the variety of startups I'm focused on, rather than just do your own thing, and just go very, very deep into it. Now, what specifically am I focused on at Sonoma Ventures? So, basically, looking at what I refer to as a data-driven application stack. Anything from the low-level data infrastructure and cloud infrastructure, that helps any persona in the data universe maximize value for data, from their particular point of view, for their particular role, whether it's data analysts, data scientists, data engineers, cloud engineers, DevOps folks, et cetera. All the way up to the application layer, in applications that are very data-heavy. And what are very typical data-heavy applications? FinTech, cyber, Web3, revenue technologies, and product and DevOps. So these are the areas we're focused on. I have almost 23 or 24 startups in the portfolio that span all these different areas. And this is in terms of the aperture. Now, typically, focus on pre-seed, seed. Sometimes a little bit later stage, but this is the primary focus. And it's really about partnering with entrepreneurs, and helping them make, if you will, original mistakes, avoid the mistakes I made. >> Yeah. >> And take it to the next level, whatever the milestone they're driving with. So I'm very, very hands-on with many of those startups. Now, what is it that's happening right now, initially, and why is it so exciting? So, on one hand, you have this scaling of data and its complexity, yet lagging value creation from it, across those different personas we've touched on. So that's one fundamental opportunity which is secular. The other one, which is more a cyclic situation, is the fact that we're going through a down cycle in tech, as is very evident in the public markets, and everything we're hearing about funding going slower and lower, terms shifting more into the hands of typical VCs versus entrepreneur-friendly market, and so on and so forth. And a very significant amount of layoffs. Now, when you combine these two trends together, you're observing a very interesting thing, that a lot of folks, really bright folks, who have sold a startup to a company, or have been in the guts of the large startup, or a large corporation, have, hands-on, experienced all those challenges we've spoken about earlier, in turf, maximizing value from data, irrespective of their role, in a specific angle, or vantage point they have on those challenges. So, for many of them, it's an opportunity to, "Now, let me now start a startup. I've been laid off, maybe, or my company's stock isn't doing as well as it used to, as a large corporation. Now I have an opportunity to actually go and take my entrepreneurial passion, and apply it to a product and experience as part of this larger company." >> Yeah. >> And you see a slew of folks who are emerging with these great ideas. So it's a very, very exciting period of time to innovate. >> It's interesting, a lot of people look at, I mean, I look at Snowflake as an example of a company that refactored data warehouses. They just basically took data warehouse, and put it on the cloud, and called it a data cloud. That, to me, was compelling. They didn't pay any CapEx. They rode Amazon's wave there. So, a similar thing going on with data. You mentioned this, and I see it as an enabling opportunity. So whether it's cybersecurity, FinTech, whatever vertical, you have an enablement. Now, you mentioned data infrastructure. It's a super exciting area, as there's so many stacks emerging. We got an analytics stack, there's real-time stacks, there's data lakes, AI stack, foundational models. So, you're seeing an explosion of stacks, different tools probably will emerge. So, how do you look at that, as a seasoned entrepreneur, now investor? Is that a good thing? Is that just more of the market? 'Cause it just seems like more and more kind of decomposed stacks targeted at use cases seems to be a trend. >> Yeah. >> And how do you vet that, is it? >> So it's a great observation, and if you take a step back and look at the evolution of technology over the last 30 years, maybe longer, you always see these cycles of expansion, fragmentation, contraction, expansion, contraction. Go decentralize, go centralize, go decentralize, go centralize, as manifested in different types of technology paradigms. From client server, to storage, to microservices, to et cetera, et cetera. So I think we're going through another big bang, to a certain extent, whereby end up with more specialized data stacks for specific use cases, as you need performance, the data models, the tooling to best adapt to the particular task at hand, and the particular personas at hand. As the needs of the data analysts are quite different from the needs of an NL engineer, it's quite different from the needs of the data engineer. And what happens is, when you end up with these siloed stacks, you end up with new fragmentation, and new gaps that need to be filled with a new layer of innovation. And I suspect that, in part, that's what we're seeing right now, in terms of the next wave of data innovation. Whether it's in a service of FinTech use cases, or cyber use cases, or other, is a set of tools that end up having to try and stitch together those elements and bridge between them. So I see that as a fantastic gap to innovate around. I see, also, a fundamental need in creating a common data language, and common data management processes and governance across those different personas, because ultimately, the same underlying data these folks need, albeit in different mediums, different access models, different velocities, et cetera, the subject matter, if you will, the underlying raw data, and some of the taxonomies right on top of it, do need to be consistent. So, once again, a great opportunity to innovate, whether it's about semantic layers, whether it's about data mesh, whether it's about CICD tools for data engineers, and so on and so forth. >> I got to ask you, first of all, I see you have a friend you brought into the interview. You have a dog in the background who made a little cameo appearance. And that's awesome. Sitting right next to you, making sure everything's going well. On the AI thing, 'cause I think that's the hot trend here. >> Yeah. >> You're starting to see, that ChatGPT's got everyone excited, because it's kind of that first time you see kind of next-gen functionality, large-language models, where you can bring data in, and it integrates well. So, to me, I think, connecting the dots, this kind of speaks to the beginning of what will be a trend of really blending of data stacks together, or blending of models. And so, as more data modeling emerges, you start to have this AI stack kind of situation, where you have things out there that you can compose. It's almost very developer-friendly, conceptually. This is kind of new, but kind of the same concept's been working on with Google and others. How do you see this emerging, as an investor? What are some of the things that you're excited about, around the ChatGPT kind of things that's happening? 'Cause it brings it mainstream. Again, a million downloads, fastest applications get a million downloads, even among all the successes. So it's obviously hit a nerve. People are talking about it. What's your take on that? >> Yeah, so, I think that's a great point, and clearly, it feels like an iPhone moment, right, to the industry, in this case, AI, and lots of applications. And I think there's, at a high level, probably three different layers of innovation. One is on top of those platforms. What use cases can one bring to the table that would drive on top of a ChatGPT-like service? Whereby, the startup, the company, can bring some unique datasets to infuse and add value on top of it, by custom-focusing it and purpose-building it for a particular use case or particular vertical. Whether it's applying it to customer service, in a particular vertical, applying it to, I don't know, marketing content creation, and so on and so forth. That's one category. And I do know that, as one of my startups is in Y Combinator, this season, winter '23, they're saying that a very large chunk of the YC companies in this cycle are about GPT use cases. So we'll see a flurry of that. The next layer, the one below that, is those who actually provide those platforms, whether it's ChatGPT, whatever will emerge from the partnership with Microsoft, and any competitive players that emerge from other startups, or from the big cloud providers, whether it's Facebook, if they ever get into this, and Google, which clearly will, as they need to, to survive around search. The third layer is the enabling layer. As you're going to have more and more of those different large-language models and use case running on top of it, the underlying layers, all the way down to cloud infrastructure, the data infrastructure, and the entire set of tools and systems, that take raw data, and massage it into useful, labeled, contextualized features and data to feed the models, the AI models, whether it's during training, or during inference stages, in production. Personally, my focus is more on the infrastructure than on the application use cases. And I believe that there's going to be a massive amount of innovation opportunity around that, to reach cost-effective, quality, fair models that are deployed easily and maintained easily, or at least with as little pain as possible, at scale. So there are startups that are dealing with it, in various areas. Some are about focusing on labeling automation, some about fairness, about, speaking about cyber, protecting models from threats through data and other issues with it, and so on and so forth. And I believe that this will be, too, a big driver for massive innovation, the infrastructure layer. >> Awesome, and I love how you mentioned the iPhone moment. I call it the browser moment, 'cause it felt that way for me, personally. >> Yep. >> But I think, from a business model standpoint, there is that iPhone shift. It's not the BlackBerry. It's a whole 'nother thing. And I like that. But I do have to ask you, because this is interesting. You mentioned iPhone. iPhone's mostly proprietary. So, in these machine learning foundational models, >> Yeah. >> you're starting to see proprietary hardware, bolt-on, acceleration, bundled together, for faster uptake. And now you got open source emerging, as two things. It's almost iPhone-Android situation happening. >> Yeah. >> So what's your view on that? Because there's pros and cons for either one. You're seeing a lot of these machine learning laws are very proprietary, but they work, and do you care, right? >> Yeah. >> And then you got open source, which is like, "Okay, let's get some upsource code, and let people verify it, and then build with that." Is it a balance? >> Yes, I think- >> Is it mutually exclusive? What's your view? >> I think it's going to be, markets will drive the proportion of both, and I think, for a certain use case, you'll end up with more proprietary offerings. With certain use cases, I guess the fundamental infrastructure for ChatGPT-like, let's say, large-language models and all the use cases running on top of it, that's likely going to be more platform-oriented and open source, and will allow innovation. Think of it as the equivalent of iPhone apps or Android apps running on top of those platforms, as in AI apps. So we'll have a lot of that. Now, when you start going a little bit more into the guts, the lower layers, then it's clear that, for performance reasons, in particular, for certain use cases, we'll end up with more proprietary offerings, whether it's advanced silicon, such as some of the silicon that emerged from entrepreneurs who have left Google, around TensorFlow, and all the silicon that powers that. You'll see a lot of innovation in that area as well. It hopefully intends to improve the cost efficiency of running large AI-oriented workloads, both in inference and in learning stages. >> I got to ask you, because this has come up a lot around Azure and Microsoft. Microsoft, pretty good move getting into the ChatGPT >> Yep. >> and the open AI, because I was talking to someone who's a hardcore Amazon developer, and they said, they swore they would never use Azure, right? One of those types. And they're spinning up Azure servers to get access to the API. So, the developers are flocking, as you mentioned. The YC class is all doing large data things, because you can now program with data, which is amazing, which is amazing. So, what's your take on, I know you got to be kind of neutral 'cause you're an investor, but you got, Amazon has to respond, Google, essentially, did all the work, so they have to have a solution. So, I'm expecting Google to have something very compelling, but Microsoft, right now, is going to just, might run the table on developers, this new wave of data developers. What's your take on the cloud responses to this? What's Amazon, what do you think AWS is going to do? What should Google be doing? What's your take? >> So, each of them is coming from a slightly different angle, of course. I'll say, Google, I think, has massive assets in the AI space, and their underlying cloud platform, I think, has been designed to support such complicated workloads, but they have yet to go as far as opening it up the same way ChatGPT is now in that Microsoft partnership, and Azure. Good question regarding Amazon. AWS has had a significant investment in AI-related infrastructure. Seeing it through my startups, through other lens as well. How will they respond to that higher layer, above and beyond the low level, if you will, AI-enabling apparatuses? How do they elevate to at least one or two layers above, and get to the same ChatGPT layer, good question. Is there an acquisition that will make sense for them to accelerate it, maybe. Is there an in-house development that they can reapply from a different domain towards that, possibly. But I do suspect we'll end up with acquisitions as the arms race around the next level of cloud wars emerges, and it's going to be no longer just about the basic tooling for basic cloud-based applications, and the infrastructure, and the cost management, but rather, faster time to deliver AI in data-heavy applications. Once again, each one of those cloud suppliers, their vendor is coming with different assets, and different pros and cons. All of them will need to just elevate the level of the fight, if you will, in this case, to the AI layer. >> It's going to be very interesting, the different stacks on the data infrastructure, like I mentioned, analytics, data lake, AI, all happening. It's going to be interesting to see how this turns into this AI cloud, like data clouds, data operating systems. So, super fascinating area. Opher, thank you for coming on and sharing your expertise with us. Great to see you, and congratulations on the work. I'll give you the final word here. Give a plugin for what you're looking for for startup seats, pre-seeds. What's the kind of profile that gets your attention, from a seed, pre-seed candidate or entrepreneur? >> Cool, first of all, it's my pleasure. Enjoy our chats, as always. Hopefully the next one's not going to be in nine years. As to what I'm looking for, ideally, smart data entrepreneurs, who have come from a particular domain problem, or problem domain, that they understand, they felt it in their own 10 fingers, or millions of neurons in their brains, and they figured out a way to solve it. Whether it's a data infrastructure play, a cloud infrastructure play, or a very, very smart application that takes advantage of data at scale. These are the things I'm looking for. >> One final, final question I have to ask you, because you're a seasoned entrepreneur, and now coach. What's different about the current entrepreneurial environment right now, vis-a-vis, the past decade? What's new? Is it different, highly accelerated? What advice do you give entrepreneurs out there who are putting together their plan? Obviously, a global resource pool now of engineering. It might not be yesterday's formula for success to putting a venture together to get to that product-market fit. What's new and different, and what's your advice to the folks out there about what's different about the current environment for being an entrepreneur? >> Fantastic, so I think it's a great question. So I think there's a few axes of difference, compared to, let's say, five years ago, 10 years ago, 15 years ago. First and foremost, given the amount of infrastructure out there, the amount of open-source technologies, amount of developer toolkits and frameworks, trying to develop an application, at least at the application layer, is much faster than ever. So, it's faster and cheaper, to the most part, unless you're building very fundamental, core, deep tech, where you still have a big technology challenge to deal with. And absent that, the challenge shifts more to how do you manage my resources, to product-market fit, how are you integrating the GTM lens, the go-to-market lens, as early as possible in the product-market fit cycle, such that you reach from pre-seed to seed, from seed to A, from A to B, with an optimal amount of velocity, and a minimal amount of resources. One big difference, specifically as of, let's say, beginning of this year, late last year, is that money is no longer free for entrepreneurs, which means that you need to operate and build startup in an environment with a lot more constraints. And in my mind, some of the best startups that have ever been built, and some of the big market-changing, generational-changing, if you will, technology startups, in their respective industry verticals, have actually emerged from these times. And these tend to be the smartest, best startups that emerge because they operate with a lot less money. Money is not as available for them, which means that they need to make tough decisions, and make verticals every day. What you don't need to do, you can kick the cow down the road. When you have plenty of money, and it cushions for a lot of mistakes, you don't have that cushion. And hopefully we'll end up with companies with a more agile, more, if you will, resilience, and better cultures in making those tough decisions that startups need to make every day. Which is why I'm super, super excited to see the next batch of amazing unicorns, true unicorns, not just valuation, market rising with the water type unicorns that emerged from this particular era, which we're in the beginning of. And very much enjoy working with entrepreneurs during this difficult time, the times we're in. >> The next 24 months will be the next wave, like you said, best time to do a company. Remember, Airbnb's pitch was, "We'll rent cots in apartments, and sell cereal." Boy, a lot of people passed on that deal, in that last down market, that turned out to be a game-changer. So the crazy ideas might not be that bad. So it's all about the entrepreneurs, and >> 100%. >> this is a big wave, and it's certainly happening. Opher, thank you for sharing. Obviously, data is going to change all the markets. Refactoring, security, FinTech, user experience, applications are going to be changed by data, data operating system. Thanks for coming on, and thanks for sharing. Appreciate it. >> My pleasure. Have a good one. >> Okay, more coverage for the CloudNativeSecurityCon inaugural event. Data will be the key for cybersecurity. theCUBE's coverage continues after this break. (uplifting music)

Published Date : Feb 2 2023

SUMMARY :

and happening all around the world. Great to be back. He's now the CEO in the past five years or so. and a lot of the startups What kind of deals are you looking at? and broaden the variety of and apply it to a product and experience And you see a slew of folks and put it on the cloud, and new gaps that need to be filled You have a dog in the background but kind of the same and the entire set of tools and systems, I call it the browser moment, But I do have to ask you, And now you got open source and do you care, right? and then build with that." and all the use cases I got to ask you, because and the open AI, and it's going to be no longer What's the kind of profile These are the things I'm looking for. about the current environment and some of the big market-changing, So it's all about the entrepreneurs, and to change all the markets. Have a good one. for the CloudNativeSecurityCon

ENTITIES

Entity	Category	Confidence
Satya Nadella	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
2013	DATE	0.99+
Opher	PERSON	0.99+
CapEx	ORGANIZATION	0.99+
Seattle	LOCATION	0.99+
John Furrier	PERSON	0.99+
Sonoma Ventures	ORGANIZATION	0.99+
BlackBerry	ORGANIZATION	0.99+
10 fingers	QUANTITY	0.99+
Airbnb	ORGANIZATION	0.99+
CUBE	ORGANIZATION	0.99+
nine years	QUANTITY	0.99+
Facebook	ORGANIZATION	0.99+
iPhone	COMMERCIAL_ITEM	0.99+
Origami Logic	ORGANIZATION	0.99+
Origami	ORGANIZATION	0.99+
Intuit	ORGANIZATION	0.99+
RevTech	ORGANIZATION	0.99+
each	QUANTITY	0.99+
Opher Kahane	PERSON	0.99+
CloudNativeSecurityCon	EVENT	0.99+
Palo Alto Studios	LOCATION	0.99+
yesterday	DATE	0.99+
One	QUANTITY	0.99+
First	QUANTITY	0.99+
third layer	QUANTITY	0.98+
theCUBE	ORGANIZATION	0.98+
two layers	QUANTITY	0.98+
Android	TITLE	0.98+
third career	QUANTITY	0.98+
two things	QUANTITY	0.98+
both	QUANTITY	0.98+
MapR	ORGANIZATION	0.98+
one	QUANTITY	0.98+
one category	QUANTITY	0.98+
late last year	DATE	0.98+
millions of neurons	QUANTITY	0.98+
a million downloads	QUANTITY	0.98+
three startups	QUANTITY	0.98+
10 years ago	DATE	0.97+
Fintech	ORGANIZATION	0.97+
winter '23	DATE	0.97+
first one	QUANTITY	0.97+
this year	DATE	0.97+
Stanford	LOCATION	0.97+
Cloudera	ORGANIZATION	0.97+
theCUBE Center	ORGANIZATION	0.96+
five years ago	DATE	0.96+
10 year	QUANTITY	0.96+
ChatGPT	TITLE	0.96+
three	QUANTITY	0.95+
first time	QUANTITY	0.95+
XCEL Partners	ORGANIZATION	0.95+
15 years ago	DATE	0.94+
24 startups	QUANTITY	0.93+

Analyst Predictions 2023: The Future of Data Management

(upbeat music) >> Hello, this is Dave Valente with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analyst, Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you, guys? A data pack like the rat pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave faintly speaks) >> All right, before we get into 2023 predictions, we thought it'd be good to do a look back at how we did in 2022 and give a transparent assessment of those predictions. So, let's get right into it. We're going to bring these up here, the predictions from 2022, they're color-coded red, yellow, and green to signify the degree of accuracy. And I'm pleased to report there's no red. Well, maybe some of you will want to debate that grading system. But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence they have that led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we self-graded ourselves. I could have very easily made my prediction from last year green, but I mentioned why I left it as yellow. I totally fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks went GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players, I spoke at Collibra's conference, data.world, work closely with Alation, Informatica, a bunch of other companies, they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would go IPO, and it did not. And I don't think anyone is going IPO right now. The market is really, really down, the funding in VC IPO market. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like how the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check. As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I just wrote and just tried to get away from, and this just a topic just won't go away. I did speak with a number of folks, early adopters and non-adopters during the year. And I did find that basically that it pretty much validated what I was expecting, which was that there was a lot more, this has now become a front burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing Google search on data mesh. And I happened to have tripped across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly just hopped on that maybe three sentences and wrote it at about a couple minutes saying this is hogwash, essentially. (laughs) And that just reun... And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw a 15,000 hits on that post, which was the most hits of any single post I put all year. And the responses were wildly pro and con. So, it pretty much validates my expectation in that data mesh really did hit a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then I recently, (Tony laughing) I talked to Walmart and they actually invoked Martin Fowler and they said that they're working through their data mesh. So, it takes a really lot of thought, and it really, as we've talked about, is really as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow in the prediction of graph databases. Take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases could be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have said it in the right context. It's really a three to five-year time period that graph databases will really become significant, because they still need accepted methodologies that can be applied in a business context as well as proper tools in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year. And also, we're seeing interesting developments in terms of things like AWS with Neptune and with Oracle providing graph support in Oracle database this past year. Those things are, as I said, growing gradually. There are other companies like TigerGraph and so forth, that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases, I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have and maybe to the edge. >> Well, part of it is that it's not as specialized as you might think it. You can apply graphs to great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of a appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. This is another type of specialized data processing like Carl was talking about graph databases is a stream processing, and nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50%, continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, they didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect perhaps in the five-year timeframe that we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. And when you talk to practitioners doing this stuff, there's still some complications in the data pipeline. And so, but I think, you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant. When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now, they're using lakehouse more and more. What's your thoughts on here? Why the green? What's your evidence there? >> Well, I think, I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture. And it was a safe prediction to say vendors are going to be pursuing this in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, IBM, all advocate this idea of a single platform for all of your data. Now, the trend was also supported in 2023, in that we saw a big embrace of Apache Iceberg in 2022. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability and it also ensures performance. And that's a structured table that helps with the warehouse side performance. But among those announcements, Snowflake, Google, Cloud Era, SAP, Salesforce, IBM, all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating end users. It's very cutting edge. I'd say the top, leading edge, 5% of of companies have really embraced the lakehouse. I think, we're now seeing the fast followers, the next 20 to 25% of firms embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer, making the announcement about Iceberg, and he asked for a show of hands for any of you in the audience at the keynote, have you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there. It was you, me, and I think, two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata, which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability, the whole catalog space is actually, people don't like to use the word data catalog anymore, because data catalog sounds like it's a catalog, a museum, if you may, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like data ops, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this, if this succeeds, go do that. But it's like getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do data quality check, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security privacy, some of these topics, which are handled separately. And I'm just talking about data security and data privacy. I'm not talking about infrastructure security. These also need to merge into a unified metadata management piece with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, it's limited in its scope. It is actually the very engine, the very glue that is going to connect data producers and consumers. >> Great. Thank you for that. Doug. Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think, there's a huge opportunity for consolidation and streamlining of these as aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind the scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest data observability, customer data platform, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers, glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, the business leaders, including the CIO, only want to spend as much time and effort and money and resources on these sorts of things to avoid getting breached, ending up in headlines, getting fired or going to jail. So, vendors bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalog and the BI vendors is because data catalogs are very soon, not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what is it that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic? >> No, I'm glad you asked, because I'm going to... Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, we expect to see the capabilities, whether it's embedded or separate. We expect to see those capabilities continue to permeate the market. >> And a lot of those catalogs are driven now by machine learning and things. So, they're learning from those patterns of usage by people when people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Bear, let's bring up the predictions. You got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long at the tooth? Is it not so modern anymore? >> I think, in a way, it's got almost too modern. It's gotten too, I don't know if it's being long in the tooth, but it is getting long. The modern data stack, it's traditionally been defined as basically you have the data platform, which would be the operational database and the data warehouse. And in between, you have all the tools that are necessary to essentially get that data from the operational realm or the streaming realm for that matter into basically the data warehouse, or as we might be seeing more and more, the data lakehouse. And I think, what's important here is that, or I think, we have seen a lot of progress, and this would be in the cloud, is with the SaaS services. And especially you see that in the modern data stack, which is like all these players, not just the MongoDBs or the Oracles or the Amazons have their database platforms. You see they have the Informatica's, and all the other players there in Fivetrans have their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which is it takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to unfortunately is what I would call lots of islands of simplicity, which means that it leads it (Dave laughing) to the customer to have to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here, we have too many pieces. And going back to the discussion of catalogs, it's like we have so many catalogs out there, which one do we use? 'Cause chances are of most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers or all the SaaS service providers, is to literally get it together and essentially make this modern data stack less of a stack, make it more of a blending of an end-to-end solution. And that can come in a number of different ways. Part of it is that we're data platform providers have been adding services that are adjacent. And there's some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search. It's a very common, I guess, sort of tool that basically, that the applications that are developed on MongoDB use, so MongoDB then built it into the database rather than requiring an extra elastic search or open search stack. Amazon just... AWS just did the zero-ETL, which is a first step towards simplifying the process from going from Aurora to Redshift. You've seen same thing with Google, BigQuery integrating basically streaming pipelines. And you're seeing also a lot of movement in database machine learning. So, there's some good moves in this direction. I expect to see more than this year. Part of it's from basically the SaaS platform is adding some functionality. But I also see more importantly, because you're never going to get... This is like asking your data team and your developers, herding cats to standardizing the same tool. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing, maybe not quite two for, but in other words, get two products for the price of two services or for the price of one and a half. I see a lot of potential for this. And it's to me, if the class was to simplify things, this is the next logical step and I expect to see more of this here. >> Yeah, and you see in Oracle, MySQL heat wave, yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think? >> Well, I think, that the... I really like Tony's phrase, islands of simplification. It really says (Tony chuckles) what's going on here, which is that all these different vendors you ask about, about how these stacks work. All these different vendors have their own stack vision. And you can... One application group is going to use one, and another application group is going to use another. And some people will say, let's go to, like you go to a Informatica conference and they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system. So, the alternative is to have metadata that can be shared amongst those systems and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically where did it come from? Who created it? What's its current state? What's the security level? Et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data with data in different formats, the whole thing has, it's been like what a customer of mine used to say, "I understand your product can make my system run faster, but right now I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us in theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left. Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing, retrieving data, whether it's in key value stores or document databases and so forth. We're getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore. We're going to use Spark for everything, except that only a handful of people know how to use Spark. Oh, well, that's a problem. Well, how about, and for ordinary conventional business analytics, Spark is like an over-engineered solution to the problem. SQL works just great. What's happened in the past couple years, and what's going to continue to happen is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or... And of course, Snowflake is loving this, because that is what they do, and their success is certainly points to the success of SQL, even MongoDB. And we were all, I think, at the MongoDB conference where on one day, we hear SQL is dead. They're not teaching SQL in schools anymore, and this kind of thing. And then, a couple days later at the same conference, they announced we're adding a new analytic capability-based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods of certainly of retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had had to take the... Your debating in the class and you were forced to take one side and defend it. So, I was was at a Vertica conference one time up on stage with Curt Monash, and I had to take the NoSQL, the world is changing paradigm shift. And so just to be controversial, I said to him, Curt Monash, I said, who really needs acid compliance anyway? Tony Baer. And so, (chuckles) of course, his head exploded, but what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance and if there's any proof of the pudding here, I see lakehouse is being icing on the cake. As Doug had predicted last year, now, (clears throat) for the record, I think, Doug was about a year ahead of time in his predictions that this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I've actually, I'm on the home stretch of doing a market, a landscape on the lakehouse. And lakehouse will not replace data lakes in terms of that. There is the need for those, data scientists who do know Python, who knows Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore. Well, maybe we have an oversupply of SQL developers. Well, I'm being facetious there, but there is a huge skills based in SQL. Analytics have been built on SQL. They came with lakehouse and why this really helps to fuel a SQL revival is that the core need in the data lake, what brought on the lakehouse was not so much SQL, it was a need for acid. And what was the best way to do it? It was through a relational table structure. So, the whole idea of acid in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to column and row level, which you really could not do in a data lake or a file system. So, while lakehouse can be queried in a manner, you can go in there with Python or whatever, it's built on a relational table structure. And so, for that end, for those types of data lakes, it becomes the end state. You cannot bypass that table structure as I learned the hard way during my research. So, the bottom line I'd say here is that lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring up back up the predictions. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as the, I would say facts that we collect the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why has it happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity. I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. So, those are designed to promote the reuse and consistency across the AI and ML initiatives. The elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that. So, any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric. Well, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring and publishing data to external third-party sources, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier in my opinion on this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone. And so, if we can take that type of analysis, collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward. And that's a very common thing to do. What the income might look like from that property income, the expenses, we can plan and purchase things appropriately. So, I think, we need this broader purview and I'm beginning to see some of those things happen. And the evidence today I would say, is more focused around the metric stores and the feature stores starting to see vendors offer those capabilities. And we're starting to see the ML ops elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs, I think of data apps, orchestrate people and places and things to optimize around a set of KPIs. It sounds like a metadata challenge more... Somebody once predicted they'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is in a way, another word for what I was calling operational metadata, which not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes. Where did it come from? When was it created, when was it modified? Who modified it? And so on and so forth. We need to do more of that with the structure data that we have, so that we can track what it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I would say that that's... And I'd say that it's true that we need that stuff. I think, that starting to expand is probably the right way to put it. It's going to be expanding for some time. I think, we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data (audio breaking) (Tony laughing) everything that I'm hearing what Dave is saying, Carl, this is the year when data products will start to take off. I'm not saying they'll become mainstream. They may take a couple of years to become so, but this is data products, all this thing about vacation rentals and how is it doing, that data is coming from different sources. I'm packaging it into our data product. And to Carl's point, there's a whole operational metadata associated with it. The idea is for organizations to see things like developer productivity, how many releases am I doing of this? What data products are most popular? I'm actually in right now in the process of formulating this concept that just like we had data catalogs, we are very soon going to be requiring data products catalog. So, I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire and save cost. But this is a data product. Now, there's a associated thing that is also getting debated quite a bit called data contracts. And a data contract to me is literally just formalization of all these aspects of a product. How do you use it? What is the SLA on it, what is the quality that I am prescribing? So, data product, in my opinion, shifts the conversation to the consumers or to the business people. Up to this point when, Dave, you're talking about data and all of data discovery curation is a very data producer-centric. So, I think, we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there just very quickly there, which is that what Sanjeev has been saying there, this is really central to what Zhamak has been talking about. It's basically about making, one, data products are about the lifecycle management of data. Metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone just no notice how Sanjeev just snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also comment to, Tony, to your last year's prediction, you're really talking about it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touche on that one. Okay, last prediction. Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption. Always that 25% of employees are really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision or better still, using analytics as triggers for automation and workflows, and not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app. So, the analytic is right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, they're all building smart apps with the analytics predictions, even recommendations built into these applications. And I think, the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily necessitating humans interacting with it if there's confidence. So, we want prediction, we want embedding, we want automation. This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We got to move beyond the, what I call swivel chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and do take action. >> And Dave Menninger, today, if you want, analytics or you want to absorb what's happening in the business, you typically got to go ask an expert, and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... So, how did we get here? I'm going to say collectively as an industry, we made a mistake. We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems, so that the analytics didn't impact the operations. You don't want the operations preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together and organizations recognize that need to change. As Doug said, the majority of the workforce and the majority of organizations doesn't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. 2/3 of organizations recognize that embedded analytics are important and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not, as opposed to turning somebody loose, in the wild with the data. They're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's predictions. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our future intelligence, future of intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically. Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against the live data, which is through something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great. >> Yeah, this is very much, I would say, very consistent with what we were talking about, I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution solution. I think, that what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline just in their data lakes to do all that very exploratory and that deep modeling. But clearly, it just makes sense to bring operational analytics into where people work into their workspace and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications, it does seem like we're entering a new era of not only data, but new era of apps. Today, most applications are about filling forms out or codifying processes and require a human input. And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from, whether it's the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, present it to humans. Do you guys see this as a new frontier? >> I think, that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, you have to get the analytics at the point of decision has to be relevant to that decision point. >> And I also recall basically a lot of the ERP vendors back like 10 years ago, we're promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago, and still, we are not seeing this come to fruition in most business applications. >> And do you think it's going to require a new type of data platform database? Today, data's all far-flung. We see that's all over the clouds and at the edge. Today, you cache- >> We need a super cloud. >> You cache that data, you're throwing into memory. I mentioned, MySQL heat wave. There are other examples where it's a brute force approach, but maybe we need new ways of laying data out on disk and new database architectures, and just when we thought we had it all figured out. >> Well, without referring to disk, which to my mind, is almost like talking about cave painting. I think, that (Dave laughing) all the things that have been mentioned by all of us today are elements of what I'm talking about. In other words, the whole improvement of the data mesh, the improvement of metadata across the board and improvement of the ability to track data and judge its freshness the way we judge the freshness of a melon or something like that, to determine whether we can still use it. Is it still good? That kind of thing. Bringing together data from multiple sources dynamically and real-time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome here of the predictions, and let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Valente for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com, theCUBE.net. Thank you for watching. (upbeat music)

Published Date : Jan 11 2023

SUMMARY :

and pleased to tell you (Tony and Dave faintly speaks) that led them to their conclusion. down, the funding in VC IPO market. And I like how the fact And I happened to have tripped across I talked to Walmart in the prediction of graph databases. But I stand by the idea and maybe to the edge. You can apply graphs to great And so, it's going to streaming data permeates the landscape. and to be honest, I like the tough grading the next 20 to 25% of and of course, the degree of difficulty. that sits on the side, Thank you for that. And I have to disagree. So, the catalog becomes Do you have any stats for just the reasons that And a lot of those catalogs about the modern data stack. and more, the data lakehouse. and the application stack, So, the alternative is to have metadata that SQL is the killer app for big data. but in the perception of the marketplace, and I had to take the NoSQL, being up on stage with Curt Monash. (group laughing) is that the core need in the data lake, And your prediction is the and examine derivatives of the data to optimize around a set of KPIs. that folks in the content world (Dave and Carl laughing) going to say this... shifts the conversation to the consumers And essentially, one of the things (group laughing) the term that we'll remember today, to your last year's prediction, is headed to embedding. and going off to separate happening in the business, so that the analytics didn't And the thing that we're waiting for and that deep modeling. that the system can of decision has to be relevant And the fact that we're But in the end We see that's all over the You cache that data, and improvement of the and I love how it shapes the outcome here Thank you for watching.

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Doug Henschen	PERSON	0.99+
Dave Menninger	PERSON	0.99+
Doug	PERSON	0.99+
Carl	PERSON	0.99+
Carl Olofson	PERSON	0.99+
Dave Menninger	PERSON	0.99+
Tony Baer	PERSON	0.99+
Tony	PERSON	0.99+
Dave Valente	PERSON	0.99+
Collibra	ORGANIZATION	0.99+
Curt Monash	PERSON	0.99+
Sanjeev Mohan	PERSON	0.99+
Christian Kleinerman	PERSON	0.99+
Dave Valente	PERSON	0.99+
Walmart	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Sanjeev	PERSON	0.99+
Constellation Research	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Ventana Research	ORGANIZATION	0.99+
2022	DATE	0.99+
Hazelcast	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Tony Bear	PERSON	0.99+
25%	QUANTITY	0.99+
2021	DATE	0.99+
last year	DATE	0.99+
65%	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
today	DATE	0.99+
five-year	QUANTITY	0.99+
TigerGraph	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
two services	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
David	PERSON	0.99+
RisingWave Labs	ORGANIZATION	0.99+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for Cloudera: