
Search Results for Enterprise Data Warehouse:

Phil Kippen, Snowflake, Dave Whittington, AT&T & Roddy Tranum, AT&T | MWC Barcelona 2023


 

(gentle music) >> Narrator: "TheCUBE's" live coverage is made possible by funding from Dell Technologies, creating technologies that drive human progress. (upbeat music) >> Hello everybody, welcome back to day four of "theCUBE's" coverage of MWC '23. We're here live at the Fira in Barcelona. Wall-to-wall coverage, John Furrier is in our Palo Alto studio, banging out all the news. Really, the whole week we've been talking about the disaggregation of the telco network, the new opportunities in telco. We're really excited to have AT&T and Snowflake here. Dave Whittington is the AVP at the Chief Data Office at AT&T. Roddy Tranum is the Assistant Vice President for Channel Performance Data and Tools at AT&T. And Phil Kippen, the Global Head of Industry-Telecom at Snowflake, Snowflake's new telecom business. Snowflake just announced earnings last night. Typical Scarpelli, they beat earnings, very conservative guidance, stock's down today, but we like Snowflake long term, they're on that path to 10 billion. Guys, welcome to "theCUBE." Thanks so much >> Phil: Thank you. >> for coming on. >> Dave and Roddy: Thanks Dave. >> Dave, let's start with you. The data culture inside of telco. We've had this, we've been talking all week about this monolithic system. Super reliable. You guys did a great job during the pandemic. Everything shifting to landlines. We didn't even notice, you guys didn't miss a beat. Saved us. But the data culture's changing inside telco. Explain that. >> Well, absolutely. So, first of all, IoT and edge processing is bringing forth new and exciting opportunities all the time. So, we're bridging the world between a lot of the OSS stuff that we can do with edge processing. But bringing that back, and now we're talking about working, and I would say traditionally, we talk data warehouse. Data warehouse and big data are now becoming a single mesh, all right? And the use cases and the way you can use those, especially I'm taking that edge data and bringing it back over, now I'm running AI and ML models on it, and I'm pushing back to the edge, and I'm combining that with my relational data. So that mesh there is making all the difference. We're getting new use cases that we can do with that. And it's just, and the volume of data is immense. >> Now, I love ChatGPT, but I'm hoping your data models are more accurate than ChatGPT. I never know. Sometimes it's really good, sometimes it's really bad. But enterprise, you got to be clean with your AI, don't you? >> Not only do you have to be clean, you have to monitor it for bias and be ethical about it. We're really good about that. First of all, with AT&T, our brand is Platinum. We take care of that. So, we may not be as cutting-edge risk takers as others, but when we go to market with an AI or an ML or a product, it's solid. >> Well hey, as telcos go, you guys are leaning into the Cloud. So I mean, that's a good starting point. Roddy, explain your role. You got an interesting title, Channel Performance Data and Tools, what's that all about? >> So literally anything with our consumer, retail, call centers' channels, all of our channels, from a data perspective and metrics perspective, what it takes to run reps, agents, all the way to leadership levels, scorecards, how you rank in the business, how you're driving the business, from sales, service, customer experience, all that data infrastructure with our great partners on the CDO side, as well as Snowflake, that comes from my team. 
>> And that's traditionally been done in a, I don't mean the pejorative, but we're talking about legacy, monolithic, sort of data warehouse technologies. >> Absolutely. >> We have a love-hate relationship with them. It's what we had. It's what we used, right? And now that's evolving. And you guys are leaning into the Cloud. >> Dramatic evolution. And what Snowflake's enabled for us is impeccable. We've talked about having, people have dreamed of one data warehouse for the longest time and everything in one system. Really, this is the only way that becomes a reality. The more you get in Snowflake, we can have golden source data, and instead of duplicating that 50 times across AT&T, it's in one place, we just share it, everybody leverages it, and now it's not duplicated, and the process efficiency is just incredible. >> But it really hinges on that separation of storage and compute. And we talk about the monolithic warehouse, and one of the nightmares I've lived with, is having a monolithic warehouse. And let's just go with some of my primary, traditional customers, sales, marketing and finance. They are leveraging BSS OSS data all the time. For me to coordinate a deployment, I have to make sure that each one of these units can take an outage, if it's going to be a long deployment. With the separation of storage, compute, they own their own compute cluster. So I can move faster for these people. 'Cause if finance, I can implement his code without impacting finance or marketing. This brings in CI/CD to more reality. It brings us faster to market with more features. So if he wants to implement a new comp plan for the field reps, or we're reacting to the marketplace, where one of our competitors has done something, we can do that in days, versus waiting weeks or months. >> And we've reported on this a lot. This is the brilliance of Snowflake's founders, that whole separation >> Yep. >> from compute and data. I like Dave, that you're starting with sort of the business flexibility, 'cause there's a cost element of this too. You can dial down, you can turn off compute, and then of course the whole world said, "Hey, that's a good idea." And a VC started throwing money at Amazon, but Redshift said, "Oh, we can do that too, sort of, can't turn off the compute." But I want to ask you Phil, so, >> Sure. >> it looks from my vantage point, like you're taking your Data Cloud message which was originally separate compute from storage simplification, now data sharing, automated governance, security, ultimately the marketplace. >> Phil: Right. >> Taking that same model, break down the silos into telecom, right? It's that same, >> Mm-hmm. >> sorry to use the term playbook, Frank Slootman tells me he doesn't use playbooks, but he's not a pattern matcher, but he's a situational CEO, he says. But the situation in telco calls for that type of strategy. So explain what you guys are doing in telco. >> I think there's, so, what we're launching, we launched last week, and it really was three components, right? So we had our platform as you mentioned, >> Dave: Mm-hmm. >> and that platform is being utilized by a number of different companies today. We also are adding, for telecom very specifically, we're adding capabilities in marketplace, so that service providers can not only use some of the data and apps that are in marketplace, but as well service providers can go and sell applications or sell data that they had built. And then as well, we're adding our ecosystem, it's telecom-specific. 
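Dave's point about each business unit owning its own compute cluster maps onto Snowflake's virtual warehouses. As a minimal sketch of the idea, not AT&T's actual setup, with account, user, and warehouse names all invented:

```python
# Sketch: per-business-unit compute isolation in Snowflake.
# Assumes the snowflake-snowpark-python package; the connection
# parameters and warehouse names are hypothetical placeholders.
from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "myorg-myaccount",
    "user": "deploy_user",
    "password": "...",
    "role": "SYSADMIN",
}).create()

# Each unit gets its own warehouse (compute) over the same storage,
# so suspending or resizing finance_wh never interrupts sales_wh.
for unit in ("finance", "marketing", "sales"):
    session.sql(f"""
        CREATE WAREHOUSE IF NOT EXISTS {unit}_wh
          WITH WAREHOUSE_SIZE = 'MEDIUM'
          AUTO_SUSPEND = 60      -- suspend after 60s idle
          AUTO_RESUME = TRUE
    """).collect()
```

The auto-suspend setting is also what the cost point below turns on: compute that is not running is not billed.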
So, we're bringing partners in, technology partners, and consulting and services partners, that are very much focused on telecoms and what they do internally, but also helping them monetize new services. >> Okay, so it's not just sort of generic Snowflake into telco? You have specific value there. >> We're purposing the platform specifically for- >> Are you a telco guy? >> I am. You are, okay. >> Total telco guy absolutely. >> So there you go. You see that Snowflake is actually an interesting organizational structure, 'cause you're going after verticals, which is kind of rare for a company of your sort of inventory, I'll say, >> Absolutely. >> I don't mean that as a negative. (Dave laughs) So Dave, take us through the data journey at AT&T. It's a long history. You don't have to go back to the 1800s, but- (Dave laughs) >> Thank you for pointing out, we're a 149-year-old company. So, Jesse James was one of the original customers, (Dave laughs) and we have no longer got his data. So, I'll go back. I've been 17 years singular AT&T, and I've watched it through the whole journey of, where the monolithics were growing, when the consolidation of small, wireless carriers, and we went through that boom. And then we've gone through mergers and acquisitions. But, Hadoop came out, and it was going to solve all world hunger. And we had all the aspects of, we're going to monetize and do AI and ML, and some of the things we learned with Hadoop was, we had this monolithic warehouse, we had this file-based-structured Hadoop, but we really didn't know how to bring this all together. And we were bringing items over to the relational, and we were taking the relational and bringing it over to the warehouse, and trying to, and it was a struggle. Let's just go there. And I don't think we were the only company to struggle with that, but we learned a lot. And so now as tech is finally emerging, with the cloud, companies like Snowflake, and others that can handle that, where we can create, we were discussing earlier, but it becomes more of a conducive mesh that's interoperable. So now we're able to simplify that environment. And the cloud is a big thing on that. 'Cause you could not do this on-prem with on-prem technologies. It would be just too cost prohibitive, and too heavy of lifting, going back and forth, and managing the data. The simplicity the cloud brings with a smaller set of tools, and I'll say in the data space specifically, really allows us, maybe not a single instance of data for all use cases, but a greatly reduced ecosystem. And when you simplify your ecosystem, you simplify speed to market and data management. >> So I'm going to ask you, I know it's kind of internal organizational plumbing, but it'll inform my next question. So, Dave, you're with the Chief Data Office, and Roddy, you're kind of, you all serve in the business, but you're really serving the, you're closer to those guys, they're banging on your door for- >> Absolutely. I try to keep the 130,000 users who may or may not have issues sometimes with our data and metrics, away from Dave. And he just gets a call from me. >> And he only calls when he has a problem. He's never wished me happy birthday. (Dave and Phil laugh) >> So the reason I asked that is because, you describe Dave, some of the Hadoop days, and again love-hate with that, but we had hyper-specialized roles. We still do. You've got data engineers, data scientists, data analysts, and you've got this sort of this pipeline, and it had to be this sequential pipeline. 
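Earlier, Roddy described golden-source data that is shared rather than duplicated 50 times across AT&T. The mechanics behind that are Snowflake secure shares; a rough sketch, with hypothetical object and account names, reusing the session from the sketch above:

```python
# Sketch: sharing a golden-source table instead of copying it.
# Object and account identifiers are invented for illustration.
session.sql("CREATE SHARE IF NOT EXISTS golden_source_share").collect()
session.sql("GRANT USAGE ON DATABASE golden TO SHARE golden_source_share").collect()
session.sql("GRANT USAGE ON SCHEMA golden.core TO SHARE golden_source_share").collect()
session.sql("GRANT SELECT ON TABLE golden.core.customer_360 TO SHARE golden_source_share").collect()
# Consumers query the share in place; no copy of the data is made.
session.sql("ALTER SHARE golden_source_share ADD ACCOUNTS = myorg.marketing_acct").collect()
```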
I know Snowflake and others have come to simplify that. My question to you is, how are those roles changing? How is data getting closer to the business? Everybody talks about democratizing business. Are you doing that? What's a real use example? >> From our perspective, those roles, a lot of those roles on my team for years, because we're all about efficiency, >> Dave: Mm-hmm. >> we cut across those areas, and always have cut across those areas. So now we're into a space where things have been simplified, data processes and copying. We've gone from 40 data processes down to five steps now. We've gone from five steps to one step. We've gone from days to hours, hours to minutes, minutes to seconds. Literally we're seeing that time in and time out with Snowflake. So these resources that have spent all their time on data engineering and moving data around, are now freed up more on what they have skills for and always have, the data analytics area of the business, and driving the business forward, and new metrics and new analysis. That's some of the great operational value that we've seen here. As this simplification happens, it frees up brain power. >> So, you're pumping data from the OSS, the BSS, the OKRs everywhere >> Everywhere. >> into Snowflake? >> Scheduling systems, you name it. If you can think of what drives our retail and centers and online, all that data, scheduling system, chat data, call center data, call detail data, all of that enters into this common infrastructure to manage the business on a day-in and day-out basis. >> How are the roles and the skill sets changing? 'Cause you're doing a lot less ETL, you're doing a lot less moving of data around. There were guys that were probably really good at that. I used to joke, when I was in the storage world, like if your job is managing LUNs, you need to look for a new job, right? So, and they did and people move on. So, are you able to sort of redeploy those assets, and those people, those human resources? >> These folks are highly skilled. And we were talking about earlier, SQL hasn't gone away. Relational databases are not going away. And that's one thing that's made this migration excellent, they're just transitioning their skills. Experts in legacy systems are now rapidly becoming experts on the Snowflake side. And it has not been that hard a transition. There are certainly nuances, things that don't operate as well in the cloud environment that we have to learn and optimize. But we're making that transition. >> Dave: So just, >> Please. >> within the Chief Data Office we have a couple of missions, and Roddy is a great partner and an example of how it works. We try to bring the data for democratization, so that we have one interface, now hopefully, you know, we just have a logical connection back to these Snowflake instances that we connect. But we're providing that governance and cleansing, and if there's a business rule at the enterprise level, we provide it. But the goal at CDO is to make sure that business units like Roddy or marketing or finance, that they can come to a platform that's reliable, robust, and self-service. I don't want to be in his way. So I feel like I'm providing a sub-level of platform, that he can come to and anybody can come to, and utilize, that they're not having to go back and undo what's in Salesforce, or ServiceNow, or in our billers. So, I'm sort of that layer. And then making sure that that ecosystem is robust enough for him to use. 
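The governance and cleansing layer Dave describes is the sort of thing Snowflake exposes as policies. A minimal sketch, assuming the standard masking-policy DDL (an Enterprise Edition feature) and invented table, column, and role names, again reusing the session from the earlier sketches:

```python
# Sketch: an enterprise-level rule applied once by the CDO layer,
# so self-service users never see raw personal data. Names invented.
session.sql("""
    CREATE MASKING POLICY IF NOT EXISTS cdo_email_mask
      AS (val STRING) RETURNS STRING ->
      CASE
        WHEN CURRENT_ROLE() IN ('CDO_ADMIN') THEN val
        ELSE '***MASKED***'
      END
""").collect()

session.sql("""
    ALTER TABLE golden.core.customer_360
      MODIFY COLUMN email
      SET MASKING POLICY cdo_email_mask
""").collect()
```

Business units like Roddy's then query the same table through their own warehouses and see only what the policy allows.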
>> And that self-service infrastructure is predominantly through the Azure Cloud, correct? >> Dave: Absolutely. >> And you work on other clouds, but it's predominantly through Azure? >> We're predominantly in Azure, yeah. >> Dave: That's the first-party citizen? >> Yeah. >> Okay, I like to think in terms sometimes of data products, and I know you've mentioned upfront, you're Gold standard or Platinum standard, you're very careful about personal information. >> Dave: Yeah. >> So you're not trying to sell, I'm an AT&T customer, you're not trying to sell my data, and make money off of my data. So the value prop and the business case for Snowflake is it's simpler. You do things faster, you're in the cloud, lower cost, et cetera. But I presume you're also in the business, AT&T, of making offers and creating packages for customers. I look at those as data products, 'cause it's not a, I mean, yeah, there's a physical phone, but there's data products behind it. So- >> It ultimately is, but not everybody always sees it that way. Data reporting often can be an afterthought. And we're making it more on the forefront now. >> Yeah, so I like to think in terms of data products, I mean, even in the financial services business, it's a data business. So, if we can think about that sort of metaphor, do you see yourselves as data product builders? Do you have that, do you think about building products in that regard? >> Within the Chief Data Office, we have a data product team, >> Mm-hmm. >> and by the way, I would be disingenuous if I said, oh, we're very mature in this, but no, it's where we're going, and it's somewhat of a journey, but I've got a peer, and their whole job is to go from, especially as we migrate to the cloud, if Roddy or some other group was using tables three, four and five and joining them together, it's like, "Well look, this is an offer data product, so let's combine these and put it up in the cloud, and here's the offer data set product, or here's the opportunity data product," and it's a journey. We're on the way, but we have dedicated staff and time to do this. >> I think one of the hardest parts about that is the organizational aspects of it. Like who owns the data now, right? It used to be owned by the techies, and increasingly the business lines want to have access, you're providing self-service. So there's a discussion about, "Okay, what is a data product? Who's responsible for that data product? Is it in my P&L or your P&L? Somebody's got to sign up for that number." So, it sounds like those discussions are taking place. >> They are. And, we feel like we're more the, in CDO at least, we feel more, we're like the guardians, and the shepherds, but not the owners. I mean, we have a role in it all, but he owns his metrics. >> Yeah, and even from our perspective, we see ourselves as an enabler of making whatever AT&T wants to make happen in terms of the key products and offers, trade-in offers, trade-in programs, all that requires this data infrastructure, and managing reps and agents, and what they do from a channel performance perspective. We still see ourselves as key enablers of that. And we've got to be flexible, and respond quickly to the business. >> I always had empathy for the data engineer, and he or she had to service all these different lines of business with no business context. >> Yeah. >> Like the business knows good data from bad data, and then they just pound that poor individual, and they're like, "Okay, I'm doing my best. 
It's just ones and zeros to me." So, it sounds like that's, you're on that path. >> Yeah absolutely, and I think, we do have refined, getting more and more refined owners of, since Snowflake enables these golden source data, everybody sees me and my organization, channel performance data, go to Roddy's team, we have a great team, and we go to Dave in terms of making it all happen from a data infrastructure perspective. So we do have a lot more refined, "This is where you go for the golden source, this is where it is, this is who owns it. If you want to launch this product and services, and you want to manage reps with it, that's the place you-" >> It's a strong story. So Chief Data Office doesn't own the data per se, but it's your responsibility to provide the self-service infrastructure, and make sure it's governed properly, and in as automated a way as possible. >> Well, yeah, absolutely. And let me tell you more, everybody talks about single version of the truth, one instance of the data, but there's context to that, that we are trying to take advantage of as we do data products, which is, what's the use case here? So we may have an entity of Roddy as a prospective customer, and we may have an entity of Roddy as a customer, high-value customer over here, which may have a different mix of data and all, but as a data product, we can then create those for those specific use cases. Still point to the same data, but build it in different constructs. One for marketing, one for sales, one for finance. By the way, that's where your data engineers are struggling. >> Yeah, yeah, of course. So how do I serve all these folks, and really have the context? Common story in telco, >> Absolutely. >> or are these guys ahead of the curve a little bit? Or where would you put them? >> I think they're definitely moving a lot faster than the industry is generally. I think the enabling technologies, like for instance, having that single copy of data that everybody sees, a single pane of glass, right, that's definitely something that everybody wants to get to. Not many people are there. I think, what AT&T's doing, is most definitely a little bit further ahead than the industry generally. And I think the successes that are coming out of that, and the learning experiences are starting to generate momentum within AT&T. So I think, it's not just about the product, and having a product now that gives you a single copy of data. It's about the experiences, right? And now, how the teams are getting trained, domains like network engineering for instance. They typically haven't been a part of data discussions, because they've got a lot of data, but they're focused on the infrastructure. >> Mm. >> So, by going ahead and deploying this platform, for platform's purpose, right, and the business value, that's one thing, but also to start bringing, getting that experience, and bringing new experience in to help other groups that traditionally hadn't been data-centric, that's also a huge step ahead, right? So you need to enable those groups. >> A big complaint of course we hear at MWC from carriers is, "The over-the-top guys are killing us. They're riding on our networks, et cetera, et cetera. They have all the data, they have all the client relationships." Do you see your client relationships changing as a result of sort of your data culture evolving? >> Yes, I'm not sure I can- >> It's a loaded question, I know. 
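Dave's "same data, different constructs" point above is often implemented as views: one golden-source table, with a use-case-specific data product per consumer. A sketch with invented names, continuing the session from the earlier sketches:

```python
# Sketch: use-case-specific data products over one golden source.
# One entity (a prospect vs. a high-value customer) gets different
# constructs without duplicating the underlying data. Names invented.
session.sql("""
    CREATE VIEW IF NOT EXISTS marketing.prospect_data_product AS
    SELECT customer_id, segment, campaign_response
    FROM golden.core.customer_360
    WHERE customer_status = 'PROSPECT'
""").collect()

session.sql("""
    CREATE VIEW IF NOT EXISTS finance.high_value_data_product AS
    SELECT customer_id, lifetime_value, monthly_revenue
    FROM golden.core.customer_360
    WHERE lifetime_value > 10000
""").collect()
```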
>> Yeah, and then I, so, we want to start embedding as much into our network on the proprietary value that we have, so we can start getting into that OTT play, us as any other carrier, we have distinct advantages of what we can do at the edge, and we just need to start exploiting those. But you know, 'cause whether it's location or whatnot, so we got to eat into that. Historically, the network is where we make our money in, and we stack the services on top of it. It used to be *69. >> Dave: Yeah. >> If anybody remembers that. >> Dave: Yeah, of course. (Dave laughs) >> But you know, it was stacked on top of our network. Then we stack another product on top of it. It'll be in the edge where we start providing distinct values to other partners as we- >> I mean, it's a great business that you're in. I mean, if they're really good at connectivity. >> Dave: Yeah. >> And so, it sounds like it's still to be determined >> Dave: Yeah. >> where you can go with this. You have to be super careful with private and for personal information. >> Dave: Yep. >> Yeah, but the opportunities are enormous. >> There's a lot. >> Yeah, particularly at the edge, looking at, private networks are just an amazing opportunity. Factories and name it, hospital, remote hospitals, remote locations. I mean- >> Dave: Connected cars. >> Connected cars are really interesting, right? I mean, if you start communicating car to car, and actually drive that, (Dave laughs) I mean that's, now we're getting to visit Xen Fault Tolerance people. This is it. >> Dave: That's not, let's hold the traffic. >> Doesn't scare me as much as we actually learn. (all laugh) >> So how's the show been for you guys? >> Dave: Awesome. >> What're your big takeaways from- >> Tremendous experience. I mean, someone who doesn't go outside the United States much, I'm a homebody. The whole experience, the whole trip, city, Mobile World Congress, the technologies that are out here, it's been a blast. >> Anything, top two things you learned, advice you'd give to others, your colleagues out in general? >> In general, we talked a lot about technologies today, and we talked a lot about data, but I'm going to tell you what, the accelerator that you cannot change, is the relationship that we have. So when the tech and the business can work together toward a common goal, and it's a partnership, you get things done. So, I don't know how many CDOs or CIOs or CEOs are out there, but this connection is what accelerates and makes it work. >> And that is our audience Dave. I mean, it's all about that alignment. So guys, I really appreciate you coming in and sharing your story in "theCUBE." Great stuff. >> Thank you. >> Thanks a lot. >> All right, thanks everybody. Thank you for watching. I'll be right back with Dave Nicholson. Day four SiliconANGLE's coverage of MWC '23. You're watching "theCUBE." (gentle music)

Published Date: Mar 2, 2023


SENTIMENT ANALYSIS:

ENTITIES

Entity | Category | Confidence
Dave | PERSON | 0.99+
Dave Whittington | PERSON | 0.99+
Frank Slootman | PERSON | 0.99+
Roddy | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Phil | PERSON | 0.99+
Phil Kippen | PERSON | 0.99+
AT&T | ORGANIZATION | 0.99+
Jesse James | PERSON | 0.99+
AT&T. | ORGANIZATION | 0.99+
five steps | QUANTITY | 0.99+
Dave Nicholson | PERSON | 0.99+
John Furrier | PERSON | 0.99+
50 times | QUANTITY | 0.99+
Snowflake | ORGANIZATION | 0.99+
Roddy Tranum | PERSON | 0.99+
10 billion | QUANTITY | 0.99+
one step | QUANTITY | 0.99+
17 years | QUANTITY | 0.99+
130,000 users | QUANTITY | 0.99+
United States | LOCATION | 0.99+
1800s | DATE | 0.99+
last week | DATE | 0.99+
Barcelona | LOCATION | 0.99+
Palo Alto | LOCATION | 0.99+
Dell Technologies | ORGANIZATION | 0.99+
last night | DATE | 0.99+
MWC '23 | EVENT | 0.98+
telco | ORGANIZATION | 0.98+
one system | QUANTITY | 0.98+
one | QUANTITY | 0.98+
40 data processes | QUANTITY | 0.98+
today | DATE | 0.98+
one place | QUANTITY | 0.97+
P&L | ORGANIZATION | 0.97+
telcos | ORGANIZATION | 0.97+
CDO | ORGANIZATION | 0.97+
149-year-old | QUANTITY | 0.97+
five | QUANTITY | 0.97+
single | QUANTITY | 0.96+
three components | QUANTITY | 0.96+
One | QUANTITY | 0.96+

Karthik Narain and Tanuja Randery | AWS Executive Summit 2022


 

(relaxing intro music) >> Welcome back to theCUBE's coverage here live at re:Invent 2022. We're here at the Executive Summit upstairs at the Accenture set, three sets broadcasting live, four days with theCUBE. I'm John Furrier, your host, with two great guests, Cube alumni, back: Tanuja Randery, managing director, Amazon Web Services for Europe, Middle East and Africa, known as EMEA. Welcome back to the Cube. >> Thank you. >> Great to see you. And Karthik Narain, who's Accenture's Cloud First lead. Great to see you back again. >> Thank you. >> Thanks for coming back on. All right, so business transformation is all about digital transformation taken to its conclusion. When companies transform, they are now a digital business. Technology's powering the value proposition, data, security, all in the keynotes, higher-level services and industry-specific solutions. The dynamics of the industry are changing radically in front of our eyes for the better. Karthik, what's your position on this as Accenture looks at this? We've covered all your successes during the pandemic with AWS. What, what do you guys see out there now as this next layer of power dynamics in the industry takes place? >> I think cloud is getting interesting, and I think there's a general trend towards specialization that's happening in the world of cloud. And cloud is also moving from a general-purpose technology backbone to providing specific industry capabilities for every customer within various industries. But the industry cloud is not a new term. It has been used in the past, and it's been used in the past in various degrees, whether that's building horizontal solutions, certain specialized SaaS software, or providing capabilities that are horizontal for certain industries. But we see the evolution of industry cloud a little differently and a lot more dynamic, which is we see this as a marketplace where ecosystems of capabilities are going to come together to interact with a common data platform, data backbone, data model, with workflows that'll come together and integrate all of this stuff and help clients reinvent their industry with newer capabilities, but at the same time use the power of democratized innovation that's already there within that industry. So that's the kind of change we are seeing, where customers in their strategy are going to implement industry cloud as one of the tenets as they go through their strategy. >> Yeah, and I see in my notes, fit for purpose is a buzzword people are talking about, right-sizing in the cloud, and then just building on that. And what's interesting, Tanuja, I want to get your thoughts because in the US we're one country, so yeah, integrating is kind of within services. You have purview over countries and these regions, it's global impact. This is now a global environment. So it's not just the US, North America, it's Latin America, it's EMEA, this is another variable in the cross-connecting of these fit-for-purpose. What's your view of these industry-specific solutions? >> Yeah, no, and thanks Karthik, 'cause I'm a hundred percent aligned. You know, I mean, you know this better than me, John, but 90% of workloads have not yet moved to the cloud. And the only way that we think that's going to happen is by bringing together business and IT. So what does that mean? It means starting with business use cases, whether that's digital banking or smart connected factories, or frankly if it's predictive maintenance or connected beds. 
But how do we take those use cases and leverage them to really drive outcomes with the technology behind them? I think that's the key unlock that we have to get to. And very specifically, and Adam talked about this a lot today, but data, data is the single unifier for all of business and IT coming together to drive value, right? However, the issue is there's a ton of it, (John Furrier chuckling) right? In fact, fun fact, if you put all the data that's going to be created over the next five years, which is more than the last 30 years, on one-terabyte little floppy disk drives, remember those? Well, that's going to be 15 round trips to the moon (John Furrier chuckling) and back. That's how much data it is. So our perspective is you've got to unify, single data lake, you've got to modernize with AI and ML, and then you're going to have to drive innovation on that. Now, I'll give you one tiny example if I may, which I love: Ryanair, big airline, 150 million passengers. They are also the largest supplier of ham and cheese sandwiches in the air. And catering at that scale is really difficult, right? If you have too much food, wastage, sustainability issues; too little, customers are really unhappy. So we worked with them, leveraging AWS cloud and AI/ML, to build a panini predictor. And in essence, it's taking the data they've got, data we've got, and actually giving them the opportunity to have just the right number of paninis. >> I love the lock, and the key is data, to unlock the value. We heard that in the keynote. Karthik, you guys have been working together with AWS and a lot of successes. We've covered some of those on theCUBE. As you look at these industry solutions, they're not the obvious big problems. They're like businesses, you know, it could be the pizza shop, it could be the dentist office, it could be any business, any industry-specific carries over. What is the key to unlock it? Is it the data? Is it the solution? What's that key? >> I think, you know, the easier answer is all of the above, but like Tanuja said, it all starts by bringing the data together. And this is a funny thing. It's not creating new data. This data is there within enterprises. Our clients have these data, the industries have the data, but for ages these data has been trapped in functional silos, and organizations have been doing analytics within those functions. It's about bringing the data together, whether that's a single data warehouse or a data mesh. Those are architectural considerations. But it's about bringing cross-functional data together as step one. Step two is about utilizing the power of cloud for democratized innovation. It's no longer about one company trying to reinvent the wheel, or create a new wheel, within their enterprise. It's about looking around through the power of the cloud marketplace to see if there's a solution that is already existing, can we use that? Or if I've created something within my company, can I use that as a service for others to use? So, the number one thing is using the power of democratized innovation. Second thing is how do you standardize and digitize functions that do not need to be reinvented every single time, so that, you know, your organization can do it, or you could use that or take that from elsewhere. And the third element is using the power of the platform economy, or platforms, to find new avenues of revenue opportunity, customer engagement and experiences. 
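Tanuja's panini predictor above is described only at a high level; as a toy illustration of the same shape of problem, not Ryanair's actual model, a per-flight catering forecast can be as simple as a count regression over invented features:

```python
# Illustrative only: a toy forecast in the spirit of the "panini
# predictor". Features, data, and model choice are all invented.
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Hypothetical history: [passengers, flight_hours, is_morning_flight]
X = np.array([[180, 2.5, 1], [150, 1.5, 0], [189, 3.0, 1], [160, 2.0, 0]])
y = np.array([95, 60, 110, 70])  # paninis actually sold per flight

model = PoissonRegressor().fit(X, y)  # sales are counts, so Poisson fits

next_flight = np.array([[175, 2.5, 1]])
print(f"Stock roughly {model.predict(next_flight)[0]:.0f} paninis")
```

Too high a forecast wastes food; too low leaves customers unhappy, which is exactly the trade-off described above.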
So these are all the things that differentiate organizations, but all of this is underpinned by a unified data model that helps, you know, use all the (indistinct) there. >> Tanuja, you mentioned earlier that not everyone's journey to the cloud looks the same, and certainly in the US and EMEA you have different countries and different areas. >> Yep. >> Their journeys are different. Some want speed and fees, some will roll their own. I mean, the Databricks CEO, when I interviewed them last week, they started Databricks on a credit card, swiped it, and they didn't want any support. Amazon's knocking on their door saying, "You want support?" "No, we got it covered." Obviously they're from Berkeley and they're nerds, and they're cool. They can roll their own, but not everyone can. >> Yeah. >> And so you have a mix of customer profiles. How do you view that, and what's your strategy? How do you get them over to seeing that business value? What's that transformation look like? >> Yeah, John, you're absolutely right. So you've got those who are born in cloud, they're very savvy, they know exactly what they need. However, what I do find increasingly, even with these digital-native customers, is they're also starting to talk business use cases. So they're talking about, "Okay, how do I take my platform and build a whole bunch of new services on top of that platform?" So, we still have to work with them on this business use case dimension for the next curve of growth that they want to drive. Currently with the global macroeconomic factors, obviously they're also very concerned about profitability and costs. So that's one model. In the enterprise space, you have differences. >> Yeah. >> Right, you have the sort of very, very, very savvy enterprises, right? Who know exactly what they're looking for. But for them, then it's about how do I lean into sustainability? In fact, we did a survey, and 77% of users that we surveyed said that they could accelerate their sustainability goals by using cloud. So in many cases they haven't cracked that, and we can help them do that. So it's really about horses for courses there. And then, with some other companies, they've done a lot of the basic infrastructure modernization. However, what they haven't been able to yet do is figure out how they're going to actually become a tech company. So I keep getting asked, can I become a tech company? How do I do that? Right? And then finally there are companies which don't have the skills. So if I go to the SMB segment, they don't always have the skills or the resources. And there, using scalable market platforms like AWS Marketplace >> Yeah. >> allows them to get access to solutions without having to have all the capabilities. So it really is- >> This is where the partner network really kind of comes in. >> Absolutely. >> Huge value. Having that channel of solution providers, I use that term specifically 'cause you're providing the solution for those folks. >> Yeah. Exact- >> And then the folks at the enterprise, we had a quote on the analyst segment earlier on our Cube, "spend more, save more." >> Yeah. >> That's the cloud equation, >> Yeah. >> because you're going to get it on sustainability, you're going to save it on, you're going to save on cost recovery for revenue, time to revenue. So the cloud is the answer for a lot of enterprises out of the recession. >> Absolutely, and in fact, we need to lean in now, you heard Adam say this, right? 
I mean, the cost savings potential alone from on-prem to cloud is between 40 and 60 percent. Just that. But I don't think that's it, John. >> The belt tightening, he said, is reining in some, right-sizing. Okay, but then also do more, he didn't say that, but analysts are generally saying, if you spend right on the cloud, you'll save more. That's a general thesis. >> Yeah. >> Do you agree with that? >> I absolutely think so. And by the way, usage is, people use it differently as they get smarter. We're constantly working with our customers, by the way, though, to continuously cost-optimize. So you heard about our Graviton3 instances, for example. We're using that to constantly optimize, but at the same time, what are the workloads that you haven't yet brought over to the cloud? (John Furrier chuckling) And so supply chain is a great idea. Our health cloud initiative. So we worked with Accenture on the Accenture Health Insights platform, which runs on AWS, as an example, or the Goldman Sachs one last year, if you remember. >> I do. >> The financial cloud. So those, those are some of the things that I think make it easier for people to consume cloud and reimagine their businesses. >> It's funny, I was talking with Adam and we had a little debate about what an ISV is, and I talked to the CEO of Mongo. They don't see themselves as an ISV. As they grew up on the cloud, they become platforms, they have their own ISVs, and Databricks and Snowflake and others are developing that dynamic. But there's still ISVs out there. So there's a dynamic of growth going on and the need for partners, and our belief is that the ecosystem is going to start doubling in size, we believe, because of the demand for purpose-built or out-of-the-box. I hate to use that word "out of the box," but you know, turnkey solutions that you can buy another one of if it breaks. But use the building blocks if you want to build the foundation. That is more durable, more customizable. Do that if you can. >> Well, >> but- >> we've got a phenomenal, >> shall we talk about this? >> Yeah, go get into- >> So, we've built a five-year vision together, Accenture and us, which is called Velocity, and you'll be much better in describing it, but I'll give you the simple version of Velocity, which is taking AWS-powered industry solutions and bringing them to market faster, more repeatable and at lower cost. And so think about vertical solutions sitting on a horizontal accelerator platform, able to be deployed, making transformation less complex. >> Yeah. >> Karthik, weigh in on this, because I've talked to you about this before. We've said years ago the horizontal scalability of the cloud's a beautiful thing, but verticals, where the ML works great too. Now you got ML in all aspects of it. Horizontal, verticals, here now. >> Yeah, yeah, absolutely. Again, the power of this kind of platform that we are launching, by the way, we're launching tomorrow, we are very excited about it, is, create a platform- >> What are you launching tomorrow? Hold on, I got news out there. What's launching? >> We are going to launch a giant platform, which will help clients accelerate their journey to industry cloud. So that's going to happen tomorrow. So what this platform would provide is that this is going to provide the horizontal capabilities that will help clients bootstrap their launch into cloud. And once they get into cloud, they would be able to build industry solutions on this. 
The way I imagine this is, create the chassis that you need for your industry, and then add the cartridges, industry cartridges, which are going to be solutions that are built on top of it. And we are going to do this across various industries, starting from, you know, healthcare, life sciences to energy to, you know, public services and so on and so forth. >> You're going to create a channel machine. A channel creation machine, you're going to allow people to build their own solutions on top of that platform. And that's launching tomorrow. Make sure we get the news on that. >> Exactly. And- >> Ah, no, >> Sorry, and we genuinely believe in the power of industry cloud. If you think about it, in the past, to create a solution, one had to be an ISV. What cloud is providing for industry today, in the concept of industry clouds, is that industry companies are creating industry solutions. The best example is, along with, you know, AWS and Accenture, Ecopetrol, which is a leader in the energy industry, has created a platform, you know, called the Water Intelligence and Management platform. And through this platform, they are attacking the audacious goal of water sustainability, which is going to be a huge problem for humanity that everybody needs to solve. As part of this platform, the goal is to reduce, you know, freshwater usage by 66%, or zero, you know, impact to, you know, groundwater is going to be the goal or ambition of Ecopetrol. So all of this is possible because industry players want to jump on the bandwagon, because they have all the toolkit of the cloud that's available, with which they could build a software platform, with which they can power their entire industry. >> And make money and have a good business. You guys are doing great. Final word, partnership. Where's it go next? You're doing great. Put a plug in for the Accenture AWS partnership. >> Well, I mean, we have a phenomenal relationship and partnership, which is amazing. We really believe in the power of three, which is the GSI, the ISV, and us together. And I have to go back to the thing I keep focused on: 90% of workloads not in cloud. I think together we can enable those companies to come into the cloud. Very importantly, start to innovate, launch new products and refuel the economy. So I think- >> We'll have to check on that >> Very, very optimistic. >> We'll have to check on that number. >> That seems a little- >> You got to check on that number. >> 90 seems a little bit amazing. >> 90% of workloads. >> That sounds, maybe, I'd be surprised. Maybe a little bit lower than that. Maybe. We'll see. >> We got to start turning it. >> It's still a lot. >> (laughs) It's still a lot. >> A lot more. Still first, still early days. Thanks so much for the conversation, Karthik, great to see you again, Tanuja, thanks for your time. >> Thank you, John. >> Congratulations on your success. Okay, this is theCube up here in the Executive Summit. You're watching theCube, the leader in high tech coverage. We'll be right back with more coverage here, and the Accenture set, after the short break. (calm outro music)

Published Date: Nov 30, 2022


SENTIMENT ANALYSIS:

ENTITIES

Entity | Category | Confidence
John | PERSON | 0.99+
Tanuja Randery | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
Adam | PERSON | 0.99+
Tanuja | PERSON | 0.99+
Karthik | PERSON | 0.99+
90% | QUANTITY | 0.99+
John Furrier | PERSON | 0.99+
Goldman Sachs | ORGANIZATION | 0.99+
Accenture | ORGANIZATION | 0.99+
Karthik Narain | PERSON | 0.99+
US | LOCATION | 0.99+
Ryanair | ORGANIZATION | 0.99+
zero | QUANTITY | 0.99+
77% | QUANTITY | 0.99+
third element | QUANTITY | 0.99+
tomorrow | DATE | 0.99+
Ecopetrol | ORGANIZATION | 0.99+
last year | DATE | 0.99+
Mongo | ORGANIZATION | 0.99+
five year | QUANTITY | 0.99+
66% | QUANTITY | 0.99+
four days | QUANTITY | 0.99+
last week | DATE | 0.99+
three | QUANTITY | 0.99+
Europe | LOCATION | 0.99+
one | QUANTITY | 0.99+
60 percent | QUANTITY | 0.99+
one terabyte | QUANTITY | 0.98+
one model | QUANTITY | 0.98+
first | QUANTITY | 0.98+
Africa | LOCATION | 0.98+
today | DATE | 0.98+
Berkeley | LOCATION | 0.98+
Latin America | LOCATION | 0.98+
theCUBE | ORGANIZATION | 0.98+
single | QUANTITY | 0.98+
one country | QUANTITY | 0.97+
150 million passengers | QUANTITY | 0.97+
Second thing | QUANTITY | 0.97+
two great guests | QUANTITY | 0.97+
40 | QUANTITY | 0.97+
hundred percent | QUANTITY | 0.96+
step one | QUANTITY | 0.96+
three sets | QUANTITY | 0.96+
90 | QUANTITY | 0.96+
GSI | ORGANIZATION | 0.95+
Step two | QUANTITY | 0.93+
Accenture AWS | ORGANIZATION | 0.93+
one company | QUANTITY | 0.92+
15 round trips | QUANTITY | 0.91+
Snowflake | TITLE | 0.91+
EMEA | LOCATION | 0.9+
ISV | ORGANIZATION | 0.89+
EMEA | ORGANIZATION | 0.88+
US North America | LOCATION | 0.88+
first cloud | QUANTITY | 0.85+
last 30 years | DATE | 0.84+

theCUBE Insights with Industry Analysts | Snowflake Summit 2022


 

>> Okay. Okay. We're back at Caesars Forum, the Snowflake Summit 2022, theCUBE's continuous coverage, day two, wall-to-wall coverage. We're so excited to have the analyst panel here, some of my colleagues that we've done a number of these with. You've probably seen some power panels that we've done. Dave Menninger is here. He's the senior vice president and research director at Ventana Research. To his left is Tony Baer, principal at dbInsight, and in the co-host seat, Sanjeev Mohan of SanjMo. Guys, thanks so much for coming on. Glad we can. >> Thank you. >> You're very welcome. I wasn't able to attend the analyst sessions because I've been doing this all day, every day. But let me start with you, Dave. What have you seen that's kind of interested you? Pluses, minuses, concerns. >> Well, how about if I focus on what I think is valuable to the customers of Snowflake. Our research shows that the majority of organisations, the majority of people, do not have access to analytics. And so a couple of things they've announced, I think, address or help to address those issues very directly. So Snowpark and support for Python and other languages is a way for organisations to embed analytics into different business processes. And so I think that will be really beneficial to try and get analytics into more people's hands. And I also think that the native applications as part of the marketplace is another way to get applications into people's hands, rather than just analytical tools. Because most people in the organisation are not analysts. They're doing some line-of-business function. They're HR managers, they're marketing people, they're salespeople, they're finance people, right? They're not sitting there mucking around in the data. They're doing a job, and they need analytics in that job. >> So, Tony, thank you. I've heard a lot of data mesh talk this week. It's kind of funny. Can't >> seem to get away from it. >> You can't. It seems to be gathering momentum, but what have you seen that's been interesting? >> What I have noticed, unfortunately, you know, because the rooms are too small, you just can't get into the data mesh sessions, so there's a lot of interest in it. Um, it's still very, I don't think there's very much understanding of it, but I think the idea that you can put all the data in one place, which, you know, to me, it seems to be kind of, sort of, in a way, it sounds almost like the enterprise data warehouse, you know, cloud-native edition. Um, I think it's providing, sort of, you know, I think, for these folks, that this might be kind of like a linchpin for that. I think there are several other things that actually have made a bigger impression on me at this event. One is, basically, we watched their move with Unistore. Um, and it's kind of interesting coming, you know, coming from MongoDB last week. And I see these two companies seem to be converging towards the same place at different speeds. I think Snowflake's going to get there faster than Mongo, for a number of different reasons, but I see a number of common threads here. I mean, one is that Mongo was a company that's always been oriented towards developers. They need to, you know, start cultivating data people, >> these guys going the other way. >> Exactly. Bingo. 
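Dave Menninger's point above about Snowpark embedding analytics into business processes comes down to running Python where the data lives. A hedged sketch of registering a Python function as a Snowflake UDF, with invented names and a toy function standing in for a real model:

```python
# Sketch: a Python function registered in Snowflake via Snowpark so
# line-of-business SQL can call it. All names here are hypothetical.
from snowflake.snowpark import Session
from snowflake.snowpark.types import FloatType

session = Session.builder.configs({
    "account": "myorg-myaccount", "user": "analyst", "password": "...",
}).create()

def churn_score(tenure_months: float, support_calls: float) -> float:
    # Toy heuristic standing in for a trained model.
    return min(1.0, support_calls / 10.0 + 1.0 / (1.0 + tenure_months))

session.udf.register(
    func=churn_score,
    name="churn_score",
    input_types=[FloatType(), FloatType()],
    return_type=FloatType(),
    replace=True,
    packages=[],  # Anaconda packages would be listed here
)
# A marketing analyst could then write:
#   SELECT customer_id, churn_score(tenure, calls) FROM customer_metrics;
```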
And the thing is, I think where they're converging is the idea of operational analytics and trying to serve all constituencies. The other thing, which also, in terms of serving, you know, multiple constituencies, is how Snowflake has laid out Snowpark, and what I'm finding, like, there's an interesting dichotomy. On one hand, you have this very ingrained integration of Anaconda, which I think is pretty ingenious. On the other hand, you speak to, let's say, the DataRobot folks, and they say, you know something, our folks want to work in data science. We want to work in our environment and use Snowflake in the background. So I see some interesting sort of cross-cutting trends. >> So, Sanjeev, I mean, Frank Slootman, we'll talk about, there's definitely benefits to going into the walled garden. Yeah, I don't think we dispute that, but we see them making moves and adding more and more open-source capabilities, like Apache Iceberg. Is that a move to sort of counteract the narrative that Databricks has put out there? Is that customer-driven? What's your take on that? >> Uh, primarily I think it is to counteract this whole notion that once you move data into Snowflake, it's a proprietary format. So I think that's how it started. But it's hugely beneficial to the customers, to the users, because now, if you have large amounts of data in Parquet files, you can leave it on S3. But then, using the Apache Iceberg table format in Snowflake, you get all the benefits of Snowflake's optimizer. So, for example, you get the, you know, the micro-partitioning, you get the metadata. So, uh, in a single query, you can join, you can do select from a Snowflake table union and select from an Iceberg table, and you can do stored procedures, user-defined functions. So I think what they've done is extremely interesting. Uh, Iceberg by itself still does not have multi-table transactional capabilities. So if I'm running a workload, I might be touching 10 different tables. So if I use Apache Iceberg in a raw format, they don't have it. But Snowflake does, >> right? There's, hence the delta. And maybe that, maybe that closes over time. I want to ask you, as you look around, I mean, the ecosystem's pretty vibrant. I mean, it reminds me of, like, re:Invent in 2013, you know? But then I'm struck by the complexity of the last big data era, Hadoop and all the different tools. And is this different, or is it the sort of same wine, new bottle? You guys have any thoughts on that? >> I think it's different, and I'll tell you why. I think it's different because it's based around SQL. So, back to Tony's point, these vendors are coming at this from different angles, right? You've got data warehouse vendors and you've got data lake vendors, and they're all going to meet in the middle. So in your case, you talked operational, analytical. But the same thing is true with data lake and data warehouse, and Snowflake no longer wants to be known as the data warehouse. They're a data cloud, and our research, again, I like to base everything off of that. >> I love that. >> Our research shows that two-thirds of organisations have SQL skills and one-third have big data skills, so, you know, they're going to meet in the middle. But it sure is a lot easier to bring along those people who know SQL already to that midpoint than it is to bring the big data people. 
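Sanjeev's description of mixing native and Iceberg tables in one query looks roughly like the following. Snowflake's Iceberg DDL has evolved since this discussion, so treat the sketch as directional; volume, catalog, and table names are invented, and it reuses the session from the earlier sketch:

```python
# Sketch: one query spanning a native Snowflake table and an Apache
# Iceberg table whose Parquet data stays on S3. An external volume is
# assumed to be configured already; all names are invented.
session.sql("""
    CREATE ICEBERG TABLE IF NOT EXISTS raw_events (
      event_id STRING,
      ts TIMESTAMP
    )
    CATALOG = 'SNOWFLAKE'
    EXTERNAL_VOLUME = 's3_events_vol'
    BASE_LOCATION = 'events/'
""").collect()

combined = session.sql("""
    SELECT event_id, ts FROM curated_events  -- native Snowflake table
    UNION ALL
    SELECT event_id, ts FROM raw_events      -- Iceberg table on S3
""")
print(combined.count())
```

Sanjeev's point is that the engine's optimizer features then apply to the Iceberg data as well, not just to the native side of the union.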
>> Remember, Amr Awadallah, one of the founders of Cloudera, said to me one time on theCUBE that, uh, SQL is the killer app for Hadoop. >> Yeah. >> The difference, you know, with Snowflake, is that you don't have to worry about taming the zoo animals. They really have thought out the ease of use, you know? I mean, from the get-go, they thought of two twin poles. One is ease of use, and the other is scale. And they've held to that. And that's basically, you know, what I think very much differentiates it. I mean, Hadoop did have the scale, but it didn't have the ease of use. >> But don't I still need, like, if I have, you know, governance from this vendor or, you know, data prep from, you know, don't I still have to have expertise that's sort of distributed in those worlds, right? I mean, go ahead. >> Yeah. So the way I see it is, Snowflake is adding more and more capabilities right into the database. So, for example, they've gone ahead and added security and privacy, so you can now create policies and do even cell-level masking, dynamic masking. But most organisations have more than Snowflake. So what we are starting to see all around here is that there's a whole series of data catalogue companies, a bunch of companies that are doing dynamic data masking, security and governance, data observability, which is not a space Snowflake has gone into. So there's a whole ecosystem of companies that is mushrooming. Although, you know, they're using the native capabilities of Snowflake, they are at a level higher. So if you have a data lake and a cloud data warehouse and you have other, like, relational databases, you can run these cross-platform capabilities in that layer. So that way, you know, Snowflake's done a great job of enabling that ecosystem. >> What about the Streamlit acquisition? Did you see anything here that indicated they're making strong progress there? Are you excited about that? You're sceptical? Go ahead. >> I think it's like the last mile, essentially. In other words, it's like, okay, you have folks that are very, very comfortable with Tableau. But you do have developers who don't want to have to shell out to a separate tool. And so this is where Snowflake is essentially working to address that constituency. Um, to Sanjeev's point, I think part of what makes this different from the Hadoop era is the fact that all these capabilities, you know, a lot of vendors are taking it very seriously to put this native. Obviously Snowflake acquired Streamlit, so we can expect Streamlit's capabilities are going to be native. >> And the other thing, too, about the Hadoop ecosystem is, Cloudera had to help fund all those different projects and got really, really spread thin. I want to ask you guys about this supercloud, a term we use. Supercloud is this sort of metaphor for the next wave of cloud. You've got infrastructure, AWS, Azure, Google. It's not multi-cloud, but you've got that infrastructure, you're building a layer on top of it that hides the underlying complexities of the primitives and the APIs, and you're adding new value, in this case the data cloud, or super data cloud. And now what we're seeing is Snowflake putting forth the notion that they're adding a super PaaS layer. You can now build applications that you can monetise, which to me is kind of exciting. It makes this platform even less discretionary. 
>> What about the Streamlit acquisition? Did you see anything here that indicated they're making strong progress? Are you excited about that? Are you skeptical? Go ahead. >> I think it's the last mile, essentially. In other words, you have folks who are very comfortable with Tableau, but you also have developers who don't want to have to shell out to a separate tool, and this is where Snowflake is working to address that constituency. To Sanjeev's point, part of what makes this different from the Hadoop era is that a lot of vendors are taking it very seriously to make these capabilities native. Obviously Snowflake acquired Streamlit, so we can expect those capabilities to be native. >> And the other thing about the Hadoop ecosystem is that Cloudera had to help fund all those different projects and got really, really spread thin. I want to ask you guys about this supercloud term we use. Supercloud is a metaphor for the next wave of cloud: you've got infrastructure, AWS, Azure, Google; it's not multi-cloud, but you're building a layer on top of that infrastructure that hides the underlying complexities of the primitives and the APIs, and you're adding new value, in this case the data cloud, or super data cloud. And what we're seeing now is Snowflake putting forth the notion that they're adding a super PaaS layer: you can now build applications that you can monetize, which to me is kind of exciting. It makes this platform even less discretionary. We had a lot of talk on Wall Street about discretionary spending, and that's not discretionary if you're monetizing it. What do you guys think about that? Is this something that's real, is it just a figment of my imagination, or do you see it a different way? >> So, in effect, they're trying to become a data operating system, right? And I think that's wonderful. It's ambitious. I think they'll experience some success with that. As I said, applications are important; that's a great way to deliver information, and you can monetize them, so there's a good economic model around it. I think they will still struggle, however, with bringing everything together onto one platform. That's always the challenge: can you become the platform? That's hard to predict. This is pretty exciting, a lot of energy, a large ecosystem, and there is a network effect already. But can they succeed in being the only place where data exists? I think that's going to be a challenge. >> The fact is, this is the classic best-of-breed versus the umbrella play, and it's nothing new. It's like the old days with enterprise applications, where Oracle and SAP vacuumed all those applications up into their ecosystems. And if you look at the cloud folks, the hyperscalers are still building out their own portfolios as well; some hyperscalers are more partner-friendly than others. What Snowflake is saying is: to all of you folks who are competing against the hyperscalers in various areas, like data catalogs and pipelines and all that wonderful stuff, we'll make you all equal citizens. We'll lay out the APIs and allow you to integrate natively to us so you can provide just as good an experience. But the onus is on your back. >> Should the ecosystem be concerned, as they were back at re:Invent 2014, that Amazon was going to nibble away at them, or is this different? >> I find what they're doing is different. For example, data sharing: they were the first ones out the door with data sharing at a large scale, and then everybody jumped in and said, we also do data sharing; all the hyperscalers came in. But now Snowflake has taken it to the next level. They're saying it's not just data sharing, it's app sharing, and not only app sharing: you can build, test, deploy, and then monetize it, and make it discoverable through the marketplace. >> You can monetize it. >> Yes. So I think they are taking it a step further than what the hyperscalers are doing, and, like they said, it's becoming the data operating system: you log in and you have all of these different functionalities. You can do machine learning, you can do data quality, you can do data preparation, and you can do monetization.
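(Editor's note: the sharing-and-monetization flow described above can be sketched in a few statements. This is a hedged illustration of Snowflake's provider-side share setup; the share, database, and account names are hypothetical, and a marketplace listing would sit on top of this.)

import snowflake.connector

conn = snowflake.connector.connect(account="provider_account", user="admin_user", password="...")
cur = conn.cursor()

# Publish a table through a share: no copies, no pipelines.
cur.execute("CREATE SHARE IF NOT EXISTS sales_share")
cur.execute("GRANT USAGE ON DATABASE sales_db TO SHARE sales_share")
cur.execute("GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share")
cur.execute("GRANT SELECT ON TABLE sales_db.public.daily_revenue TO SHARE sales_share")

# The consumer account can now build a read-only database from the share.
cur.execute("ALTER SHARE sales_share ADD ACCOUNTS = partner_org.partner_account")
conn.close()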
>> Who do you think is Snowflake's biggest competitor? What do you guys think? It's a hard question, isn't it? Because everybody now says, we separate compute from storage, we have a data cloud, and you go, okay, that's nice. >> But there's a uniqueness to them. >> I mean, put it this way: in the old days, it would have been the prime household names. Today I think it's the hyperscalers, and again, this comes down to best of breed versus get-it-all-from-one-source: where is your comfort level? So I think they're in coopetition with the hyperscalers. >> Okay, so it's not Databricks? Because they're smaller? >> Well, within the best-of-breed area, yes, there is competition. The obvious one is Databricks, coming in from the data engineering angle, with Snowflake coming from the data analyst angle. And I think another potential competitor, and Snowflake has basically admitted as much, is MongoDB. >> Yeah, exactly. So there are two different levels, sort of. >> On a longer-term collision course. >> Exactly, exactly. >> Sort of like ServiceNow and Salesforce. When I say that, a lot of people just laugh: no, you're kidding, there's no way. And I say, excuse me? >> But then you see Mongo last week adding some analytics capabilities, and they've always been about developers, as you say. >> And they trashed SQL, but yet they've finally started to write their first real SQL. >> We had MQL; well, now we have SQL. >> So what were those numbers, Dave? Two-thirds, one-third. >> So, the hyperscalers: are you going to trust your hyperscalers to do your cross-cloud? I mean, maybe Google, maybe Microsoft; AWS, perhaps, is not there yet, right? How important is cross-cloud, multi-cloud, supercloud, whatever you want to call it? What does your data show? >> Cloud is important, and, if I remember correctly, our research shows that three-quarters of organizations are operating in the cloud, and 52% are operating across more than one cloud. So two-thirds of the organizations that are in the cloud are doing multi-cloud; that's pretty significant. Now, they may be operating across clouds for different reasons: maybe one application runs in one cloud provider, another application runs in another. But I do think organizations want that leverage over the hyperscalers; they want to be able to tell the hyperscaler, I'm going to move my workloads over here if you don't give us a better rate. >> From a database standpoint, I think you're right. They are competing against some really well-funded platforms: you look at BigQuery, a really solid platform, and Redshift, for all its faults, has really done an amazing job of moving forward. But to David's point, those hyperscalers aren't going to solve that cross-cloud problem, right? >> Right. Certainly not as quickly. >> Or with as much zeal, right? It works across clouds, but it's going to operate better on ours. >> Exactly, yes. >> And even when we talk about multi-cloud, it has many, many definitions; it can mean anything. The way Snowflake does multi-cloud and the way MongoDB does are very different. Snowflake says, we run on all the hyperscalers, but you have to replicate your data. What MongoDB is claiming is that one cluster can have nodes in multiple different clouds. That is quite something. >> Yeah, right. I mean, again, you hit on that. We've got to go.
But, last question: Snowflake, undervalued, overvalued, or just about right? >> In the stock market or with customers? >> Well, you know, I'm not sure that's the right question. >> That's the question I'm asking. >> I'll say the question is undervalued or overvalued for customers, right? That's really what matters. The investor side is a different audience; some of those folks are watching. But I believe that, from the customer's perspective, it's probably valued about right. >> The reason I ask is because it was so hyped. It had a $100 billion valuation; it surpassed ServiceNow's valuation, which was crazy. Now it's obviously come back quite a bit, below its IPO price. But you guys were at the financial analyst meeting. Scarpelli laid out projections through 2029: $10 billion in revenue, 25% free cash flow, 20% operating profit. They'd better be worth more than they are today if they do that. >> If I look at the momentum here this week, I think they are undervalued. But before this week, I probably would have said they're at the right valuation. >> I would say they're probably more at the right valuation now, because the IPO valuation was just such a false valuation, so hyped. >> Guys, I could go on for another 45 minutes. Thanks so much, David, Tony, Sanjeev. Always great to have you on. We'll have you back for sure. >> Thanks for having us. >> All right, thank you. Keep it right there. We're wrapping up day two of theCUBE's coverage of Snowflake Summit 2022. Right back.

Published Date : Jun 16 2022


Analyst Predictions 2022: The Future of Data Management


 

(music) >> In the 2010s, organizations became keenly aware that data would become the key ingredient in driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most, organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases, and the like. These innovations accelerate data proficiency, but at the same time they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, evolving, and exploding onto the scene. So, in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE, and I'd like to welcome you to a special CUBE presentation: Analyst Predictions 2022, the future of data management. We've gathered six of the best analysts in data and data management, who are going to present and discuss their top predictions and trends for 2022 and the first half of this decade. Let me introduce our six power panelists: Sanjeev Mohan is a former Gartner analyst and principal at SanjMo; Tony Baer is principal at dbInsight; Carl Olofson is a well-known research vice president with IDC; Dave Menninger is senior vice president and research director at Ventana Research; Brad Shimmin is chief analyst for AI platforms, analytics, and data management at Omdia; and Doug Henschen is vice president and principal analyst at Constellation Research. Gentlemen, welcome to the program, and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the format we're going to use: I, as moderator, am going to call on each analyst separately, who will then deliver their prediction or megatrend, and then, in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it. But let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance; go ahead, sir. >> Thank you, Dave. I believe that data governance, which we've been talking about for many years, is now not only going to be mainstream, it's going to be table stakes. And in all the things you mentioned, data oceans, data lakes, lakehouses, data fabrics, meshes, the common glue is metadata: if we don't understand what data we have and aren't governing it, there is no way we can manage it. We saw Informatica go public last year after a hiatus of six years. I'm predicting that this year we'll see some more companies go public; my bet is on Collibra, most likely, and maybe Alation. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports: we're going to see more transformations, like Spark jobs, Python, even Airflow; we're going to see more streaming data, from the Kafka schema registry, for example; and we will see AI models become part of this whole governance suite. So the governance suite is going to be very comprehensive: very detailed lineage, impact analysis, and then even expanding into data quality.
We've already seen that happen with some of the tools, where they're buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, and also data access governance. So what we're going to see is that, once data governance platforms become the key entry point into these modern architectures, I'm predicting that the number of users of a data catalog is going to exceed that of a BI tool. That will take time, but we've already seen that trajectory. Right now, if you look at BI tools, I would say there are 100 users of a BI tool to one of a data catalog, and I see that evening out over a period of time. At some point data catalogs will really become the main way for us to access data: the data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping-off point into the BI tool or the data science tool. That is the journey I see for the data governance products. >> Excellent, thank you. Some comments? Maybe Doug: a lot of things to weigh in on there; maybe you could comment. >> Yeah, Sanjeev, I think you're spot on on a lot of the trends. The one disagreement: I think it's really still far from mainstream. As you say, we've been talking about this for years; it's like God, motherhood, apple pie. Everyone agrees it's important, but too few organizations are really practicing good governance, because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines; these are environmental, social, and governance regs and guidelines. We've seen the environmental regs and guidelines imposed in industries, particularly the carbon-intensive industries. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks and investors. So these ESGs are presenting new carrots and sticks, and they're going to demand more solid data, more detailed and solid reporting, tighter governance. But we're still far from mainstream adoption. We have a lot of best-of-breed niche players in the space. I think the signs that it's going to become more mainstream are starting with things like Azure Purview and Google Dataplex; the big cloud platform players seem to be upping the ante and starting to address governance. >> Excellent, thank you, Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs, but to Doug's point, I think it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service-oriented architecture back in the '90s, and that didn't happen quite the way we anticipated. And to Sanjeev's point, that's because it is really complex and really difficult to do. My hope is that we won't, how do we put this, fade out into this nebula of domain catalogs that are specific to individual use cases, like Purview for getting data quality right, or for data governance and cybersecurity, and that instead we'll have some tooling that can actually be adaptive in gathering metadata, to create something I know is important to you, Sanjeev, and that is this idea of observability. If you can get enough metadata without moving your data around, and understand the entirety of a system that's running on that data, you can do a lot to help with the governance that Doug is talking about.
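(Editor's note: Sanjeev's earlier point that governance is expanding to streaming artifacts, the Kafka schema registry for example, is concrete enough to sketch. This uses the confluent-kafka Python client; the registry URL and subject name are hypothetical.)

from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

client = SchemaRegistryClient({"url": "https://schema-registry.example.com"})

# Register a versioned, centrally governed schema for an event stream.
order_schema = Schema(
    schema_str="""
    {
      "type": "record",
      "name": "Order",
      "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount", "type": "double"}
      ]
    }
    """,
    schema_type="AVRO",
)

# Producers embed the returned id; the registry enforces compatibility
# rules on any later version of the schema.
schema_id = client.register_schema("orders-value", order_schema)
print("registered schema id:", schema_id)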
>> So I just want to add that data governance, like many other initiatives, did not succeed; even AI went into an AI winter, but that's a different topic. A lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley came onto the scene: if a bank did not comply, they were very happy to pay a million-dollar fine; that was pocket change for them, instead of doing the right thing. But I think the stakes are much higher now. With GDPR, the floodgates opened. California has CCPA, and even CCPA is being outdated by CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own compliance and regulatory requirements, and data residency is becoming really important. I think we're going to reach a stage where it won't be optional anymore, whether we like it or not. And I think the reason data catalogs were not successful in the past is that we did not have the right focus on adoption. We were focused on features, and those features were disconnected and very hard for the business to adopt: built by IT people, for IT departments, to look at technical metadata, not business metadata. Today the tables have turned: CDOs are driving this initiative, and regulatory compliance is bearing down hard, so I think the time might be right. >> Yeah, so, guys, we have to move on here, but there's some real meat on the bone. Sanjeev, I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck with it. And the ratio of BI tools to data catalogs is another measurement we can take, even though there's some skepticism there; that's something we can watch. And I wonder if someday we'll have more metadata than data. But I want to move to Tony Baer. You want to talk about data mesh, and, coming off of governance, wow: the whole concept of data mesh is decentralized data, and then governance becomes a nightmare there. But take it away, Tony. >> We'll put it this way. Data mesh, the idea at least as proposed by Thoughtworks, was unleashed a couple of years ago, and the press has been almost uniformly uncritical. A good reason for that is all the problems that Sanjeev and Doug and Brad were just speaking about: we have all this data out there and we don't know what to do about it. Now, that's not a new problem. It was a problem when we had enterprise data warehouses, it was a problem when we had our Hadoop data clusters, and it's even more of a problem now that the data's out in the cloud, where the data is not only on S3, it's all over the place, and it also includes streaming, which I know we'll be talking about later. So data mesh was a response to that: the idea that the folks who really know best about governance are the domain experts. Data mesh was an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold, hard reality, because if you do a Google search, the published work, the articles, have been largely uncritical so far; it's been treated as a very revolutionary new idea. I don't think it's that revolutionary, because we've talked about ideas like this before. Brad, you and I met years ago when we were talking about SOA and decentralizing; all of that was at the application level, and now we're talking about it at the data level. And now we have microservices, so there's this thought of: if we manage our apps cloud-natively through microservices, why don't we think of data the same way? My sense this year, and this has been a very active term if you look at Google search trends, is that enterprises are going to look at this seriously, and as they do, it's going to attract its first real hard scrutiny and its first backlash. That's not necessarily a bad thing; it means it's being taken seriously. The reason I think you'll start to see the cold, hard light of day shine on data mesh is that it's still a work in progress. The idea is basically a couple of years old, and there are still some pretty major gaps. The biggest gap is in the area of federated governance. Federated governance itself is not a new issue: we're trying to figure out how to strike the balance between consistent enterprise policy and governance on the one hand and, on the other, the groups that understand the data; how do we balance the two? There's a huge gap there in practice and knowledge. Also, to a lesser extent, there's a technology gap in the self-service technologies that will help teams govern data through the full life cycle: from selecting the data, to building the pipelines, to determining your access control, to looking at quality, at whether data is fresh, and so on. So my prediction is that it will receive its first harsh scrutiny this year. You're going to see some enterprises declare premature victory when they've built some federated query implementations. You're going to see vendors start to data-mesh-wash their products: anybody in the data management space, whether it's a pipelining tool, ELT, a catalog, or a federated query tool, is going to be promoting how they support this. Hopefully nobody is going to call themselves a data mesh tool, because data mesh is not a technology. We're going to see one other thing come out of this, and this harks back to the metadata and the catalogs Sanjeev was talking about: there's going to be a renewed focus on metadata, and I think that's going to spur interest in data fabrics. Now, data fabrics are pretty vaguely defined, but if we take the most elemental definition, which is a common metadata backplane, I think anybody who's going to get serious about data mesh needs to look at a data fabric, because at the end of the day we all need to read from the same sheet of music. >> Thank you, Tony. Dave Menninger, one of the things that people like about data mesh is that it pretty crisply articulates some of the flaws in today's organizational approaches to data. What are your thoughts on this?
>> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted. Tony said it's going to see the cold, hard light of day, and there's a problem right now: there are a number of overlapping terms that are similar but not identical. We've got data virtualization, data fabric, data federation. I think it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects, as originally intended and specified, but that's not the way I see vendors using it; I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things: I think the group of those things is going to happen, and they're going to become more robust. Our research suggests that a quarter of organizations are already using virtualized access to their data lakes, and another half, so a total of three-quarters, will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point; it's this notion that there are different elements of data, metadata, and governance within an organization that all need to be managed collectively. The interesting thing is, when you look at the satisfaction rates of organizations using virtualization versus those that are not, it's almost double: 79% of organizations that were using virtualized access expressed satisfaction with their access to the data lake; only 39% expressed satisfaction if they weren't using virtualized access. >> Thank you, Dave. Sanjeev, we've just got a couple of minutes on this topic, but I know you've spoken on a panel with Zhamak Dehghani, who invented the concept. Governance obviously is a big sticking point. What are your thoughts on this? You're on mute. >> So my message to Zhamak and to the community is, as opposed to what Dave said, let's not define it. We spent the whole year defining it. There are four principles: domain, data product, data infrastructure, and governance. Let's take it to the next level. I get a lot of questions on what the difference is between data fabric and data mesh, and I'm like, I can't compare the two: data mesh is a business concept, data fabric is a data integration pattern. How do you compare the two? You have to bring data mesh a level down. So, to Tony's point, I'm on a warpath in 2022 to take it down to: what does a data product look like? How do we handle shared data across domains and govern it? I think what we're going to see more of in 2022 is the operationalization of data mesh. >> I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's move to Carl. Carl, you're a database guy; you've been around that block for a while now. You want to talk about graph databases? Bring it on. >> Oh yeah, okay, thanks. So I regard graph databases as basically the next truly revolutionary database management technology. I'm looking for the graph database market, which of course we haven't defined yet, so obviously I have a little wiggle room in what I'm about to say, to grow by about 600 percent over the next 10 years. Now, 10 years is a long time, but over the next five years we expect to see gradual growth as people start to learn how to use it. The problem isn't that it's not useful; it's that people don't know how to use it. So let me explain, before I go any further, what a graph database is, because some of the folks on the call may not know. A graph database organizes data according to a mathematical structure called a graph. A graph has elements called nodes and edges: a data element drops into a node, and the nodes are connected by edges. Combinations of edges create structures that you can analyze to determine how things are related. In some cases the nodes and edges can have properties attached to them, which add additional informative material that makes it richer; that's called a property graph. There are two principal use cases for graph databases. There are semantic graphs, which are used to break down human-language text into semantic structures that you can then search and organize to answer complicated questions; a lot of AI is aimed at semantic graphs. The other kind is the property graph I just mentioned, which has a dazzling number of use cases. As I talk about this, people are probably wondering: well, we have relational databases, isn't that good enough? A relational database supports what I call definitional relationships. You define the relationships in a fixed structure, and the data drops into that structure; there's a foreign key value that relates one table to another, and that value is fixed. You don't change it; if you do, the database becomes unstable, and it's not clear what you're looking at. In a graph database, the system is designed to handle change, so it can reflect the true state of the things it's being used to track. Let me give you some examples of use cases: entity resolution, data lineage, social media analysis, customer 360, fraud prevention, cybersecurity; supply chain is a big one; explainable AI, which is going to become important too, because a lot of people are adopting AI but want a system that can say, after the fact, how the AI came to a conclusion or made a recommendation, and right now we don't have really good ways of tracking that; machine learning in general; then data governance, data compliance, risk management, recommendation, personalization, anti-money-laundering, identity and access management, and network and IT operations, which is already becoming a key one, where you've actually mapped out your data center and can track what's going on as things happen; root cause analysis; fraud detection is a huge one, and a number of major credit card companies use graph databases for it; risk analysis, track and trace, churn analysis, next best action, what-if analysis, impact analysis. And I would add one other thing to this list: metadata management. So, Sanjeev, here you go, this is your engine, because I was in metadata management for quite a while in my past life, and one of the things I found was that none of the data management technologies that were available to us could efficiently handle metadata, because of the kinds of structures that result from it. But graphs can. Graphs can do things like say: this term, in this context, means this, but in that context it means that. In fact, logistics management and supply chain benefit too, because graphs handle recursive relationships. By recursive relationships I mean objects that own other objects of the same type, so you can do things like bills of materials, a parts explosion, or an HR analysis: who reports to whom, how many levels up the chain. You can do almost any of these things with relational databases, but the problem is you have to program it; it's not supported in the database. And whenever you have to program something, you can't trace it, you can't define it, you can't publish it in terms of its functionality, and it's really, really hard to maintain over time. >> So, Carl, thank you. I wonder if we could bring Brad in. Brad, I'm sitting there wondering: is this incremental to the market, or is it disruptive, a replacement? What are your thoughts on this space? >> It's already disrupted the market. Like Carl said, go to any bank and ask them, are you using graph databases to get fraud detection under control, and they'll say, absolutely, that's the only way to solve this problem. And it is, frankly; it's the only way to solve a lot of the problems Carl mentioned. And that is, I think, its Achilles' heel in some ways, because it's like finding the best way to cross the seven bridges of Konigsberg: it's always going to be tied to those use cases, because it's really special and really unique, and because of that, it still, unfortunately, stands apart from the rest of the community that's building, let's say, AI outcomes, to take the great example here. Graph databases and AI, as Carl mentioned, are like chocolate and peanut butter, but technologically they don't know how to talk to one another; they're completely different. You can't just stand up SQL and query them; you've got to learn, what is it, Cypher, to actually get to the data in there. And if you're going to scale that graph database, especially a property graph, if you're going to do something really complex, like try to understand all of the metadata in your organization, you might just end up with a graph database winter, like we had the AI winter, simply because you run out of performance to make the thing happen. So I think it's already disrupted, but we need to treat it like a first-class citizen in the data analytics and AI community. We need to bring it into the fold and equip it with the tools it needs to do the magic it does, and to do it not just for specialized use cases but for everything, because I'm with Carl: I think it's absolutely revolutionary. >> And I had also identified the principal Achilles' heel of the technology, which is scaling. When these things get large and complex enough that they spill over what a single server can handle, you start to have difficulties, because the relationships span things that have to be resolved over a network, and then you get network latency, and that slows the system down. So that's still a problem to be solved.
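(Editor's note: Carl's nodes-and-edges description maps directly onto any property-graph library. Here is a minimal sketch using Python's networkx; the accounts and devices are made up, and a real graph database would express the same walk in a language like Cypher or Gremlin.)

import networkx as nx

g = nx.MultiDiGraph()

# Nodes with properties.
g.add_node("acct_1", kind="account", country="US")
g.add_node("acct_2", kind="account", country="US")
g.add_node("dev_9", kind="device")

# Edges with properties: both accounts log in from the same device.
g.add_edge("acct_1", "dev_9", rel="logged_in_from", ts="2022-06-01")
g.add_edge("acct_2", "dev_9", rel="logged_in_from", ts="2022-06-02")

# A fraud-style question, which in SQL would be a growing self-join:
# which accounts share a device?
for node, attrs in g.nodes(data=True):
    if attrs.get("kind") == "device":
        sharers = {src for src, _ in g.in_edges(node)}
        if len(sharers) > 1:
            print(node, "shared by", sorted(sharers))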
>> Sanjeev, any quick thoughts on this? I mean, I think metadata on the word cloud is going to be the largest font, but what are your thoughts here? >> I want to step away, so people don't associate me with only metadata, and talk about something slightly different. DB-Engines.com has done an amazing job, as I think almost everyone knows, of chronicling all the major databases in use today. In January of 2022 there were 381 databases on its ranked list. The largest category is RDBMS; the second largest is actually divided in two, property graphs and RDF graphs, and together these make up the second largest number of databases. So, talking about Achilles' heels: this is the problem. There are so many graph databases to choose from, and they come in different shapes and forms. And, to Brad's point, there are so many query languages. In RDBMS, it's SQL, end of story. Here we've got Cypher, we've got Gremlin, we've got GQL, and then your proprietary languages. So I think there's a lot of disparity in this space. >> All excellent points, Sanjeev, I must say, and that is a problem. The languages need to be sorted and standardized, and people need to have a roadmap for what they can do with it, because, as you say, you can do so many things, and so many of those things are unrelated, that you sort of say, well, what do we use this for? I'm reminded of a saying I learned a bunch of years ago: the digital computer is the only tool man has ever devised that has no particular purpose. >> All right, guys, we've got to move on to Dave Menninger. We've heard about streaming; your prediction is in that realm, so please take it away. >> Sure. I like to say that historical databases are going to become a thing of the past, but I don't mean they're going to go away; that's not my point. We need historical databases, but streaming data is going to become the default way in which we operate with data. In the next, say, three to five years, I would expect data platforms, and we're using the term data platforms to represent the evolution of databases and data lakes, to incorporate these streaming capabilities. We're going to process data as it streams into an organization, and then it's going to roll off into historical databases. So historical databases don't go away, but they become a thing of the past: they store the data that occurred previously, and as data is occurring, we're going to be processing it, analyzing it, acting on it. We only ever ended up with historical databases because we were limited by the technology available to us. Data doesn't occur in batches; we processed it in batches because that was the best we could do, and it wasn't bad, and we've continued to improve. But streaming data today is still the exception, not the rule. There are projects within organizations that deal with streaming data, but it's not the default way in which we deal with data yet. So that's my prediction: this is going to change, and streaming data will become the default way in which we deal with data. However you label it, maybe these databases and data platforms just evolve to handle it, but we're going to deal with data in a different way. Our research shows that already about half of the participants in our analytics and data benchmark research are using streaming data, and another third are planning to use streaming technologies, so that gets us to about eight out of ten organizations that need this technology. That doesn't mean they have to use it throughout the whole organization, but it's pretty widespread in its use today, and it has continued to grow. If you think about the consumerization of IT, we've all been conditioned to expect immediate access to information, immediate responsiveness. We want to know if an item is on the shelf at our local retail store so we can go in and pick it up right now. That's the world we live in, and it's spilling over into the enterprise IT world, where we have to provide those same types of capabilities. So that's my prediction: the historical database becomes a thing of the past; streaming data becomes the default way in which we operate with data. >> All right, thank you, David. Well, what say you, Carl, a guy who's followed historical databases for a long time? >> Well, one thing: actually, every database is historical, because as soon as you put data in it, it's now history; it no longer reflects the present state of things. Even if that history is only a millisecond old, it's still history. But I would say, and I know you're trying to be a little bit provocative in saying this, Dave, because you know as well as I do that people still need to do their taxes, they still need to do accounting, they still need to run general ledger programs and things like that, and all of that involves historical data. That's not going to go away, unless you want to go to jail. But as far as the leading-edge functionality, I'm totally with you on that. And I'm just kind of wondering whether this requires a change in the way we perceive applications in order to truly be manifested: rethinking the way applications work, saying that an application should respond instantly, as soon as the state of things changes. What do you say about that? >> I think that's true. I think we do have to think about things differently; that's not the way we designed systems in the past. We're seeing more and more systems designed that way, but again, it's not the default. And I agree 100% with you that we do need historical databases; that's clear. And even some of those historical databases will be used in conjunction with the streaming data, right? >> Absolutely. >> Let's take the data warehouse example, where you're using the data warehouse as context and the streaming data as the present. You're saying: here's a sequence of things that's happening right now; have we seen that sequence before, what does that pattern look like in past situations, and can we learn from that?
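(Editor's note: the "process it as it streams, then roll it off into history" pattern Dave Menninger describes can be shown in miniature. This toy sketch is pure Python; the generator stands in for a real stream such as a Kafka topic, and the values are made up.)

from collections import deque

def event_stream():
    # Stand-in for events arriving from a message bus.
    for amount in (120.0, 80.0, 310.0, 45.0):
        yield {"amount": amount}

history = deque()      # stand-in for the historical database
running_total = 0.0

for event in event_stream():
    running_total += event["amount"]        # analyze and act in flight
    if event["amount"] > 300:
        print("alert: large order", event)  # real-time decision
    history.append(event)                   # then it rolls off into history

print("events archived:", len(history), "running total:", running_total)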
>> So, Tony Baer, I wonder if you could comment. When you think about real-time inferencing at the edge, for instance, which is something a lot of people talk about, a lot of what we're discussing in this segment looks like it's got great potential. What are your thoughts? >> Well, I think you nailed it; you hit it right on the head. I'm going to split this one down the middle: I don't see streaming becoming the default. What I see is streaming, transaction databases, and analytics data, data warehouses, data lakes, whatever, converging. What allows us technically to converge is cloud-native architecture, where you can distribute things. You could have a node here that's doing the real-time processing, maybe doing some real-time predictive analytics to look at this customer journey, what the customer is doing right now, and correlate that with what other customers are doing. The thing is that, in the cloud, you can partition this, and because of the speed of the infrastructure you can bring these together and orchestrate them in a loosely coupled manner. The other part is that the use cases demand it, and this goes back to what Dave is saying: when you look at customer 360, when you look at, say, smart utility grids, when you look at any type of operational problem, it has a real-time component and a historical component, plus predictives. So my sense is that, technically, we can bring this together through the cloud, and the use case is that we can apply some real-time predictive analytics on these streams and feed it into the transactions, so that when we make a decision about what to do as a result of a transaction, we have that real-time input. >> Sanjeev, did you have a comment? >> Yeah, I was just going to say that, to this point, we have to think of streaming very differently, because with historical databases we used to bring the data, store the data, and then run rules on top, aggregations and all. In the case of streaming, the mindset changes: the rules, the inference, all of that is fixed, but the data is constantly changing. So it's a completely reversed way of thinking about, and building, applications. >> So, Dave Menninger, there seemed to be some disagreement about the default. What kind of time frame are you thinking about? Is it the end of the decade when it becomes the default? Where would you pin it? >> I think between five and ten years this becomes the reality. It'll be more and more common between now and then, but then it becomes the default. And, Sanjeev, at some point, maybe in one of our subsequent conversations, we need to talk about governing streaming data, because that's a whole other set of challenges. We've also talked about it in rather two-dimensional terms, historical and streaming, and there's a lot of low-latency, micro-batch, sub-second processing that's not quite streaming, but in many cases it's fast enough. We're seeing a lot of adoption of near real time, not quite real time, as good enough for many applications. >> And nobody's really talking about the hardware dimension of this, like how we do it. >> That'll just happen, Carl. So near real time, maybe defined as "before you lose the customer," however you define that, right? Okay, let's move on to Brad. Brad, you want to talk about automation, AI, the pipeline. People feel like, hey, we can just automate everything. What's your prediction? >> Yeah, I'm an AI aficionado, so apologies in advance for that, but I think we've been seeing automation at play within AI for some time now, and it's helped us do a lot of things, especially for practitioners who are building AI outcomes in the enterprise. It's helped them fill skills gaps, it's helped them speed development, and it's helped them actually make AI better, because in some ways it provides some swim lanes; for example, technologies like AutoML can auto-document and create the sort of transparency we talked about a little earlier. But I think there's an interesting kind of convergence happening with this idea of automation, which is that the automation that started happening for practitioners is trying to move outside the traditional bounds of things like: I'm just trying to get my features, I'm just trying to pick the right algorithm, I'm just trying to build the right model. It's expanding across the full life cycle of building an AI outcome, starting at the very beginning with the data and continuing to the end, which is the continuous delivery and continuous automation of that outcome, to make sure it's right and it hasn't drifted. And because it's become kind of powerful, we're starting to see this weird thing happen where the practitioners are starting to converge with the users. That is to say, if I'm in Tableau right now, I can stand up Salesforce Einstein Discovery, and it will automatically create a nice predictive algorithm for me, given the data I pull in. And what's starting to happen, and we're seeing this from the companies that create business software, Salesforce, Oracle, SAP, and others, is that they're starting to use these same ideas, and a lot of deep learning, to stand up out-of-the-box, flip-a-switch AI outcomes at the ready for business users. I very much think that's the way it's going to go, and what it means is that AI is slowly disappearing. I don't think that's a bad thing. If anything, what we're going to see in 2022, and maybe into 2023, is this rush to put the idea of disappearing AI into practice and have as many of these solutions in the enterprise as possible. For example, SAP is going to roll out this quarter a thing called adaptive recommendation services, which is basically a cold-start AI outcome that can work across a whole bunch of different vertical markets and use cases; it's just a recommendation engine for whatever you need it to do in the line of business. So you're an SAP user, you turn on your software one day, you're a sales professional, let's say, and suddenly you have a recommendation for customer churn. >> That's great. >> Well, I don't know; I think that's terrifying in some ways. I think it is the future, that AI is going to disappear like that, but I am absolutely terrified of it, because what it really does is call attention to a lot of the issues we already see around AI, specifically this idea of what we at Omdia like to call responsible AI: how do you build an AI outcome that is free of bias, that is inclusive, fair, safe, secure, auditable, et cetera? That takes a lot of work to do. So if you imagine a customer that's just a Salesforce customer, let's say, turning on Einstein Discovery within their sales software: you need some guidance to make sure that, when you flip that switch, the outcome you're going to get is correct, and that's going to take some work. So I think we're going to see this rush to roll it out, and suddenly there are going to be a lot of problems and a lot of pushback. Some of that's going to come from GDPR and the others Sanjeev was mentioning earlier, and a lot of it's going to come from internal CSR requirements within companies that are saying: whoa, hold up, we can't do this all at once; let's take the slow route, let's make AI automated in a smart way. And that's going to take time. >> Yeah, so a couple of predictions there that I heard: AI essentially disappears, it becomes invisible, if I can restate that. And then, if I understand it correctly, Brad, you're saying there's a backlash in the near term, people saying, oh, slow down, let's automate what we can. Those attributes you talked about are non-trivial to achieve; is that why you're a bit of a skeptic? >> Yeah, I think we don't have any sort of standards that companies can look to and understand, and certainly within these companies, especially those that haven't already stood up an internal data science team, they don't have the knowledge to understand, when they flip that switch for an automated AI outcome, whether it's going to do what they think it's going to do. So we need some standard methodology and best practices that every company consuming this invisible AI can make use of. One of the things Google kicked off a few years back, which is picking up momentum, and which the companies I just mentioned are starting to use, is this idea of model cards, where at least you have some transparency about what these things are doing. For the SAP example, we know it's a convolutional neural network with a long short-term memory model that it's using; we know that it only works on Roman-alphabet English. Therefore I, as a consumer, can say: oh well, I know I need to do this internationally, so I should not just turn this on today. >> Great, thank you. Carl, can you add anything, any context here? >> Yeah, we've talked about some of the things Brad mentioned here at IDC, in our Future of Intelligence group, regarding in particular the moral and legal implications of having a fully automated, AI-driven system. We already know, and we've seen, that AI systems are biased by the data they get, so they can get data that pushes them in a certain direction. I think there was a story last week about an HR system that was recommending promotions for white people over black people, because in the past white people were promoted and rated more productive than black people, but it had no context as to why, which is that black people were being historically discriminated against; the system doesn't know that. So you have to be aware of that, and I think, at the very least, there should be controls when a decision has either a moral or a legal implication, when you really need a human judgment: the system could lay out the options for you, but a person actually needs to authorize the action. And I also think we will always have to be vigilant regarding the kind of data we use to train our systems, to make sure it doesn't introduce unintended biases, and to some extent it always will, so we'll always be chasing after them.
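(Editor's note: the model cards Brad mentions are, at bottom, structured transparency metadata shipped with a model. This hand-rolled sketch loosely follows Google's model card idea; every field value here is illustrative, not taken from any real product.)

import json

model_card = {
    "model_name": "churn_recommender",           # hypothetical model
    "architecture": "gradient-boosted trees",
    "intended_use": "rank accounts by churn risk for sales follow-up",
    "limitations": [
        "trained on English-language CRM notes only",
        "not evaluated on accounts outside North America",
    ],
    "training_data": "CRM activity 2019-2021 (illustrative provenance)",
    "fairness_checks": ["disparate impact by region", "by account size"],
}

# Publishing the card alongside the model is what tells a business user
# what the switch they are flipping actually does.
print(json.dumps(model_card, indent=2))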
extent they always will so we'll always be chasing after them that's that's absolutely carl yeah i think that what you have to bear in mind as a as a consumer of ai is that it is a reflection of us and we are a very flawed species uh and so if you look at all the really fantastic magical looking supermodels we see like gpt three and four that's coming out z they're xenophobic and hateful uh because the people the data that's built upon them and the algorithms and the people that build them are us so ai is a reflection of us we need to keep that in mind yeah we're the ai's by us because humans are biased all right great okay let's move on doug henson you know a lot of people that said that data lake that term's not not going to not going to live on but it appears to be have some legs here uh you want to talk about lake house bring it on yes i do my prediction is that lake house and this idea of a combined data warehouse and data lake platform is going to emerge as the dominant data management offering i say offering that doesn't mean it's going to be the dominant thing that organizations have out there but it's going to be the predominant vendor offering in 2022. now heading into 2021 we already had cloudera data bricks microsoft snowflake as proponents in 2021 sap oracle and several of these fabric virtualization mesh vendors join the bandwagon the promise is that you have one platform that manages your structured unstructured and semi-structured information and it addresses both the beyond analytics needs and the data science needs the real promise there is simplicity and lower cost but i think end users have to answer a few questions the first is does your organization really have a center of data gravity or is it is the data highly distributed multiple data warehouses multiple data lakes on-premises cloud if it if it's very distributed and you you know you have difficulty consolidating and that's not really a goal for you then maybe that single platform is unrealistic and not likely to add value to you um you know also the fabric and virtualization vendors the the mesh idea that's where if you have this highly distributed situation that might be a better path forward the second question if you are looking at one of these lake house offerings you are looking at consolidating simplifying bringing together to a single platform you have to make sure that it meets both the warehouse need and the data lake need so you have vendors like data bricks microsoft with azure synapse new really to the data warehouse space and they're having to prove that these data warehouse capabilities on their platforms can meet the scaling requirements can meet the user and query concurrency requirements meet those tight slas and then on the other hand you have the or the oracle sap snowflake the data warehouse uh folks coming into the data science world and they have to prove that they can manage the unstructured information and meet the needs of the data scientists i'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows and some of these vendors snowflake in particular is really relying on partners for the data science needs so you really got to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement well thank you doug well tony if those two worlds are going to come together as doug was saying the analytics and the data science world does it need to be some kind of semantic layer in 
>> Dave Vellante: Tony, if those two worlds are going to come together, as Doug was saying, the analytics world and the data science world, does there need to be some kind of semantic layer in between? I don't know; weigh in on this topic, if you would.

>> Tony Baer: Oh, didn't we talk about data fabrics before? A common metadata layer. Actually, I'm almost tempted to say let's declare victory and go home, in that this has actually been going on for a while. I actually agree with much of what Doug is saying there. I remember as far back as, I think it was 2014, I was doing a study (I was still at Ovum, the predecessor of Omdia) looking at all these specialized databases that were coming up, and seeing that there was overlap at the edges, but there was still going to be a reason, at the time, that you would have, let's say, a document database for JSON, a relational database for transactions and for data warehousing, and something that at that time resembled what we're now calling a data lake. What I was saying then, about five or six years ago, is that you're seeing a blending at the edges, and the lakehouse is essentially the current manifestation of that idea. There is a dichotomy, the old argument: do we centralize this all in a single place, or do we virtualize? And I think it's always going to be a yin and yang; there's never going to be a single silver bullet. I do see that there are also going to be questions, and these are points that Doug raised, about what you need in terms of performance characteristics. Do you need, for instance, high concurrency? Do you need the ability to do some very sophisticated joins? Or is your requirement more to distribute the processing as far as possible, to essentially take a kind of brute-force approach? All these approaches are valid based on the use case. I just see the lakehouse as the culmination of this: it's a relatively new term, introduced by Databricks a couple of years ago, but it's the culmination of a long-time trend. And what we see in the cloud is that data warehouses are starting to treat this as a checkbox item: "Hey, we can source data in cloud storage, S3, Azure Blob store, whatever, as long as it's in certain formats like Parquet or CSV." I see that as becoming a checkbox item. So to that extent, I think the lakehouse, depending on how you define it, is already reality. In some cases it's new terminology, but not a whole heck of a lot new under the sun.
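Tony's "checkbox item" (an engine reading Parquet or CSV straight out of object storage, with no load step) is easy to see in miniature. The sketch below uses DuckDB purely as a compact illustration; the warehouse vendors he mentions expose the same idea through external tables or stages, with dialects that vary, and the bucket and paths here are hypothetical.

```python
# Query Parquet files where they live in object storage, with no load
# step: the "checkbox" pattern Tony describes. Bucket and paths are
# hypothetical; S3 credential setup is omitted for brevity.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")   # extension that reads from object stores
con.execute("LOAD httpfs")

rows = con.execute("""
    SELECT event_type, COUNT(*) AS events
    FROM read_parquet('s3://example-bucket/clickstream/*.parquet')
    GROUP BY event_type
    ORDER BY events DESC
""").fetchall()
print(rows)
```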
>> Dave Vellante: And Dave Menninger (thank you, Tony), a lot of this is going to come down to vendor marketing, right? Some people try to co-opt the term; we talked about data mesh washing. What are your thoughts on this?

>> Dave Menninger: Yeah, so I used the term "data platform" earlier, and part of the reason I use that term is that it's more vendor-neutral. We've tried to stay out of the vendor terminology-patenting world. Whether the term lakehouse is what sticks or not, the concept is certainly going to stick, and we have some data to back it up. About a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into them, so they consider their data lake and data warehouse one and the same. About a quarter of organizations (a little less, but about a quarter) feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three-quarters of organizations need to bring this stuff together. The need is there, the need is apparent, and the technology is going to continue to converge. I like to describe it this way: you've got data lake people over here at one end (and I'm not going to talk about why people thought data lakes were a bad idea, because they thought you just throw stuff in a server and ignore it; that's not what a data lake is), and you've got database people, data warehouse people, over here. Database vendors are adding data lake capabilities, and data lake vendors are adding data warehouse capabilities, so it's obvious that they're going to meet in the middle. Like Tony says, I think we should declare victory and go home.

>> Dave Vellante: So just a follow-up on that: are you saying the specialized lake and the specialized warehouse go away? I mean, Zhamak Dehghani and the data mesh practitioners, or advocates, would say they could all live as just a node on the mesh. But based on what Dave just said, are we going to see those all morph together?

>> Tony Baer: Well, number one, as I was saying before, there's always going to be this sort of centrifugal force, this tug of war, between "do we centralize the data" and "do we virtualize it," and the fact is, I don't think there's ever going to be any single answer. In terms of data mesh: data mesh has nothing to do with how you physically implement the data. You could have a data mesh on a data warehouse. The difference is that we might use the same physical data store, but everybody is logically governing it differently. A data mesh is not a technology; it's a process, a governance process. So essentially, as I was saying before, this is the culmination of a long-time trend. We're seeing a lot of blurring, but there are going to be cases where, for instance, if I need high concurrency, there are certain things I'm not going to be able to get efficiently out of a data lake, where I'm running a system that's really brute-forcing very fast file scanning and that type of thing. So I think there will always be some delineations, but I would agree with Dave and with Doug that we are seeing a confluence of requirements: we need the abilities of both a data lake and a data warehouse, and these need to come together.

>> Doug Henschen: So I think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity. And the mesh and fabric vendors, the fabric and virtualization vendors, are all on board with the idea of this converged platform, and they're saying, "Hey, we'll handle all the edge cases, the stuff that isn't in that center of data gravity, that is off distributed in a cloud or at a remote location." So you can have that single platform for the center of your data, and then bring in virtualization, mesh, what have you, for reaching out to the distributed data.
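Tony's point, that a mesh is a governance process rather than a storage technology, can be pictured as nothing more than ownership metadata layered over whatever store you already have. A toy sketch follows; every domain, owner and dataset name is invented for illustration.

```python
# A data mesh as governance metadata, not storage: one shared physical
# warehouse can back every data product; what differs is who owns and
# governs each one. All names below are invented.
from dataclasses import dataclass

@dataclass
class DataProduct:
    name: str
    owning_domain: str      # the domain team accountable for this product
    physical_location: str  # could be one shared warehouse for all products

registry = [
    DataProduct("orders_daily", "sales", "warehouse.sales.orders_daily"),
    DataProduct("churn_scores", "marketing", "warehouse.mkt.churn_scores"),
]

def can_publish(product: DataProduct, requesting_domain: str) -> bool:
    """Governance rule: only the owning domain may publish schema
    changes, regardless of where the data physically lives."""
    return product.owning_domain == requesting_domain

print(can_publish(registry[0], "marketing"))  # -> False: not the owner
```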
>> Sanjeev Mohan: Bingo. That's exactly what Doug said: people are happy when they virtualize data. I think yes, at this point, but to Dave Menninger's point, they are converging. Snowflake has introduced support for unstructured data, so now we are literally splitting hairs here. And what Databricks is saying is, "Aha, but it's easier to go from data lake to data warehouse than it is from data warehouse to data lake." So I think we're getting into semantics, but we've already seen these two converge.

>> Dave Vellante: So it takes something like AWS, who's got, what, 15 data stores; are they going to have 15 converged data stores? That's going to be interesting to watch. All right, guys, I'm going to go down the list and do one word each, and if each of you analysts wouldn't mind, just add a very brief course correction for me. So Sanjeev, governance: maybe it's the dog that wags the tail now, it's coming to the fore with all this ransomware stuff. We really didn't talk much about security, but what's the one word in your prediction that you would leave us with on governance?

>> Sanjeev Mohan: It's going to be mainstream.

>> Dave Vellante: Mainstream, okay. Tony Baer, "mesh washing" is what I wrote down; that's what we're going to see in 2022. A little reality check; you want to add to that?

>> Tony Baer: The reality check is, I hope that no vendor jumps the shark and calls their offering a data mesh product.

>> Dave Vellante: Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, graph databases: thank you for sharing some high-growth metrics. I know it's early days, but "magic" is what I took away from that. It's the magic database.

>> Carl Olofson: Yeah, and I've said this to people too: I kind of look at it as the Swiss army knife of data, because you can pretty much do anything you want with it. That doesn't mean you should. It's definitely the case that if you're managing things that are in a fixed schematic relationship, a relational database is probably a better choice, and there are times when a document database is a better choice. Graph can handle those things, but it may not be the best choice for those use cases. But for a great many, especially the new emerging use cases I listed, it's the best choice.

>> Dave Vellante: Thank you. And Dave Menninger, thank you, by the way, for bringing the data in; I liked how you supported all your comments with data points. But streaming data becomes the sort of default paradigm, if you will. What would you add?

>> Dave Menninger: Yeah, I would say think fast. That's the world we live in; you've got to think fast.

>> Dave Vellante: Fast, love it. And Brad Shimmin: on the one hand, I was saying, okay, great, I'm afraid I might get disrupted by one of these internet giants who are AI experts, so I'm going to be able to buy instead of build AI. But then again, there are some real issues; there's a potential backlash there. So give us your bumper sticker.

>> Brad Shimmin: Yeah, I would say, going with Dave: think fast, and also think slow, to reference the book that everyone talks about. I would say this is really all about trust: trust in the idea of automation and of a transparent, invisible AI across the enterprise. But verify. Verify before you do anything.
>> Dave Vellante: And then Doug Henschen: look, I think the trend is your friend here on this prediction, with lakehouse really becoming dominant. I liked the way you set up that notion of the data warehouse folks coming at it from the analytics perspective, while the data science worlds are coming together with it. I still feel as though there's a piece in the middle that we're missing, but your final thoughts; we'll give you the last word.

>> Doug Henschen: Well, I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with Hadoop platforms moving toward cloud, moving toward object storage, and object storage becoming really the common storage point, whether it's a lake or a warehouse. And on that second point, I think ESG mandates are going to come in alongside GDPR and the like, to up the ante for good governance.

>> Dave Vellante: Yeah, thank you for calling that out. Okay, folks, hey, that's all the time that we have here. Your experience and depth of understanding on these key issues in data and data management are really on point, and they were on display today. I want to thank you for your contributions; really appreciate your time.

>> Panel: Enjoyed it. Thank you.

>> Dave Vellante: Now, in addition to this video, we're going to be making available transcripts of the discussion, and we're going to do clips of this as well and put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt several of the analysts on the panel will take the opportunity to publish written content, social commentary, or both. I want to thank the power panelists, and thanks for watching this special CUBE presentation. This is Dave Vellante. Be well, and we'll see you next time. (music)

Published Date : Jan 8 2022

Nipun Agarwal, Oracle | CUBEconversation


 

(bright upbeat music) >> Hello everyone, and welcome to this special exclusive CUBE Conversation, where we continue our coverage of the trends of the database market. With me is Nipun Agarwal, who's the vice president, MySQL HeatWave and advanced development at Oracle. Nipun, welcome. >> Thank you, Dave. >> I love to have technical people on theCUBE to educate, debate, inform, and we've extensively covered this market. We were all over the Snowflake IPO, and at that time, I remember, I challenged organizations: bring your best people, because I want to better understand what's happening at Database. After Oracle kind of won the Database wars 20 years ago, Database kind of got boring. And then it got really exciting with the big data movement, and all the NoSQL stuff coming out, and Hadoop and blah, blah, blah. And now it's just exploding. You're seeing huge investments from many of your competitors, VCs are trying to get into the action. Meanwhile, as I've said many, many times, your chairman and head of technology, CTO, Larry Ellison, continues to invest to keep Oracle relevant. So it's really been fun to watch, and I really appreciate you coming on. >> Sure thing. >> We have written extensively, we've talked to a lot of Oracle customers. You've got the leading mission critical database in the world, everybody from the Fortune 100. We evaluated what Gartner said about the operational databases; I think there's not a lot of question there. And we've written on Wikibon about your converged databases and the strategy there, and we're going to get into that. We've covered Autonomous Data Warehouse, Exadata Cloud at Customer, and then we just want to really try to get into your area, which has kind of caught our attention recently. And I'm talking about the MySQL Database Service with HeatWave. I love the name, I laugh. It was unveiled, I don't know, a few months ago. So Nipun, let's start the discussion today. Maybe you can update our viewers on what is HeatWave? What's the overall focus with Oracle? And how does it fit into the Cloud Database Service? >> Sure, Dave. So HeatWave is an in-memory query accelerator for the MySQL Database Service, for speeding up analytic queries as well as long-running complex OLTP queries. And this is all done in the context of a single database, which is the MySQL Database Service. Also, all existing MySQL applications and MySQL-compatible tools and applications continue to work as is, so there is no change. And with HeatWave, Oracle is delivering the only MySQL service which provides customers with a single unified platform for both analytic as well as transaction processing workloads. >> Okay, so, we've seen open source databases in the cloud growing very rapidly. I mentioned Snowflake; I think Google's BigQuery gets some mention; we'll maybe talk more about Redshift later on. But what I'm wondering, well, let's talk about it now: how does the MySQL HeatWave service compare to MySQL-based services from other cloud vendors? I can get MySQL from others. In fact, I think we do; I think we run Wikibon on the LAMP stack, and I think it's running on Amazon. So how does your service compare?
>> No other vendor offers this differentiated solution with an open source database, namely having a single database which is optimized both for transactional processing and analytics. So take the example of MySQL: a lot of other cloud vendors provide a MySQL service, but MySQL has been optimized for transaction processing, so when customers need to run analytics, they need to move the data out of MySQL into some other database for analytics. We are the only vendor which is now offering this unified solution for both transaction processing and analytics. That's the first point. The second thing is, most of the vendors out there have taken open source databases and are basically hosting them in the cloud, whereas HeatWave has been designed from the ground up for the cloud, and it is 100% compatible with MySQL applications. And the fact that we have designed it from the ground up for the cloud, with maybe 100s of person-years of research and engineering, means that we have a solution which is very, very scalable, very optimized in terms of performance, and very inexpensive in terms of cost. >> Are you saying, wait, are you saying that you essentially rewrote MySQL to create HeatWave, but at the same time maintained compatibility with existing applications? >> Right. So we enhanced MySQL significantly, and we wrote a whole bunch of new code, which is brand new code optimized for the cloud, in such a manner that, yes, it is 100% compatible with all existing MySQL applications. >> What does that mean? When I hear "optimized for the cloud," I say, okay, it's taking advantage of cloud-native; I hear kind of the buzzwords, cloud-first, cloud-native. What does it specifically mean from a technical standpoint? >> Right. So first, let's talk about performance. We have looked at two aspects. We have worked with shapes, for instance the compute shapes which provide the best performance per dollar. So I'll give you a couple of examples. We have optimized for certain chips: HeatWave is an in-memory query accelerator, so the cost of the system is dominated by the cost of memory, and we are working with chips which provide the cheapest cost per terabyte of memory. Secondly, we are using commodity cloud services in such a manner that it's optimized both for performance as well as performance per dollar. An example is, we are not using any locally-attached SSDs; we use ObjectStore because it's very inexpensive. And then, I guess at some point I will get into the details of the architecture: the system has been really, really designed for massive scalability, so as you add more compute, as you add more servers, the system continues to scale almost perfectly linearly. So this is what I mean in terms of being optimized for the cloud. >> All right, great. >> And furthermore, (indistinct). >> Thank you. No, carry on. >> Over the next few months, you will see a bunch of other announcements where we're adding a whole bunch of machine learning and data-driven automation, which we believe is critical for the cloud. So: optimized for performance, optimized for the cloud, and machine learning-based automation, which we believe is critical for any good cloud-based service.
>> All right, I want to come back and ask you more about the architecture, but you mentioned some of the others taking open source databases and shoving them into the cloud. Let's take the example of AWS. They have a series of specialized data stores for different workloads: Aurora is for OLTP, and I actually think it's based on MySQL; Redshift, which is based on ParAccel. And I've asked Amazon about this, and their response actually kind of made sense to me: "Look, we want the right tool for the right job. We want access to the primitives, because when the market changes we can change faster. As opposed to, if we start building bigger and bigger databases with more functionality, we're not as agile." So that kind of made sense to me. And again, we use a lot of this ourselves; I think I said MySQL, and in Amazon we're using DynamoDB. Works, that's cool. We're not huge, and we fully admit, and we've researched this: when you start to get big, that starts to get maybe expensive. But what do you think about that approach, and why is your approach better? >> Right. We believe that there are multiple drawbacks to having different databases or different services, one optimized for transactional processing and one for analytics, and having to ETL between these different services. First of all, it's expensive, because you have to manage different databases. Secondly, it's complex: from an application standpoint, applications now need to understand the semantics of two different databases. It's inefficient, because you have to transfer data from one database to the other. And it's not secure, because there are security aspects involved when you're transferring data, and also the identity of users in the two different databases is different. So the approach which has been taken by Amazon and others, we believe, is more costly, complex, inefficient and not secure. Whereas with HeatWave, all the data resides in one database, which is MySQL, and it can run both transaction processing and analytics. So in addition to all the benefits I talked about, customers can also make their decisions in real time, because there is no need to move the data: all the data resides in a single database, so as soon as you make any changes, those changes are visible to customers for queries right away, which is not the case when you have different siloed, specialized databases.
>> Okay, there are a lot of ways to skin a cat, and what you just said makes sense. By the way, we were saying before that companies have taken off-the-shelf open source databases and shoved them into the cloud; I have to give Amazon some props, they actually have done engineering on Aurora and Redshift, and they've got the engineering capabilities to do that. But you can see, for example, in Redshift, that the way they handle separating compute from storage is maybe not as elegant as some of the other players, like a Snowflake, for example; they get there, but maybe it's a little bit more brute force. So I don't want to make it sound like they're just hosting off-the-shelf software in the cloud. But is it fair to say that there's a crossover point? In other words, if I'm smaller, and I'm not doing a bunch of big stuff, like us, it's fine: it's easy, I spin it up, it's cheaper than having to host my own servers. So presumably there's a sweet spot for that approach and a sweet spot for your approach. Is that fair, or do you feel like you can cover a wider spectrum? >> We feel we can cover the entire spectrum; not wider, the entire spectrum. And we have benchmarks published, which are actually available on GitHub for anyone to try. You will see that with this approach we have taken with the MySQL Database Service and HeatWave, we are faster and we are cheaper, without having to move the data. And the mileage, the amount of improvement you will get, will surely vary. If you have less data, the amount of improvement you will get may be, say, 100 times or 500 times for smaller data sizes. If you get to large data sizes, this improvement amplifies to 1,000 times or 10,000 times. And similarly for the cost: if the data size is smaller, the cost advantage you will have is less; maybe MySQL HeatWave is one-third the cost. If the data size is larger, the cost advantage amplifies. So to your point, MySQL Database Service with HeatWave is going to be better for all sizes, but the amount of benefit you will get increases as the size of the data increases. >> Okay, so you're saying you've got better performance, better cost, better price-performance. Let me just push back a little bit on this, because, having been around for a while, I often see these performance and price comparisons, and what often happens is a vendor will take the latest and greatest, the one they just announced, and compare it to an N-1 or an N-2 running on old hardware. Are you normalizing for that? Is that the game you're playing here? How can you give us confidence that these are legitimate benchmarks in your GitHub repo? >> Absolutely, I'll give you a bunch of information. But let me preface this by saying that all of our scripts are available in open source in the GitHub repo for anyone to try, and we would welcome feedback. So we have taken, yes, the latest version of MySQL Database Service with HeatWave, we have optimized it, and we have run multiple benchmarks, for instance TPC-H and TPC-DS. Because the amount of improvement a query will get depends upon the specific query, depends upon the predicates, and depends on the selectivity, we just wanted to use standard benchmarks; it's not the case that we picked certain classes of queries that benefit more. Similarly, for the other vendors and other services, like Redshift, we have run benchmarks on the latest shapes of Redshift, in the most optimized configuration which they recommend, running their scripts. So this is not something where, hey, we're just running out of the box: we have optimized Aurora, we have optimized (indistinct), to the best possible extent we can, based on their guidelines and their latest release, and those are the numbers we're talking about. >> All right. Please continue. >> Now, for some other vendors, when we get to the benchmark section we'll talk about how we compare with other services, let's say Snowflake. There, there are issues: you can't legally run and publish Snowflake numbers yourself. So there we have looked at a report published by Gigaom, and we are taking the numbers published by the Gigaom report for Snowflake, Google BigQuery and Azure Synapse; those we have not run ourselves. But for AWS Redshift, as well as AWS Aurora, we have run the numbers, and I believe these are the best numbers anyone can get.
>> I saw that Gigaom report, and I've got to say, Gigaom, sometimes I'm like, eh, but I've got to say that, I forget the guy's name, he knew what he was talking about. He did a good job, I thought. I was curious as to the workload; I always say, well, what's the workload? But I thought that report was pretty detailed, and Snowflake did not look great in that report. Oftentimes, and they've been marketing the heck out of it; I forget who sponsored it, it was sponsored content. But I remember seeing that and thinking, hmm. So I think maybe for Snowflake that sweet spot is not that performance; maybe it's the simplicity, and I think that's where they're making their mark, and most of their databases are small and a lot of read-only stuff, so they've found a market there. But I want to come back to the architecture and really sort of understand how you've been able to get this range of both performance and cost you talked about. I thought I heard that you're optimizing for the chips, you're using ObjectStore, and you've got an architecture that's not using SSDs. So, is there caching there? I wonder if you could just give us some details of the architecture and tell us how you got to where you are. >> Right. So let me start off by saying what kind of numbers we are talking about, just to be clear about what the improvements are. If you take the MySQL Database Service with HeatWave in Oracle Cloud and compare it with a MySQL service in any other cloud, and you look at smaller data sizes, say data sizes which are about half a terabyte or so, HeatWave is 400 times faster. And as you get to... >> Sorry. Sorry to interrupt. What are you measuring there? Faster in terms of what? >> Latency. We take the 22 TPC-H queries, we run them on HeatWave, and we run the same queries on a MySQL service in any other cloud, on half a terabyte, and the performance in terms of latency is 400 times faster on HeatWave. >> Thank you. Okay. >> If you go to larger data sizes, the other data point, we're looking at something like 4 TB. There, we did two comparisons. One is with AWS Aurora, which is, as you said, where they have taken MySQL, done a bunch of innovations, and are offering it as a premier service. On 4 TB TPC-H, MySQL Database Service with HeatWave is 1,100 times faster than Aurora. It is also three times faster than the fastest shape of Redshift; Redshift comes in different flavors, and here we're talking about Dense Compute 2, again looking at the most recommended configuration from Redshift. So: 1,100 times faster than Aurora, three times faster than Redshift, and at one-third the cost. This is where I really want to point out that it is much faster and much cheaper: one-third the cost. And then, going back to the Gigaom report, there was a comparison done with Snowflake, Google BigQuery, Redshift and Azure Synapse. I won't go into the numbers here, but HeatWave was faster on both TPC-H as well as TPC-DS across all these products, and cheaper compared to any of these products. So: faster and cheaper on both benchmarks, across all these products. Now let's come to what the technology is underneath. >> Great.
>> So basically there are three things you're going to see: improved performance, very good scale, and lower cost. The first thing is that HeatWave has been optimized for the cloud, and we talked about this a bit earlier: we are using the cheapest shapes which are available, and we are using the cheapest services which are available, without having to compromise performance; and then there is this machine learning-based automation. Now, underneath, in terms of the architecture of HeatWave, there are basically four key things. First, HeatWave is an in-memory engine: the representation we have in memory is a hybrid columnar representation, which is optimized for vector processing. That's pretty much table stakes these days for anyone who wants to do in-memory analytics, except that it's hybrid columnar, optimized for vector processing. So that's the first thing. The second thing, which starts getting to be novel, is that HeatWave has a massively parallel architecture, which is enabled by a massively partitioned architecture. We take the data, we read the data from MySQL into the memory of HeatWave, and we massively partition this data. As we're reading the data, we're partitioning it based on the workload, and the sizes of these partitions are such that a partition fits in the cache of the underlying processor; then we're able to process these partitions really, really fast. So that's the second bit: a massively parallel architecture enabled by a massively partitioned architecture. The third thing is that we have developed new state-of-the-art algorithms for distributed query processing. For many of the workloads, we find that joins are the long pole in terms of the amount of time they take, so we at Oracle have developed new algorithms for distributed join processing, and similarly for many other operators. This is how we're able to process this data, which is in memory, really, really fast. And finally, we have designed for scalability, and we have designed algorithms such that there's a lot of overlap between compute and communication. That means that as you're sending data across the various nodes, and there could be dozens or 100s of nodes, we're able to overlap the computation time with the communication time, and this is what gives us massive scalability in the cloud.
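The cache-sized partitioning Nipun describes is straightforward to picture: pick a partition size that fits the processor cache, then fan the partitions out to workers. A toy sketch of that arithmetic follows; the cache and row sizes are invented assumptions, and this illustrates the idea rather than Oracle's implementation.

```python
# Illustration of partition-to-cache sizing plus a fan-out scan. Sizes
# are invented; this sketches the idea, not HeatWave's implementation.
from concurrent.futures import ThreadPoolExecutor
import math

CACHE_BYTES = 32 * 1024 * 1024   # assumed per-core cache budget (32 MB)
ROW_BYTES = 64                   # assumed encoded row width

def partition(rows):
    """Split rows into chunks small enough to stay cache-resident."""
    rows_per_part = CACHE_BYTES // ROW_BYTES
    n_parts = math.ceil(len(rows) / rows_per_part)
    return [rows[i * rows_per_part:(i + 1) * rows_per_part]
            for i in range(n_parts)]

def scan(part):
    # Stand-in for a vectorized aggregate over one cache-resident chunk.
    return sum(part)

rows = list(range(2_000_000))    # stand-in for an in-memory column
parts = partition(rows)
with ThreadPoolExecutor() as pool:   # a real engine pins one core per chunk
    total = sum(pool.map(scan, parts))
print(f"{len(parts)} partitions, total={total}")
```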
>> Yeah, some hardcore database techniques that you've brought to HeatWave; that's impressive. Thank you for that description. Let me ask you a quick side question: MySQL is open source, so HeatWave is what? Is it like open core? Is it open source? >> No. HeatWave is something which has been designed and optimized for the cloud, so it can't be open source; it's not open source. >> It is a service. >> It is a service. That's correct. >> So it's a managed service that I pay Oracle to host for me. Okay. Got it. >> That's right. >> Okay, I wonder if you could talk about some of the use cases that you're seeing for HeatWave, any patterns that you're seeing with customers? >> Sure. So we had the HeatWave service in limited availability for almost 15 months, and it's been about five months since we have gone GA, and there's a very interesting trend we're seeing with our customers. The first one is, we are seeing many migrations from AWS, specifically from Aurora. Similarly, we are seeing many migrations from Azure MySQL, and we're seeing migrations from Google. And the number one reason customers are coming is ease of use, because they currently have their databases siloed: as you were talking about, some optimized for transactional processing, some for analytics. Here, what customers find is that in a single database they're able to get very good performance, they don't need to move the data around, and they don't need to manage multiple databases. So we are seeing many migrations from these services, and the number one reason is reduced complexity and ease of use. The second reason is much better performance and reduced cost. So that's the first thing: we are very excited and delighted to see the number of migrations we're getting. The second thing we're seeing is that initially, when we announced the service, we were really targeting analytics. But now what we're finding is that many of these customers, for instance those who have been running on Aurora, when they move to MySQL with HeatWave, are finding that many of the OLTP queries are seeing significant acceleration with HeatWave as well. So now customers are moving their entire applications to HeatWave. That's the second trend we're seeing. The third thing, and I think I kind of missed mentioning this earlier: one of the very key and unique value propositions we provide with the MySQL Database Service with HeatWave is a mechanism where, if customers have their data stored on premises, they can still leverage the HeatWave service by enabling MySQL replication. They can have their data on premises, replicate it into the Oracle Cloud, and then run analytics. This deployment, which we are calling the hybrid deployment, is turning out to be very, very popular, because there are some customers who, for various compliance or regulatory reasons, cannot move or migrate the entire data to the cloud. This provides them a very good setup: they can continue to run their existing database, and when it comes to getting the benefits of HeatWave for query acceleration, they can set up this replication. >> And can I run that on any available server capacity, or is there an appliance to facilitate that? >> No, this is just standard MySQL replication. If a customer is running MySQL on premises, they can just turn on this replication. We have obviously enhanced it to support this inbound replication between on-premises and Oracle Cloud, and it can be enabled as long as the source and destination are both MySQL.
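Standard MySQL replication of the kind Nipun describes is configured on the replica with a few statements. A hedged sketch of the vanilla MySQL 8.0 form follows; the host names and credentials are placeholders, and on the actual MySQL Database Service the inbound channel is created through the OCI console or API rather than by hand.

```python
# Sketch of vanilla MySQL 8.0 inbound replication, the hybrid-deployment
# pattern Nipun describes. Host names and credentials are placeholders;
# on Oracle's managed service the channel is created via the OCI
# console/API rather than these raw statements.
import mysql.connector

replica = mysql.connector.connect(
    host="cloud-replica.example.com",   # hypothetical cloud endpoint
    user="admin", password="...")
cur = replica.cursor()

cur.execute("""
    CHANGE REPLICATION SOURCE TO
        SOURCE_HOST = 'onprem-db.example.com',
        SOURCE_PORT = 3306,
        SOURCE_USER = 'repl',
        SOURCE_PASSWORD = '...',
        SOURCE_SSL = 1
""")
cur.execute("START REPLICA")   # replica now applies on-prem changes
```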
>> Okay, so I want to come back to this idea of the architecture a little bit. It's hard for me to go toe to toe with you, and I'm not an engineer, but I'm going to try anyway. You've talked about OLTP queries; I always thought HeatWave was optimized for analytics. So I want to push on this notion, because people think of the converged database, and what you're talking about here with HeatWave is sort of the Swiss army knife, which is great, 'cause you've got a screwdriver and you've got a Phillips and a flathead and some scissors, but maybe they're not as good as the purpose-built tool. You're arguing that this is best of breed for OLTP and best of breed for analytics, both in terms of performance and cost. Am I getting that right, or is this really a Swiss army knife, where that flathead is really not as good as the big, long screwdriver that I have in my bag? >> Yes, you're getting it right, but I did want to make a clarification. HeatWave is definitely the accelerator for all your queries: all analytic queries, and also the long-running, complex transaction processing queries. However, when it comes to transaction processing in terms of your insert statements and delete statements, those are still all done and served by the MySQL database. All the transactions are still sent to the MySQL database, and they're persisted there; it's the queries for which HeatWave is the accelerator. So what you said is correct: for all query acceleration, HeatWave is the engine. >> Makes sense. Okay, so if I'm a MySQL customer and I want to use HeatWave, what do I have to do? Do I have to make changes to my existing applications? You implied earlier that, no, it just sort of plugs right in. Can you clarify that? >> Yes. There are absolutely no changes which any MySQL or MySQL-compatible application needs to make to take advantage of HeatWave. HeatWave is an in-memory accelerator, and it's completely transparent to the application. We have dozens and dozens of applications which have migrated to HeatWave, and they are seeing the same thing; similarly with tools. If you look at various tools which work for analytics, like Tableau, Looker, and Oracle Analytics Cloud, all of them work just seamlessly. And this is one of the reasons we had to do a lot of heavy lifting in the MySQL database itself: the MySQL database engineering team has been very actively working on this, and because we did that heavy lifting and made enhancements to the MySQL optimizer and the MySQL storage layer, the integration of HeatWave happens in such a seamless manner. So there is absolutely no change which an application needs to make in order to leverage or benefit from HeatWave.
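For context on what "transparent" looks like in practice: in MySQL HeatWave's documentation, enabling acceleration for a table is a pair of DDL statements, after which unmodified queries are offloaded automatically by the optimizer. A hedged sketch follows; the schema and table names are invented, and the statements should be treated as indicative of the documented pattern rather than a tested script.

```python
# Sketch of enabling HeatWave acceleration for one table, following the
# documented SECONDARY_ENGINE pattern. Schema/table names are invented;
# treat this as indicative rather than a tested script.
import mysql.connector

db = mysql.connector.connect(
    host="mds.example.com", user="admin", password="...", database="shop")
cur = db.cursor()

# Mark the table for the HeatWave (RAPID) secondary engine and load it.
cur.execute("ALTER TABLE orders SECONDARY_ENGINE = RAPID")
cur.execute("ALTER TABLE orders SECONDARY_LOAD")

# The application query itself is unchanged; the optimizer decides
# whether to offload it to HeatWave.
cur.execute("""
    SELECT customer_id, SUM(total) AS spend
    FROM orders GROUP BY customer_id ORDER BY spend DESC LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```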
>> You said earlier, Nipun, that you're seeing migrations from, I think you said, Aurora and Google; you might've said Redshift as well. What kind of tooling do you have to facilitate migrations? >> Right. There are multiple ways in which customers may want to do this. The first tooling we have is the inbound replication mechanism I was talking about: customers can set up HeatWave in the Oracle Cloud and set up replication between the instances in their current cloud and HeatWave. The second thing is, we have various kinds of tools to facilitate the data migration, on the fast-ingestion side. So there are a lot of such customers we are seeing who are migrating, and we have a plethora of tools and applications, in addition to setting up this inbound replication, which is the most seamless way of getting customers started with HeatWave. >> So, I think you mentioned before, I have it in my notes: machine intelligence and machine learning. We've seen that with Autonomous Database; it's a big, big deal, obviously. How does HeatWave take advantage of machine intelligence and machine learning? >> Yeah, and I'm probably going to be talking more about this in the future, but what we have already is that HeatWave uses machine learning to intelligently automate many operations. We know that when there's a service being offered in the cloud, customers expect automation, and there are a lot of vendors and a lot of services which do a good job of automation. One of the places where we're going to be very unique is that HeatWave uses machine learning to automate many of these operations, and I'll give you one such example, which is provisioning. Right now with HeatWave, when a customer wants to determine how many nodes are needed for running their workload, they don't need to make a guess. They invoke a provisioning advisor, and this advisor uses machine learning to sample a very small percentage of the data; we're talking about 0.1% sampling. It's able to predict, with 95% accuracy, the amount of memory this data is going to take, and based on that it makes a prediction of how many servers are needed. So just that simple operation, the first step of provisioning, is something that is done manually on any other service, whereas with HeatWave we have a machine learning-based advisor. So that is one example of what we're doing, and in the future we'll be offering many such innovations as part of the MySQL Database and HeatWave service. >> Well, I've got to say, I was a skeptic, but I really appreciate you answering my questions. A lot of people, when you made the acquisition and inherited MySQL, thought you were going to kill it, because they thought it would be competitive to Oracle Database. I'm happy to see that you've invested and figured out a way to, hey, serve your community and continue to be the steward of MySQL. So Nipun, thanks very much for coming on theCUBE. Appreciate your time. >> Sure. Thank you so much for the time, Dave. I appreciate it. >> And thank you for watching, everybody. This is Dave Vellante with another CUBE Conversation. We'll see you next time. (bright upbeat music)
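The provisioning advisor Nipun describes in the interview above (sample a sliver of the data, predict its in-memory footprint, derive a node count) can be illustrated with back-of-the-envelope arithmetic. A toy sketch follows; the sampling rate matches the 0.1% he cites, while everything else, including the per-node memory, is an invented assumption.

```python
# Toy illustration of advisor-style provisioning: sample a sliver of the
# data, extrapolate the in-memory footprint, derive a node count. The
# 0.1% rate matches Nipun's example; all other numbers are assumptions.
import math
import random

TOTAL_ROWS = 50_000_000
SAMPLE_RATE = 0.001                 # 0.1% sampling, as cited
NODE_MEMORY_GB = 512                # hypothetical per-node memory budget

def encoded_size(row):
    # Stand-in for measuring a row's size in the in-memory representation.
    return 48 + len(row[1])

sample = [(i, "x" * random.randint(8, 64))
          for i in range(int(TOTAL_ROWS * SAMPLE_RATE))]

avg_bytes = sum(encoded_size(r) for r in sample) / len(sample)
est_total_gb = avg_bytes * TOTAL_ROWS / 1024**3
nodes = max(1, math.ceil(est_total_gb / NODE_MEMORY_GB))
print(f"~{est_total_gb:.1f} GB in memory -> {nodes} node(s)")
```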

Published Date : Apr 28 2021

Juan Loaiza, Oracle | CUBE Conversation 2021


 

(upbeat music) >> The innovation around databases has exploded over the last few years. Not only do organizations continue to rely on database technology to manage their most mission-critical business data, but new use cases have emerged that process and analyze unstructured data, share data at scale, protect data, and provide greater heterogeneity. New technologies are being injected into the database equation: not just cloud, which has been a huge force in the space, but also AI to drive better insights and automation, blockchain to protect data and provide better auditability, new file formats to expand the utility of database technology, and more. Debates abound as to who's the best, the number one, the fastest, the most cloudy, the least expensive, et cetera. But there is no debate when it comes to leadership in mission-critical database technologies: that status goes to Oracle. And with me to talk about the developments of database technology in the market is CUBE alum Juan Loaiza, who's executive vice president of Mission Critical Database Technology at Oracle. Juan, always great to see you, thanks for making some time. >> Thanks, great to see you, Dave, always a pleasure to join you. >> Yeah, and I hope you have some time, because I've got a lot of questions for you. (chuckles) I want to start with... >> All right, I love questions. >> Good. I want to start, and we'll go deep if you're up for it. I want to start with the GoldenGate announcement; we're covering that recent announcement, the service on OCI. GoldenGate is part of the super high availability capabilities that Oracle is so well known for. What do we need to know about the new service, and what does it bring for your customers? >> Yeah, so first of all, GoldenGate is all about creating real-time data throughout an enterprise. It does replication, data integration, moving data into analytic workloads, streaming analytics of data, migration of databases, and making databases highly available. All of those are use cases for real-time data movement, and GoldenGate is really the leading product in the market, and has been for many years. We have about 80% of the global Fortune 500 running GoldenGate today, in addition to thousands and thousands of smaller customers. So it is the premier platform for data integration, replication, high availability, anything involving moving data in real time. We've had that available as a product for many years, and what we've just recently done is released it as a cloud service, as a fully managed and automated cloud service. So that's kind of the big new thing that's happening right now. >> So is that what's unique about this, that it's now a service, or are there other attributes that are unique to Oracle? >> Yeah, so the service is kind of the most basic part of it, but the big thing about the service is that it makes this product dramatically easier to use. Traditionally, data integration and replication products, although very powerful, have been very complex to use, and one of the big benefits of the service is that we've made it dramatically simpler, so not just super experts can use it, but anyone can. Also, as part of releasing it as a cloud service, we've done a number of unique things, including making it completely elastically scalable, pay-per-use, and dynamically scalable. So, just-in-time, real-time scalability: as your workload increases, we automatically increase the throughput of GoldenGate.
So previously you had to figure all this stuff out ahead of time; it was very static. All these products have been very static. Now it's completely dynamic, a native cloud product, and that's very unique in the market. >> So, from an availability standpoint, I guess IBM sort of has this with Db2, but it doesn't offer the heterogeneity that GoldenGate has. But what about AWS, Microsoft, Google: do they provide services like GoldenGate? >> There's really nothing like the GoldenGate service. When you're talking about people like Google and Azure, they really have do-it-yourself third-party products. So there'll be a third-party data integration and replication product, it's kind of available in their marketplace, and customers have to do everything. It's basically a put-it-together-yourself kit, and it's very complicated. These data integration products have always been complicated, and they're even more complicated in the cloud if you have to do everything yourself. Amazon has a product, but it's really focused on basic data migration to their cloud. It doesn't have the same capabilities as Oracle has, it doesn't have the elasticity, it doesn't have pay-per-use, so it's really not very cloudy at all. >> Well, the biggest customers have always glommed onto GoldenGate because they need that super ultra-high availability, and they're capable of doing it themselves. So tell us how this compares to DIY. >> Yeah, you mentioned the big customers, and you're absolutely right: the big customers have been big users of GoldenGate. Smaller customers are users as well; however, it's been challenging for them because it's complicated. Data integration has been one of the more complicated areas of data management. So one of the things this does is expand the market: it makes it dramatically easier for smaller companies, which don't have as many IT resources, to use the product. Also, smaller companies obviously don't have as much data as the really large giants, so they don't have as much data throughput, and traditionally the price has been high for a small customer. Now, with pay-per-use in the cloud, that eliminates the two big blockers for smaller enterprises, which are the high fixed costs and the complexity of the product. And by the way, it's helpful for everyone. Big customers have also struggled with elasticity: sometimes a huge batch job will kick in, the rate of change increases, and suddenly the replication product doesn't keep up, because on-prem products aren't really very elastic. So it helps large customers as well. Everybody benefits: the elasticity, pay-per-use, on-demand nature of it is really helpful for everybody.
>> Well, and because it's delivered as a service, I would imagine for the large customers you're giving them more granularity, so they can apply it maybe for a single application, as opposed to trying to justify it across a whole suite. And where the cost was higher before, you're now allowing me to pay by the drink, is that right? I could just apply it at a more granular level. >> Yes, that's exactly right. It's really pay-per-use: you can use it as much or as little as you want, and you just pay for what you use. And as I mentioned, it's not a static payment either. If you have a lot of data loads going on right now, you pay a little more; at night, when you have less going on, you pay a lot less. You're really just paying for what you use, and it's very easy to set it up for a single application or all your applications. >> How about for things like continuous replication or real-time analytics: is the service designed to support that? >> Yes, that's the heritage of GoldenGate. GoldenGate has been around for decades, and we've worked with some of the most demanding customers in the world on exactly those things. Real-time data all over the enterprise is really the goal everyone wants: real-time data from OLTP into analytics, from one system to another system, and for availability. That is the key benefit of GoldenGate, and that's the key technology we've been working on for decades. And now we have it, very easy to use, in the cloud. >> Well, what would be the overheads associated with that? For instance, you need a second copy, you need the other database copies; where does it make sense to incur that overhead? Obviously the super high availability apps that can exploit real time, fraud detection being the obvious one, but what else can you add there? >> Well, GoldenGate itself doesn't require any extra copies of anything. However, it does enable customers that want to create, for example, an analytics system, a data warehouse, to feed data from all their systems in real time into that data warehouse. It also enables high availability: you can get high availability within the cloud with it, between on-premises and the cloud, and between clouds. Also, you can migrate data: migrate databases without having to take them down. So all these capabilities are available now, and they're very easy to use. >> Okay, thanks for that clarification. What about autonomous: is that on the roadmap, or what are you thinking? >> Yeah, GoldenGate is essentially an autonomous service, and it works with the Oracle Autonomous Database. You can use it both as a source for data and as a sink for data, a place you're writing data to. So, for example, you can have an autonomous OLTP database that's replicating to another autonomous OLTP database in real time, with both of them replicating changes to the autonomous data warehouse. But it doesn't all have to be autonomous: you can have any mix of autonomous and not autonomous, on-prem and cloud, in anybody's cloud. That's the beauty of GoldenGate: it's extremely flexible.
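For a feel of what the managed service automates away: classic, self-managed GoldenGate is driven by parameter files for a capture process (Extract) and an apply process (Replicat). A rough sketch of that shape follows, held in Python strings purely for illustration; the process names, schemas and trail paths are invented, and the OCI service generates the equivalent configuration for you.

```python
# Rough sketch of classic GoldenGate parameter files, held as strings
# for illustration. Process names, schemas and trail paths are invented;
# the managed OCI GoldenGate service builds the equivalent config.

EXTRACT_PARAMS = """
EXTRACT ext1
USERIDALIAS gg_source
EXTTRAIL ./dirdat/ea
TABLE app.orders;
TABLE app.customers;
"""

REPLICAT_PARAMS = """
REPLICAT rep1
USERIDALIAS gg_target
MAP app.orders,    TARGET app.orders;
MAP app.customers, TARGET app.customers;
"""

# Hand-running these via GGSCI (ADD EXTRACT, ADD REPLICAT, START ...) is
# exactly the operational work the cloud service takes on, along with
# the monitoring, patching and elastic scaling discussed above.
print(EXTRACT_PARAMS, REPLICAT_PARAMS)
```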
>> Well, you mentioned the elasticity a couple of times. Why is it so important that GoldenGate on OCI gives you that elastic billing, the auto-scaling? Talk to me in terms of what that does for the customer. >> Yeah, there are really two big benefits. One benefit is that it's very difficult to predict workloads. Normally, in an on-prem configuration, you have to say, okay, what is the maximum possible workload that's going to happen here? And then you have to buy the product, configure the product, get hardware, and basically size everything for that. And if you guess wrong, you're either spending too much because you oversized it, or you have a big real-time data problem, because the replication can't keep up with an undersized configuration. So that's hard to do. The beauty of dynamic elasticity and pay-per-use is that you don't have to figure all this stuff out. If you have more workload, we grow it automatically; if you have less workload, we shrink it automatically. You don't have to guess ahead of time, and you don't have to price ahead of time. You just use what you use, and you don't pay for something you're not using. So it's a very big change in the whole model of how you use these data replication, integration and high availability technologies. >> Well, I think I'm correct to say GoldenGate primarily has been for big companies. You mentioned that small companies can now take advantage of this service; we talked about the granularity, and I could definitely see it, but can they afford it? I guess that's part one. And then the other part of the question is: I can see GoldenGate really satisfying your on-prem customers, and them taking advantage of it, but do you think this will attract new customers beyond your core? So, a two-part question there. >> Yeah, absolutely. Small customers have been challenged by the complexity of data integration, and that's one of the great things about the cloud service: it's dramatically simpler. Oracle manages everything: Oracle does the patching, the upgrades, Oracle does the monitoring, and it takes care of the high availability of the product. All that management complexity, all the configuration and setup, everything like that is automated and owned by Oracle. Small customers were always challenged by the complexity of the product, along with everything else they had to do. And the other benefit, of course, is that small customers were challenged by the large fixed price. Now, with pay-per-use, they pay only for what they use, so it's really usable by small customers also. It really expands the market and makes it more broadly applicable. >> So, kind of the same answer for beyond your existing customer base, beyond on-prem; you answered my two-part question with one answer, so that was pretty efficient, pun intended. So the bottom line for me, squinting through this announcement: you've got the heterogeneity piece with GoldenGate on OCI, and as such it's going to give you the capability to create what I'll call an architecturally coherent, decentralized data mesh. I'm big on this data mesh these days: you could have decentralized data, with the proviso that I'm going to be able to connect to OCI, which of course you can do with Azure, or I guess you could bring Cloud at Customer on-prem. First of all, is this correct? And can we expect you over time to do this with AWS or other cloud providers? >> It can move data from Amazon or to Amazon; it can actually handle any data, wherever it lives. So yeah, it's very flexible, and it's really just the automation of all the management that we're running in our public cloud. The data can go from anywhere to anywhere. >> Cool. All right, let's switch topics here a little bit and talk about some of the things that you've been working on, some of the innovation. I sat through your blockchain announcement; it was very cool. Of course I love anything blockchain and crypto. NFTs are exploding, there's the Coinbase IPO; it's just really an exciting time out there. I think a lot of people don't really appreciate the innovation that's occurring. So you've been making a lot of big announcements over the last several months, taking your R&D and bringing it into product. That's great; we always love to see that, because that's where the rubber meets the road. Just for the database side of the house: you announced 21c, the next generation of the self-driving data warehouse ADW, blockchain tables, and now you've got GoldenGate running on OCI. Take us inside the development organization. What are the underlying drivers, other than your boss?
Take us inside the development organization. What are the underlying drivers, other than your boss? >> When we talk about our autonomous database, it is the mission-critical Oracle database, but it's dramatically easier to use. So Oracle does all the management, all the automation, but also we use machine learning to tune it, and to make it highly available, and to make it highly secure. So that's been one of our biggest products that we've been working on for many years. And recently we enhanced our autonomous data warehouse, taking it beyond being a data warehouse to a complete data analytics platform. So it includes things like ETL. So we built ETL into the autonomous data warehouse. We're building our GoldenGate replication into autonomous data warehousing. We built machine learning directly, natively into the database. So now, if someone wants to run some machine learning, they just run machine learning queries. They no longer have to stand up a separate system. So a big move that we've been making is taking it beyond just a database to a full analytics platform. And this goes beyond what anyone else in the industry is doing, because we have a lot more technology. So for example, the machine learning directly in the database, the ETL directly in the database, the data replication directly in the database. All these things are very unique to Oracle. And they dramatically simplify for customers how they manage data. In addition to that, we've also been working on our database product. We've enhanced it tremendously. So our big goal there is to provide what we call a converged database. So everything you need, all the data types, whether it's JSON, relational, spatial, graph, all the different kinds of data types, all the different kinds of workloads. Analytics, OLTP, things like blockchain, microservices, events, all built into the Oracle database, making it dramatically easier to both develop and deploy new applications. So those are some of our big, big goals. Make it simple, make it integrated. Take the complexity, we'll take on the complexity, so developers and customers find it easy to develop and easy to use. And we've made huge strides in all these areas in the last couple of years. >> That's awesome. I wonder if we could land on blockchain again; it's kind of adjacent, sort of, to crypto. Though you're not about crypto, you are about applying blockchain. Maybe you can help our audience understand: what are some of the real use cases where blockchain tech can be used with the Oracle database? >> Yeah, so that's a very interesting topic. As you mentioned, blockchain is very current; we see a lot of cryptocurrencies and distributed applications for blockchain. So in general, in the past, we've had two worlds. We've had the enterprise data management world and we've had the blockchain world. And these are very distinct, right? And on the blockchain side, the applications have mostly centered around distributed, multi-party applications, right? So where you have multiple parties that all want to reach consensus, and then that consensus is stored in a blockchain. So that's kind of been the focus of blockchain. And what we've done is very innovative; we're the first company to ever do this. We've taken the core architectural ideas, and really a lot of it has to do with the cryptography of blockchain, and we've engineered that natively into the mainstream Oracle database. So now in the mainstream Oracle database, we have blockchain technology built in.
And it's dramatically simpler to use. And the use cases, you asked about the use cases, that's what we've done. And it's taken us about five years to do this. Now it's been released into the market in our mainstream 19c Oracle database. So the use case is different from the conventional blockchain use case, which I mentioned was really multi-party, consensus-based apps. We're trying to make blockchain useful for mainstream enterprise and government applications. So any kind of mainstream government application, or enterprise application. And that idea of blockchain, the core concept of blockchain, is that it addresses a different kind of security problem. So when you look at conventional security, it's really trying to keep people out. So we have things like firewalls, passwords, network encryption, data encryption. It's all about keeping bad people out of the data. And there are really two big problems that it doesn't address well. One problem is that there are always new security exploits being published. So you have hackers out there that are working overtime. Sometimes they're nation-states that are trying to attack data providers. And every week, every month there's a new security exploit that's discovered, and this happens all the time. So that's one big problem. So we're building up these elaborate walls of protection around our core data assets, and in the meantime, we have basically barbarians attacking on every side. (chuckles) And every once in a while, they get over the walls, and this is just what's happening. So that's one big problem. And the second big problem is illicit changes made by people with credentials. So sometimes you have an insider in your company, whether it's an administrator or a salesperson or a support person, that has valid credentials, but then uses those valid credentials in some illicit way. They go out and change somebody's data for their own gain. And even more common than that, 'cause there are not that many bad guys inside the company, though they exist, is stolen credentials. So what's happened in many cases is hackers or nation-states will steal, for example, administrative credentials, and then use those administrative credentials to come into a system and steal data. So that's the kind of problem that is not well addressed by security mechanisms. If you have privileges, the security mechanism says, yeah, you're fine. If somebody steals your privileges, again, they get a pass through the gate. And so what we've done with blockchain is we've taken the cryptography elements of blockchain, we call it crypto-secure data management, and we've built those into the Oracle database. So think of it this way. If someone actually makes it over the walls that we built, and into the core data, what we've done with that cryptographic technology of blockchain is we've made that data immutable. So you can't change it. So even if you make it over the gate, you can't get into the core data assets and change those assets. And that's now built into the Oracle database, so it's super easy to adopt. And I think it's going to really enhance and expand the community of people that can actually use that blockchain technology. >> I mean, that's awesome. I could talk all day about blockchain. And I mean, when you think about hackers, it's all there. They're all about ROI, value over cost. And if you can increase the denominator, they're going to go somewhere else, right? Because the value will decline. And this is really the intersection of software engineering and cryptography.
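For readers who want to see roughly what this looks like in practice, here is a minimal sketch of creating and writing to a blockchain table from Python, using the CREATE BLOCKCHAIN TABLE syntax referred to above. The connection details, table and column names are hypothetical placeholders, and cx_Oracle is assumed as the driver; treat this as an illustration, not a definitive recipe.

```python
import cx_Oracle  # assumed Python driver for Oracle Database

conn = cx_Oracle.connect("ledger_admin", "password", "mydb_high")
cur = conn.cursor()

# Rows in a blockchain table are cryptographically chained; they cannot
# be updated, and the retention clauses restrict deletes and table drops
# even for users holding valid (or stolen) administrative credentials.
cur.execute("""
    CREATE BLOCKCHAIN TABLE bank_ledger (
        account_no   NUMBER,
        deposit_date DATE,
        amount       NUMBER
    )
    NO DROP UNTIL 31 DAYS IDLE
    NO DELETE LOCKED
    HASHING USING "SHA2_512" VERSION "v1"
""")

# Inserts work exactly as with an ordinary table.
cur.execute("INSERT INTO bank_ledger VALUES (:1, SYSDATE, :2)",
            [1001, 250.00])
conn.commit()
```

The point of the sketch is the one made in the interview: the application code barely changes, while the cryptographic chaining happens inside the database.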
And I guess even when you bring cryptocurrency into it, it's like sort of the game theory. That's really not what you're all about, but the first two pieces are really critical in terms of just the next generation of raising that security hurdle. Love it. Now, go ahead. >> Yeah, it's a different approach. I was just going to say, it's a different approach. Because think about trying to keep people out with things like passwords and firewalls: you can have bugs in that software that allow people to exploit it and get in. When you're talking about cryptography, that's math, and it's very difficult. I mean, you really can't get past math. Once the data is cryptographically protected on a blockchain, a hacker can't really do anything with that. It's just, math is math. There's nothing you can do to break it, right? It's very different from trying to get through some algorithm that's really trying to keep you out. >> Awesome. I said I could talk forever on this topic, but let me go into some competitive dynamics. You recently announced Autonomous Data Warehouse. You've got service capabilities that are really trying to appeal to the line of business. I want to get your take on that announcement and specifically how you think it compares. Name names. I'm going to name names; you don't have to. But Snowflake, obviously a lot of momentum in the marketplace. AWS with Redshift is doing very, very well. Obviously there are others. But those are two prominent ones that we've tracked in our data that show momentum. How do you compare? >> Yeah, so there are a number of different ways to look at the comparison. So the most simple and straightforward is that there's a lot more functionality in Oracle data warehousing. Oracle has been doing this for decades. We have a lot of built-in functionality. For example, machine learning natively built into the database makes it super easy to use. We have mixed workloads, we have spatial capabilities. We have graph capabilities. We have JSON capabilities. We have microservice capabilities. We have-- So there's a lot more capability. So that's number one. Number two, our cloud service is dramatically more elastic. So with our cloud service, all you really do is move the slider. You say, hey, I want more resources, I want fewer resources. In fact, we'll do that automatically, that's called auto-scaling. In contrast, when you look at people like Snowflake or Redshift, they want you to stand up a new cluster. Hey, you have some more workload on Monday, stand up another cluster, and then we'll have two sets of clusters, or maybe you'd want a third cluster, maybe you want a fourth cluster. So you end up with all these different systems, which is how they scale. They say, hey, I can have multiple sets of servers access the same data. With Oracle you don't even have to think about those things. We auto-scale: you get more workload, we just give it more resources. You don't even have to think about that. And then the other thing is we're looking at the whole data management problem end to end. So starting with capturing the data, moving the data in real time, transforming the data, loading the data, running machine learning and analytics on the data. Putting all kinds of data in a single place so that you can do analytics on all of it together. And then having very rich screen capabilities for viewing the data, graphing the data, modeling the data, all those things. So it's all integrated. It makes it super easy to use.
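To make the "all of it together, in a single place" claim concrete, here is a minimal sketch of scoring rows with an in-database machine learning model and joining relational rows against JSON documents in the same database session. PREDICTION and JSON_TABLE are Oracle SQL features; the connection string, table names and model name (customers, orders, churn_model) are hypothetical placeholders.

```python
import cx_Oracle  # assumed Python driver for Oracle Database

conn = cx_Oracle.connect("analyst", "password", "adw_high")
cur = conn.cursor()

# 1. Machine learning as "just a query": score rows with a model that
#    lives inside the database, with no separate ML system to stand up.
cur.execute("""
    SELECT cust_id,
           PREDICTION(churn_model USING *) AS churn_flag
    FROM   customers
""")
print(cur.fetchmany(5))

# 2. Relational and JSON data queried together in one statement.
cur.execute("""
    SELECT o.order_id, jt.sku, jt.qty
    FROM   orders o,
           JSON_TABLE(o.doc, '$.lineItems[*]'
               COLUMNS (sku VARCHAR2(40) PATH '$.sku',
                        qty NUMBER       PATH '$.quantity')) jt
""")
print(cur.fetchmany(5))
```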
So: much easier, much more functionality, and much more elastic than any of our competitors in the market. >> Interesting, thank you for those comments. I mean, it's a different world, right? I mean, you guys have all the market share, they've got all the growth. Those things over time, you've been around, you've seen it, they come together and you fight it out, and may the best approach win. So we'll be watching. >> Yeah, also I forgot to mention the obvious thing, which is Oracle runs everywhere. So you can run Oracle on premises. You can run Oracle in the public cloud. You can run what we call Cloud at Customer. Our competitors really are just public cloud only. So their customers don't get the choice of where they want to run their data warehouse. >> Now Juan, a while ago I sat down with David Floyer and Marc Staimer. We reviewed how Gartner looks at the marketplace, and it wasn't a surprise that when it came to operational workloads, Oracle stood out, and I mean, that's kind of an understatement, relative to the major competitors. Most of our viewers, I don't think, expected, for instance, Microsoft or AWS to be that far away from you. But at the same time, the database Magic Quadrant maybe didn't reflect that gap as widely, so there's some dissonance there; the detailed workload drill-downs were dramatic. And I wonder what your take is on the results. I mean, obviously you're happy with them. You came out leading in virtually every category, or you were one and two, in some of even the non-mission-critical operational stuff. But what can you add to my narrative there? >> Yeah, so Gartner, first of all, we're talking about cloud databases. >> Right. >> Right, so this is not on-premises databases, this is pure cloud databases. And what they did is two things. The main thing was a technical rating of the cloud databases. And there are other vendors that have had databases in the cloud for longer than we have. But in the most recent Gartner analysis report, as you mentioned, Oracle came out on top for cloud database technology in almost every single operational use case, including things like Internet of Things, things like JSON data, variable data, analytics, as well as traditional OLTP and mixed workloads. So Oracle was rated the highest technology, which isn't a big surprise. We've been doing this for decades. Over 90% of the global Fortune 500 run Oracle. And there's a reason: because this is what we're good at. This is our core strength. Our availability, our security, our scalability, our functionality, both for OLTP and analytics. All the capabilities: built-in machine learning, graph analytics, everything. So even when we compare narrowly, things like Internet of Things or variable data, against niche competitors where that's all they do, we came out dramatically ahead. But what surprised a lot of people is how far ahead of some of the other cloud vendors, like Amazon, like Azure, like Google, Oracle came out in the cloud database category. So a lot of people think, well, some of these other pure cloud vendors must be ahead of Oracle in cloud database. But actually not. I mean, if you look at the Gartner analyst report, it was very clear: Oracle was dramatically ahead of their cloud database technologies with our cloud database. >> So I'm pretty much out of time, but last question.
I've had some interesting discussions lately, and we've pointed out for years in our research that of course you're delivering the entire stack: the database, part of the infrastructure, the applications; you have the whole engineered systems strategy. And for the most part you're kind of unique in this regard. I mean, Dell just announced that it's spinning off VMware, and it could have gone the other direction and become a more integrated hardware and software player for the data center. But look, it's working for Dell, based on the reaction from the street post-announcement. Cisco, they've got a hardware and software model that's sort of integrated, but the company's value peaked back in the dot-com boom; it's been very slow to bounce back. But my point is, for these companies the street doesn't value the integrated model. Oracle is kind of the exception. You know, it's trading at all-time highs. I know you're not going to comment on the stock price, but I guess SAP, until it missed and guided conservatively, was kind of on a good trajectory. So I'm wondering, why do you think Oracle's strategy resonates with investors, but not so much for those companies? Is it because you have the applications piece? I mean, maybe that's kind of my premise for SAP, but what's your take? Why is it working for you? >> Well, okay. I think it's pretty simple, which is, some of our competitors, for example, might have a software product and a hardware product, but mostly those are acquired; they're separate products that just happen to be in a portfolio. They are not a single company with a single vision and joint engineering going on. It's really, hey, I've got the software over here, I've got the hardware over there, but they don't really talk to each other, they don't really work together. They're not trying to develop something where the stack is not just integrated but engineered together. And that is really the key. Oracle focuses on data management top to bottom. So we have everything from our ERP and CRM applications talking to our database, talking to our engineered systems, running in our cloud. And it's all completely engineered together. So Oracle doesn't just acquire these things and kind of glue them together. We actually engineer them, and that's fundamentally the difference. You can buy two things and have them as two separate divisions in your company, but it doesn't really get you a whole lot. >> Juan, it's always a pleasure. I love these conversations and hope we can do more in the future. Really appreciate your time. Thanks for coming to theCUBE. >> Pleasure, Dave, nice to talk to you. >> All right, keep it right there, everybody. This is Dave Vellante for theCUBE, we'll see you next time. (upbeat music)

Published Date : Apr 21 2021

George Lumpkin & Neil Mendelson, Oracle | CUBE Conversation, April 2021


 

(bright upbeat music) >> Hi, well, this is Dave Vellante. We're digging deeper into the world of database. You know, there are a lot of ways to skin a cat, and different vendors take different approaches, and we're reaching out to the technologists to get their perspective on the major trends that they're seeing in the market, 'cause we want to understand the different ways in which you can solve problems. So look, if you have thoughts and the technical chops on this topic, I'd love to interview you. Just ping me @DVellante on Twitter; there are a lot of ways to get ahold of me. Anyway, we recently spoke with Andrew Mendelsohn, who is Oracle's EVP responsible for database server technologies. And we talked a lot about Oracle's ADW, Autonomous Data Warehouse. And we looked at the cloud database strategy that Oracle is taking, and the company's plans and how they're different, maybe, from other solutions in the marketplace, but I wanted to dig deeper. And so today we have two members of Mendelsohn's team on theCUBE, and we're going to probe a little bit. George Lumpkin is the Vice President of Autonomous Data Warehouse. And Neil Mendelson is the VP of Modern Data Warehouse, that business, for Oracle. They're both 20-year veterans of Oracle. When I reached out to Steve Savannah, who's been a colleague of mine for many years, he's always telling me how great Oracle is relative to the competition. So I said, okay, come on theCUBE and talk about this, give me your best people. And he said, whatever these two don't know about cloud data warehouse isn't worth knowing anyway. So with that said, gentlemen, welcome to theCUBE. Thanks so much for coming on. >> Thank you. >> Hey, glad to be here. >> So George, let's start with you. And maybe we could recap for some of the viewers who might not be familiar with the interview that I did with Andy. In your words, what exactly is an Autonomous Data Warehouse? Is this cloud native? Is it an Oracle buzzword? What is it? >> Well, I mean, Autonomous Data Warehouse is Oracle's cloud data warehouse. It's a service that's built to allow business users to get more value from their data. That's what the cloud data warehouse market is. Autonomous Data Warehouse is absolutely cloud native. There's a huge misconception that people might have when they first hear about this service, because they think, this is an Oracle database, right? Oracle makes databases. This is the same old database I knew from 10 years ago. And that's absolutely not true. We built a cloud native service for data warehousing, built it with cloud features. You know, if your understanding of the cloud data warehouse market is based upon how you thought things looked 10 years ago, well, Snowflake wouldn't have even existed, right? You can't base your understanding of Oracle on that. We have a modern service that's highly elastic, provides cloud capabilities like online patching, and it's fully autonomous. It's really built for business users, so they don't need to worry about administering their database. >> So I want to come back and actually ask you some questions about that, but let me follow up and talk about some of the evolution of ADW. Where did you start? I think it was 2018, maybe; where you came from, where you are today. Maybe you can take us through the technological progression and maybe the path you took to get here. >> So 2018 was when we released the service and made it generally available, but of course, you know, we started much earlier than that.
And this was started within my product management team and other organizations. So we really sat down with a blank sheet of paper and we said, what should a data warehouse in the cloud look like? You know, let's put aside everything that Oracle does for its on-prem customers and think about how the cloud should be different. And the first thing that we said was, well, you know, if Oracle writes the database software, and Oracle builds its own hardware, and Oracle has created its own cloud, why do we need customers to manage a database? And that's where the idea of autonomous database came from: Oracle is managing the entire ecosystem. And therefore we built a database that we believe is far and away the simplest-to-use data warehouse in the market. And that's been our focus since we started in 2018. And that continues to be our focus, looking at more ways that we can make Autonomous Data Warehouse simpler and easier for business users to get more value out of their data. >> Awesome, one more question. And actually Neil, you might want to chime in on this as well. So just from a technical perspective, you know, forget the marketing claims and all the BS. How do you compare ADW to the so-called born-in-the-cloud data warehouses? You mentioned Snowflake; you know, Redshift, is Redshift born in the cloud? Well, it was ParAccel, but Amazon's done some good work around Redshift. I think BigQuery is probably a better example, 'cause it, like Snowflake, started in the cloud. But how do you compare ADW to some of these other so-called born-in-the-cloud data warehouses? >> I think part of this, you mentioned Redshift wasn't born in the cloud. It was, you know, a code base taken from a prior company, an on-premise company. So they adapted it to the cloud, right? And you know, we have done, as George said, much the same, which is, you know, our starting point was not another company's code base; our starting point was our own code base. But as George said, it's less about the starting point, and it's more about where you envision the endpoint, right? Which is that, you know, whatever your starting point is, I think we have a fundamentally different view of the endpoint. Amazon talks about how they're literally built for, you know, a cloud built for developers, right? Builders, right? And you know, Oracle wasn't first in the infrastructure business; we entered through the applications business. And all of a sudden, you know, we began taking on hundreds and thousands, and even more, customers that were SaaS customers. Underneath was the database and all the infrastructure. One of the things that we took away from that was that we couldn't possibly hire enough DBAs to manage all the infrastructure below our applications customers. So one of the things that influenced this is that, you know, customers expect SaaS applications to just take care of themselves, right? So we had to essentially modify the infrastructure to allow it to do so as well, right? And we're bringing that capability to those people who, you know, may or may not have an application, but their interest is, you know, more in this self-service, agility type of aspect. >> So it seems to me, and George was sort of alluding to this before, I mean, when you mentioned Snowflake a couple of times, and then Neil, something you just said, I'm going to pick up on, is you've been around for a long time.
And you know, when I talk to the Snowflake people, they know Oracle; a lot of them came from Oracle. They understand, I think, how you can't just build Oracle overnight and build in the capabilities that Oracle has, and the recovery. And you talk to customers, and you know, you are the gold standard of, you know, especially mission-critical databases, so I get that. But now you just sort of hit on it: it takes a lot of people and skill to run the database. So that's the problem that you're saying you were attacking; is that, am I getting that right? >> Right, right. So the people that you talked about who originally built Snowflake came from Oracle, but they came from Oracle more than a decade ago. So their context is over a decade old, right? In the meantime, we've been busy, you know, building autonomous and many other capabilities, right? Their view of Oracle is that view from back more than 10 years ago, right? They're still adding capability. So a really good example to illustrate this: Oracle, as you said, is the most capable system that's out there and has been for many years. We've been focusing on how do we simplify that, and how do we use machine learning embedded within the system itself? Because core to the concept of autonomous is that inside is this machine learning system that's continually improving, right? That's the whole notion. Whereas in Snowflake's case, they're still adding functionality. Last year, they added masking, which, you know, is functionality they didn't have. But when they added the capability, they added it without, you know, the ability for a business user to actually take advantage of it. There's no capability for a business user to actually find the information that needs to be masked. And then after the information is found, you require a technical person to actually implement the mask. In Oracle's case, we've had masking and those capabilities for a long time. Our focus was to provide a simple tool that a business user can use, that doesn't need technical or security experience: find the data that needs to be masked, PII data, and then hit a button and have it masked for you. So, you know, without this notion of a strategy to move toward a system that heals itself and manages itself, they're just going to continue: as they add more capability, they will in turn add more complexity. What we're trying to do is take complexity out while others are adding it in. It's an ironic twist. >> It is an ironic twist. It is interesting to look at it. And I don't want to make this about Snowflake. But I mean, hey, I like what they're doing. I like them. I know the management, they're growing like crazy, and you know, the customers tell me, hey, this is really simple. And it's simple by design. I mean, to your point, over time it's going to get, you know, more and more complex. I was talking to Andy, I think it was Andy, he was saying, you know, they've got the different sizes, you've got to choose one, you know, they call it t-shirt sizes. And I was like, okay, I've got a small, I've got a medium and a large; maybe that's okay. But you guys would say, we give more granular, you know, scaling, I guess is the point there, right? I mean, George, I don't know if you can comment on that. It's just a different strategy. You've got a company that was founded, well, I guess, 2015, versus one that was founded in 1977.
So you would think the latter has, you know, way more function than the former. But George, anything you'd add to this conversation? >> Yeah, I mean, I'm always amazed that there are these database systems that are perceived as cloud native, and they do things like sell you database capacity by t-shirt sizes, as you described. I mean, if you look at Snowflake, it's small, medium, large, extra large, two extra large, but they're all factors of two. You're getting a database size of two, four, eight, sixteen, 32, et cetera. Or if you look at AWS Redshift, you're buying your database by the node. You say, how many nodes do you want? And in both those cases, is this cloud native? This is saying, we have some hardware underneath our database and we need you, Mr. Customer, to tell us how many servers you want. That's not the way the cloud should work, right? And I think this is one of the things that we did with Autonomous Data Warehouse. We said, no, that's not how it should work. We still run our database on hardware, we still have nodes and servers. We should ask the customer: how many CPUs would you like for your data warehouse? You want 16? Sounds good. You want 18? Yeah, we can give you 18. We're not, you know, we're not selling these to you in bundles of eight or bundles of six or powers of two. We'll sell you what you need. That's what cloud elasticity should be. Not this idea that, oh, we are a database that should be managed by IT, and IT already knows about servers and nodes, therefore it's okay if we tell people your cloud data warehouse runs on nodes. With Oracle, as Neil said, we wouldn't. The data warehouse should be used by the people who want to actually analyze their data; it should be used by the business users. >> Well, and so the other piece of cloud native that has become popular is this idea of separating compute from storage and being able to scale those two independent of each other, which is pretty important, right? Because you don't want to have to pay for a chunk of compute if you don't need the storage, and vice versa. Maybe you could talk about that, how you solve that problem, to the extent that you solve that problem. >> Absolutely, we do separate compute from storage with Autonomous Data Warehouse. When you come in and you say, I need 10 CPUs for my data warehouse and I need two terabytes of storage, those are two independent decisions that you make. So they're not tied together in any way. And you are exactly right, Dave, this is how things should work in the cloud. You should pay for what you need, pay for what you use, not be constrained by having a big set of storage you have to take for a given amount of CPU, or vice versa. >> Okay, go ahead Neil, please. >> Oh, just to add on to that, you know, the other aspect that comes into play is that, you know, your starting point is X, whatever that happens to be. Over time that changes. And we all know that workloads vary, right, throughout the day, throughout the month, throughout the year, by various events that occur: maybe the close of the year, close of business at the end of the quarter, maybe, you know, the holiday season for retailers and so forth. So, you know, it's not only the starting point, but how do you actually manage the growth, right? Scaling up and scaling down, right? In our case, as George said, we abstracted that completely for the customer. We basically said, check a box, which is auto-scale. So if the system requires more resources, we'll apply more resources.
And we do so instantaneously, without any downtime whatsoever, right? Because, you know, again, people think in terms of these systems having now become business critical. So if it's business critical, you can't just shut down to expand. Imagine during the holiday season, as your business is ramping up, and then all of a sudden you have to scale, right? And your system either shuts down and reboots itself, right? Or it slows down to the point that it's a crawl and all your customers get frustrated. We don't do that. You click a button, auto-scale, and we take care of it for you, smoothing out those lumps, right? Without any technical assistance. And again, if you look at Redshift, you look at all these various systems, they require technical assistance to be able to figure out not only your initial sizing, but how you scale out over time. >> Interesting, okay. So, all that said, you know, a lot of companies are using Azure, AWS, Google for infrastructure. Why would these customers not just use their database? Why would they switch to Oracle or ADW? >> Well, I think Neil will probably add something. I want to start by saying a huge number of our existing Autonomous Data Warehouse customers today are customers of AWS and Azure. They are pulling data from AWS and Azure and bringing it into an Oracle Autonomous Data Warehouse. And, as the guy focused on product management, we built features for that. And so it's perfectly viable, and it's almost commonplace for the very largest enterprises to be doing that. But then coming to the question of why would they want to do it; I don't know, Neil, you want to take that? >> Yeah, yeah, so one of the things that we've really seen emerge here is, you know, a data warehouse doesn't generate the transactions itself, right? So the data has to come from somewhere, right? And you ask yourself, well, where does the data come from? Well, in a lot of cases, that data is coming from applications, and increasingly SaaS applications, that the company has deployed. And those are, you know, HR applications, you know, CRM applications, you know, ERP applications and many vertical applications. In Oracle's case, what we've done is we say, okay, well, we have the application, this transactional thing, and we have the infrastructure for the Autonomous Data Warehouse; why don't we just make it really, really easy? And if you're an Oracle applications customer that's already running on the Oracle cloud, we will essentially provide you the ability to create a data warehouse from that information, right? With a click, largely, either with a product and service or a quick-start kit. You don't start from scratch, you start from where you are. And in many cases, where you are has data; very much as George mentioned before, telcos, banks, insurance companies, governments: all of the data that they want to analyze, a lot of that data, guess where it's coming from? It's coming from Oracle applications. So it makes sense to be able to have both the data that's generated and the data that's being analyzed close to the same place. Because at the end of the day, the payoff pitch for any form of analysis is not coming up with an insight, oh, I realized X, Y, Z, but rather putting the insight directly into production. And that's where, when you have this stuff spread all over God's green earth, trying to go from insight into action can take months, if not years.
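As a concrete illustration of the no-downtime, check-a-box scaling described a moment ago, below is a minimal sketch using the OCI Python SDK to resize an Autonomous Database and turn on auto-scaling. The database OCID is a placeholder, and the field names (cpu_core_count, data_storage_size_in_tbs, is_auto_scaling_enabled) are given as best recalled from the SDK, so treat this as illustrative rather than authoritative.

```python
import oci  # assumed OCI Python SDK

config = oci.config.from_file()  # reads credentials from ~/.oci/config
db = oci.database.DatabaseClient(config)

adb_id = "ocid1.autonomousdatabase.oc1..example"  # hypothetical OCID

# Ask for exactly 18 CPUs (not a node count, not a power of two), size
# storage independently of compute, and let the service auto-scale when
# the workload spikes, with no downtime or manual re-clustering.
details = oci.database.models.UpdateAutonomousDatabaseDetails(
    cpu_core_count=18,
    data_storage_size_in_tbs=2,
    is_auto_scaling_enabled=True,
)
db.update_autonomous_database(adb_id, details)
```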
The reason that a lot of customers are now turning to us is that they need to be much more agile, and they need to be able to turn that insight into action immediately, without it being a science project. >> Okay, thank you for that. So let's tick them off. What are the top things that customers can get from Oracle Autonomous Data Warehouse that they couldn't get from, say, a Snowflake or Redshift or BigQuery or SQL Server or something else? I appreciate you guys' willingness to talk about the competition. Let's tick them off. What are the most important things that we should know about that they can't get elsewhere? >> So first, I mean, we already talked about a couple of what we think are really the major themes of Autonomous Data Warehouse. The service is autonomous. You don't need to worry about managing it; anyone can manage the data warehouse. The service is elastic. You buy and pay for what you use. You know, those are just what we think of as the general characteristics of Autonomous Data Warehouse. But you know, when you come to your question of, hey, what do we give that other vendors don't provide? I think one angle where Autonomous Data Warehouse does a really good job, and Neil was just discussing this, is that it focuses on the business problems, right? We have years and years of experience with not just database security, but data security, right? You know, every cloud vendor can say, oh, we encrypt all your data, we have these compliance certifications, all of these things. And what they're saying is, we are securing your database, we are securing your database infrastructure. Oracle, of course, has to do those as well. But where we go further is we say, hey, no, no, no, we know what business users want. They want to secure their data. What kind of data am I storing? Do I have PII data? Could you detect whether there's PII data and tell me about it, in case some user loaded something that I wasn't aware of? What kind of privileges did I give my users? Can you make sure that those privileges are right? And can you tell me if users were given privileges that they're not using; maybe I need to take them away. These are the problems that Oracle's tackled in security over the last 20 years. It's really more about the business problem. Yeah, some others, oh, go ahead. >> Oh, I'm sorry, I've got so many questions for you guys. We'll get back to that, 'cause it sounds like there's a long list. (laughs) >> We have nowhere to go. (laughs) >> I want to pick up with George on something you said about elasticity. Is it true pay-by-the-drink? Do you have consumption pricing? I mean, can I dial it up and dial it down whenever I want? How does that work? >> Yes. I mean, not to get into too many technical details, but you say, I want 14 CPUs, and that's what your database runs at. You can change that default number anytime you want, online, right? You can say, okay, I'm coming up on my quarter end, I'm going to raise my database to 20 CPUs. We just do it on the fly. We just adjust the size--- >> What about the other way? What about coming down? Can I go down to one? >> You can go down, you can go down to one--- >> And you're not going to charge me for 14 if I go down to one? >> No, if you set it down to one, you get charged for one, right? >> Okay, that's good, that's good. >> In the background, you know, we also allow levels of auto-scaling, where you say, hey, I want to be charged for 14, and Oracle, can you take care of all that scaling for me?
So if a bunch of people jump on at 5:00 PM to run some queries, 'cause the executive said, hey, I need a report by tomorrow morning, we'll take care of that for you. We'll let you go beyond 14 and only charge you for exactly what you use for those extra CPUs beyond 14. >> Okay, thank you. Go ahead, Neil. >> And maybe, if we add, you know, Andy talked about this when he was on the show with you last week, right? And you know, he talked about this concept of a converged database, but let me talk about it in the way that we see it from a business point of view, right? You know, business users are looking to, you know, ask a variety of questions, right? And those questions need to be able to relate to both, you know, the customer themselves, the relationships that the customer might have with others (you know, today we talk about things like the social network and who the influencers are within that) and then where they actually conduct business, which is really, you know, in every case, in some form, increasingly on a mobile device. So in that case, you want to be able to ask questions which are not only, you know, who should I focus on, but who are the key influencers within this community, right, that could influence others? And does that happen in a particular place and time? Meaning, you know, let's say pre-COVID, it might happen at a coffee shop or somewhere else. We can answer all of those questions and more inside of the autonomous system, without having to replicate the data out to one system that does graph, and another system that does spatial, and a third system that does something else. A business user is like, wait a minute, come on, you're trying to tell me that I need a separate system, and to replicate the data, just to be able to understand location? The answer in many cases is yes, you have to have separate systems; to which a business person says, well, that's absurd. Can't I just do this all in one system? You can with Oracle. >> So look, I'm not trying to be the snarky journalist or analyst here, but I want to keep pushing on this issue. So here we are, it's 2021. It's April. We're like a third of the way through the year. And so far, nobody has come out and said, okay, we're going to deliver an autonomous data warehouse just like Oracle. So I ask myself, well, why is Oracle doing this? You guys answered, you know, to reduce the labor cost. But I ask myself, is this how they're solving the problem of keeping relevant a database that spans five decades? And you guys said, no, no, this is cloud native, born in the cloud, you know, started essentially with a new mindset. But is this a trend that others are going to follow? You know, and if so, why haven't we seen it, this idea of self-driving databases? Why is it right now unique to Oracle? What's really going on here? >> So I think there's a really interesting thing that's happening; it's not visible outside of Oracle. It's very visible for those of us who work inside of the development organization. You know, if you look at Oracle, I can tell you that, I mean, I think it's safe to presume, Oracle has the largest database development organization on the planet, right? I mean, it's been the largest, or most used, database for the past two decades. And what's happened is we pivoted to building a cloud platform. We're not just building a database; we're taking all of these resources that we have, with all this expertise of building database software.
We're saying: we now have to build the platform to run and manage the database software in the cloud, right? And it's a little bit like, you know, I think to make people relate to it a little better, there was a really good quote from Elon Musk a couple of years ago, talking about Tesla. Everyone looks at the car, right? Tesla, the car is really great. The hard part of this is building the factory. And that analogy holds for Oracle: what we're building is the cloud factory. And what we have transitioned to is that our database development organization is now building as robust a cloud as possible. So that, you know, when we increase the number of databases by 10X, we don't add 10X more cloud ops people to manage it. We are ramping up developers building features to automate the management of our cloud infrastructure. And with that automation, we get better availability, fewer errors, more security. We give the benefits to our cloud data warehouse customers with it. And I think this is something really important to realize, right? We build database software. We build, you know, an engineered system built for databases, called Exadata. And we build a cloud platform. And these are really equal tiers in what we are building and developing today, in 2021, in the Oracle database development organization. >> Well, you mentioned Exadata. I want to shift gears here a little bit and talk about hybrid cloud; on-premises clouds are finally gaining some traction. I've got to give you props: Oracle's Cloud at Customer was really early to that game. I think it was the first, in my view anyway, true same-same vision. It took you guys a little while to get there, but it was the right vision. And the thing I always say about Oracle that people don't understand is Oracle invests in R&D; your chairman is also the CTO. You guys are serious about technical investment, so you know, that's where innovation comes from. And we heard during your recent earnings call some positive comments on this. So what's your take on delivering Autonomous Data Warehouse on-prem, and how do you compare with, say, Snowflake and AWS in that area? Snowflake, Frank Slootman, I've had him on record saying, we're not going to do that halfway house. Forget it, we are always going to be in the cloud. We're never going to do an on-prem installation. AWS, we'll see. To date, I don't think you can get Redshift, for instance, in Outposts, but maybe that'll come. But how do you see that emerging? What's your difference there? Maybe Neil, you could talk about that. >> Yeah, so, you know, I think, you know, customers in a lot of regulated industries, right, still have concerns about the public cloud. And I think that when you hear statements like, you know, we're never going to do, you know, on-prem: well, Autonomous Cloud at Customer is not a classic on-prem solution. What it is, is a piece of our cloud delivered in your data center. It's still the cloud software. Oracle manages it; Oracle, you know, the system itself, manages itself, and we take care of that responsibility so you don't have to. The difference is that we can make that available in a public cloud as well as in a private cloud, right? And there are so many use cases, you know, that you can imagine, from a regulatory point of view or just from a comfort point of view, where customers are choosing; they want the ability to decide for themselves where to place this stuff, as compared to only having one option, right?
And you know, you look at a lot of what's happening in the emerging world, where, you know, there are a lot of places in the world that may not have, you know, really, really high-speed internet connections to make, you know, a public cloud feasible. Well, in that case, whether you're talking about, you know, an oil rig or you're talking about something else, right, we can put that capability where it needs to be, close to the operation that you're talking about, irrespective of the deployment option. >> Well, let me just follow up on that, because I think it's interesting that, you know, Frank Slootman said that to me. Oftentimes around AWS I say never say never, 'cause they'll surprise you, right? And I've learned that with Andy Jassy. But one of the things that seems difficult for on-prem would be to separate that compute from storage, because you have to actually physically move in resources. I think about Vertica Eon mode; it's not quite the same-same. So, I mean, in that regard, maybe you're not the same-same. And maybe that dogma makes sense for some companies. For Oracle, obviously you've got a huge on-prem estate. Thoughts on that? >> So, you know, clearly, typically what we'll do is provide additional hardware beyond what the customer might expect, and that allows them to use the capabilities of expansion, right? We also have the ability to allow the customer to expand from their Cloud at Customer into the public cloud as well, of which we have a lot of those situations. So we can provide a level of elasticity even on-premises, by over-provisioning the systems while not charging the customer until they use it, based only on what they consume, right? Combined together with the ability for us to augment their usage in the public cloud as well, right? Where others, again, are constrained, right? Because they only have a single option. >> Right, well, you've got the capital resources to do that as well, which is not to be overlooked. Okay, I mean, I've blown our time here, but you guys are so awesome. (laughs) I appreciate the candor. So last question, and George, if you want to throw in a couple of those other tick boxes, you know, the differentiators, please feel free. But for both of you, if you can leave customers with the one key point, or the top key points, on how Oracle Autonomous Data Warehouse can really help them improve their business in the near term, what would they be? Maybe George, you could start, and then Neil, you bring us home. >> Yeah, I mean, I think that, as I said before, our starting point with Autonomous Data Warehouse is, how can we build a better customer experience in the cloud? And that continues throughout 2021. And I think that the big theme here is that business users should be able to get value directly from their data warehouses. We talked a few times about how a line-of-business user should be able to manage their own data, should be able to load their own data warehouse, should be able to start to work with their own data, should be able to run machine learning, build machine learning models against that data, and all of that built in and delivered in Autonomous Data Warehouse. And we think that this is, you know, we see our customer organizations, large and small, the light bulbs starting to go on about how easy the service is to use and how complete it is for helping business users get value from their data.
And just adding onto what George said, you know, the development organization has done a tremendous job of really simplifying the operation. What we've also tried to do is that on the business side. You know, when a customer has an on-prem situation and they're looking at moving to the cloud, whether lift-and-shift or modernize, they're looking at cost, they're looking at risk and they're looking at time. So one of the things we look at is, how do we mitigate that? How do we mitigate the cost, the risk and the time? Well, this week, I think, we announced our new Cloud Lift program. And the Cloud Lift program is, what Oracle will provide through its cloud engineering resources around the world is that we will take the cost, the risk and the time out of the equation, and Oracle will work directly with the customer, or the customer's partner of choice, maybe an Accenture or Deloitte, and we will move them, right? You know, at little or no cost; in most cases there's no cost whatsoever, right? We mitigate the risk because we're taking the risk on. And we've built a lot of automated tools to make that go very quickly, right? And securely. And then finally, we do it in a very, very short amount of time, as compared to what you would otherwise need to do, 'cause there is no Redshift on-premises. There is no Snowflake on-premises. You have to convert from what you already have to that, right? And so the company, beyond the technological barriers that George talked about, is also trying to smooth the operation, so that a business itself can make a decision that, not only do they not need the technical people to operate it, they won't need an entire consulting contract worth millions of dollars in order to actually make the move to the cloud. >> Well, guys, I really appreciate you coming on the program, and again, your candor to speak openly about, you know, your approach, the competitors. And so it's great having you. Really, really, thank you for your time. >> Appreciate it. >> And thank you for watching, everybody. Look, if you guys want to come back, go toe-to-toe with these guys, say the word, you're always welcome to come on theCUBE. One thing's for sure: Oracle is serious when it comes to database. Thank you for watching. This is Dave Vellante. We'll see you next time. (bright music)

Published Date : Apr 7 2021

Breaking Analysis: Unpacking Oracle’s Autonomous Data Warehouse Announcement


 

(upbeat music) >> On February 19th of this year, Barron's dropped an article declaring Oracle a cloud giant, and the article explained why the stock was a buy. Investors took notice, and the stock ran up 18% over the next nine trading days. It peaked on March 9th, the day before Oracle announced its latest earnings. The company beat consensus earnings on both the top line and EPS last quarter, but investors did not like Oracle's tepid guidance, and the stock pulled back. But it's still, as you can see, well above its pre-Barron's-article price. What does all this mean? Is Oracle a cloud giant? What are its growth prospects? Now, many parts of Oracle's business are growing, including Fusion ERP, Fusion HCM, NetSuite; we're talking deep into the double digits, 20-plus percent growth. Its on-prem legacy license business, however, continues to decline, and that moderates the overall company growth, because that on-prem business is so large. So overall, Oracle's growing in the low single digits. Now, what stands out about Oracle is its recurring revenue model. That figure, the company says, now represents 73% of its revenue, and that's going to continue to grow. Now, two other things stood out on the earnings call to us. First, Oracle plans on increasing its CapEx by 50% in the coming quarter; that's a lot. Now, it's still far less than what AWS, Google or Microsoft spend on capital, but it's a meaningful data point. Second, Oracle's consumption revenue for Autonomous Database and Cloud Infrastructure, OCI or Oracle Cloud Infrastructure, grew at 64% and 139% respectively, and these two factors, combined with the CapEx spend, suggest that the company has real momentum. I mean, look, it's possible that the CapEx announcements may be just optics and they're front-loading some spend to show the street that it's a player in cloud, but I don't think so. Oracle's Safra Catz is usually pretty disciplined when it comes to spending. Now today, on March 17th, Oracle announced updates to its Autonomous Data Warehouse, and with me is David Floyer, who has extensively researched Oracle over the years, and today we're going to unpack the Oracle Autonomous Data Warehouse, ADW, announcement and what it means to customers. But we also want to dig into Oracle's strategy. We want to compare it to some other prominent database vendors, specifically AWS and Snowflake. David Floyer, welcome back to theCUBE, thanks for making some time for me. >> Thank you, Vellante, great pleasure to be here. >> All right, I want to get into the news, but I want to start with this idea of the autonomous database, which Oracle's announcement today is building on. Oracle uses the analogy of a self-driving car. It's obviously a powerful metaphor, as they call it the self-driving database, and my takeaway is that this means that the system automatically provisions, it upgrades, it does all the patching for you, it tunes itself. Oracle claims that all this reduces labor costs, or admin costs, by 90%. So I ask you, is this the right interpretation of what Oracle means by autonomous database? And is it real? >> Is that the right interpretation? It's a nice analogy. It's a good test of that analogy, isn't it? I would put it that the first stage of the Autonomous Data Warehouse was to do the things that you talked about: the tuning, the provisioning, all of that sort of thing. The second stage is actually, I think, more interesting, in that what they're focusing on is making it easy to use for the end user.
Eliminating the requirement for IT staff to be there to help in the actual use of it, and that is a very big step for them, but an absolutely vital one, because all of the competition is focusing on ease of use and on low cost of managing and deploying. So I think that is the really important area Oracle has focused on, and it seems to have done so very well. >> So in your view, is this... I mean, you don't really hear a lot of other companies talking about this analogy of the self-driving database; is this unique? Is it differentiable for Oracle? If so, why? Or maybe you could help us understand that a little bit better. >> Well, the whole strategy is unique in its breadth. It has really brought together a whole number of things and made something that is, of its type, the best. It supports a whole number of data sources and database types, so it's got a very broad range of different ways that you can look at the data. The second thing that is also excellent is that it's a platform. It is fully self-provisioned, and its functionality is very, very broad indeed. The quality of the original SQL and the query languages, et cetera, is very, very good indeed, and its ability to do joins, for example, is excellent. So all of the building blocks are there, together with its sharing of the same data with OLTP, inference, and in-memory databases as well. Altogether, the breadth of what they have is unique and very, very powerful. >> I want to come back to this, but let's get into the news a little bit and the announcement. It seems like what's new in the Autonomous Data Warehouse piece for Oracle is new tooling around four areas. Andy Mendelsohn, the head of this group, and this is sort of his baby, talked about four things. My takeaway: faster, simpler loads; simplified transforms; autonomous machine learning models, which are facilitating, what do you call it, citizen data science; and then faster time to insights. So, tooling to make those four things happen. What's your take and takeaways on the news? >> I think those are all correct. I would add the ease of use in terms of being able to drag and drop; the user interface has been dramatically improved. Again, I think those are strategically the more important pieces; the others are all useful and good components, but strategically, the ease of use, the use of APEX, for example, is more important. And... >> Why are they more important strategically? >> Because they focus on the end user's capability. For example, one of the other things they've started to introduce is Python, together with their spatial databases. It is really important that you reach out to developers as they are, with the tools they want to use. So those types of ease-of-use things are respecting what end users actually use. For example, they haven't come out with anything like Qlik or Tableau. They've left that to the marketplace, for the end user to use what they like best. >> You mean they're not trying to compete with those tools. They indeed had a laundry list of stuff that they supported: Talend, Tableau, Looker, Qlik, Informatica, IBM I had there. So their claim was, hey, we're open. That's smart. That's just, hey, they realized that people use these tools. >> They're trying not to exclude other people, to be a platform and an ecosystem for the end users.
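That Python point is worth making concrete. As a rough illustration of the developer experience being described here, and not Oracle's documented quick start, the sketch below queries an Autonomous Data Warehouse instance with the python-oracledb driver; the credentials, DSN alias, and table name are all hypothetical placeholders.

```python
# Hedged sketch: querying an Autonomous Data Warehouse instance from Python.
# Assumes a provisioned ADW service with its wallet/TNS configuration in
# place, and the python-oracledb driver installed (pip install oracledb).
# All names below are hypothetical.
import oracledb

connection = oracledb.connect(
    user="analyst",               # hypothetical user
    password="example-password",  # hypothetical password
    dsn="adw_high",               # hypothetical TNS alias from the wallet
)

cursor = connection.cursor()
# An analyst-style aggregate; SALES is a placeholder table name.
cursor.execute("""
    SELECT region, SUM(amount)
    FROM sales
    GROUP BY region
    ORDER BY 2 DESC
""")
for region, total in cursor:
    print(region, total)

cursor.close()
connection.close()
```

The design point Floyer is making is simply that the database meets developers in their own tooling, rather than forcing a proprietary client on them.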
>> Okay, so Mendelsohn, who made the announcement, said that Oracle is the smartphone of databases, and I think Ellison has actually used that framing, or maybe that was us; I thought he likened it to the iPhone way back when he announced Exadata, the integrated hardware and software. But is that how you see it? Is Oracle the smartphone of databases? >> It is. I mean, they are trying to own the complete stack: the hardware with Exadata, all the way up to the databases, the data warehouses, the OLTP databases, the inference databases. They're trying to own the complete stack from top to bottom, and that's what makes the autonomous approach possible; you can make it autonomous when you control all of that, and take away all of the requirements for IT in the business itself. So it's democratizing the use of data warehouses. It is pushing them out to the lines of business, simplifying them, and making it possible for those lines of business to own their own data and manage their own data without needing an IT person from headquarters to help them. >> Let's stay on this a little bit more, and then I want to go into some of the competitive stuff, because Mendelsohn mentioned AWS several times. One of the things that struck me: he said, hey, we're basically one API, because we're doing analytics in the cloud, we're doing data in the cloud, we're doing integration in the cloud, and that's a big part of the value proposition. He made some comparisons to Redshift; of course, I would say that if you can't find a workload where you beat your big competitor, then you shouldn't be in this business, so I take those things with a grain of salt. But one of the other things that caught me is that he said migrating from on-prem to Oracle Cloud is very simple, and I think he made some comparisons to other platforms. This to me is important, because he also brought in that Gartner data. We looked at that Gartner data when it came out: in the operational database class, Oracle smoked everybody; they were way ahead. The reason I think that's important is because, let's face it, the mission-critical workloads, the high-performance, high-criticality OLTP stuff, that's not moving into AWS in droves. And you've made the point often that companies with their own cloud, particularly Oracle, and you've mentioned this about IBM and Db2 as well, should offer a lower-risk environment for moving from on-prem to their cloud, because, for example, I don't think you can get Oracle RAC on AWS, and I don't think Exadata is running in AWS data centers, so that like-for-like compatibility is going to facilitate migration. What's your take on all that spiel? >> I think that's absolutely right. Your crown jewels, the most expensive and most valuable applications, are the mission-critical applications, the ones that have to take a licking and keep on ticking. Those types of applications are where Oracle really shines; they own a very high percentage of those mission-critical workloads. And if you're going to AWS, you have the choice of migrating Oracle onto AWS, which is frankly not a good fit at all; there are a lot of constraints to running large mission-critical systems on AWS.
So that's not an option, and then the option that AWS will push, of course, is: move to Aurora, change your way of writing applications, break them into tiny little pieces, and stitch them all together with microservices. That's okay if you're a small organization, but it has a lot of problems in its own right, because then you, the user, have to stitch all those pieces together, and you're responsible for testing it and for looking after it, and as you grow, that becomes a bigger and bigger overhead. So AWS, in my opinion, needs to move towards a tier-one database of its own, and it's not in that position at the moment. >> Interesting, okay. So let's talk about the competitive landscape and the choices customers have. As I said, Mendelsohn mentioned AWS many times, and Larry on the earnings calls often takes shots; to me, that's a compliment. When Larry Ellison calls you out, it means you've made it, you're doing well. We've seen it over the years, whether it's IBM or Workday or Salesforce, even though Salesforce is a big Oracle customer, and AWS, as we know, is an Oracle customer as well, even though AWS tells us they've gotten off Oracle; when you peel the onion... >> It took them five years, and that was for some of the workloads. >> Well, as I said, I believe they're still using Oracle in certain workloads. But we digress. So AWS takes a different approach, and I want to push on this a little bit with database. It's got more than a dozen purpose-built databases, I think. They take this right-tool-for-the-right-job approach, whereas Oracle is converging all this function into a single database: SQL, JSON, graph databases, machine learning, blockchain; I'd love to talk more about blockchain if we have time. It seems to me that the right-tool-for-the-right-job, purpose-built approach, very granular, down to the primitives and APIs, is a pretty viable approach versus kind of a Swiss Army knife approach. How do you compare the two? >> Yes, and it does appeal to many individual programmers who are very interested, for example, in graph databases or in time-series databases. They are looking for a cheap database that will do the job for a particular project, and for that programmer, or for that individual piece of work, that's a very sensible way of doing it, and they pay for it on straightforward cloud economics. The challenge, as you accumulate more and more data and build up your data warehouses and your data lakes, is that you do not want to have to move data from one place to another. So for example, if you've got Aurora and you have to move the data to Redshift, it's a pretty complicated thing to do; it's five or six steps, and each of those steps costs money, and more importantly, each of them takes time. The Oracle approach is a single database; in terms of all the pieces, obviously you have multiple databases, different OLTP databases and data warehouse databases, but it's a single architecture and a single design, which means that all of the work of moving stuff from one place to another happens within Oracle itself. It's Oracle doing that work for you, and as you grow, that becomes a very, very important cost saving compared with the overhead of all those different ones. The databases themselves at AWS originated largely as open source, and they've done very well with them, and there's a large revenue stream behind the... >> AWS, you mean?
>> Yes, the original databases in AWS, and they've done a lot of work in terms of making them scale. But for a larger organization, especially the very large ones, and certainly if they want to combine, for example, the data warehouse with OLTP and inference, which is in my opinion a very good thing to be trying to do, that is incredibly difficult to do with AWS, and in my opinion AWS has to invest enormously to make that whole ecosystem much better. >> Okay, so innovation required there, maybe as part of the TAM expansion strategy. But just to digress for a second: it seems like, and by the way, there are others taking this converged approach, it seems like that is a trend. You certainly see it with SingleStore; the name sort of implies it, formerly MemSQL. I think Monte Zweben of Splice Machine is probably headed in a similar direction, embedding AI. And Microsoft is kind of interesting: it seems like Microsoft is willing to build an abstraction layer that hides the complexity of the different tooling, while AWS thus far has not taken that approach. And then, looking at Snowflake, I think Snowflake is trying to do something completely different. I don't think they're necessarily trying to take Oracle head-on. Let's talk about this. Snowflake simplified the EDW, that's clear: zero to Snowflake in 90 minutes. It's got this data cloud vision, so you sign on to Snowflake and, speaking of layers, they're abstracting the complexity of the underlying cloud; that's what the data cloud vision is all about. They talk about this global data mesh, but they've not done a good job of explaining what the heck it is; we've been pushing them on that. >> It's aspirational at the moment. >> Well, I guess, yeah, it seems that way. But conceptually I think it's very powerful. In reality, what Snowflake is doing with data sharing is probably mostly read-only, and I say mostly read-only... oh, there you go, it'll get better. But you're able to share the data, and it's governed; it's quite genius how they've implemented this, with its simplicity. It is a caching architecture; we've talked about that, and we could geek out about it. There's good, there's bad, there's ugly, but generally speaking, my premise here, and I'd love your thoughts, is that Snowflake is trying to do something different. It's trying to be not just another data warehouse; it's not just trying to compete with data lakes. It's trying to create this data cloud to facilitate data sharing and put data in the hands of business owners and data product builders. That's a different vision than anything I've seen thus far. Your thoughts? >> I agree, and it goes even further: being a place where people can sell data, put it up and make it available to whoever needs it, and making it so simple that it can be shared across the country and across the world. I think it's a very powerful vision indeed. The challenge they have is that the pieces, at the moment, are very, very easy to use, but on quality, take, for example, the joins I mentioned, which are very powerful in Oracle: they don't try to do complex joins. They say... >> They being Snowflake. Yeah, they don't even try it. They would say, use another Postgres... >> Yeah. >> ...database to do that.
>> Yeah, so they have a long way to go there. >> Complex joins, anyway; maybe simple joins, yeah. >> Complex joins. So they have a long way to go in terms of the functionality of their product, and also, in my opinion, they are surely going to have to add more types of databases inside it, including OLTP, and they can do that; they have obviously got a great market cap, and they can do it by acquisition as well. >> They've started. I think they support JSON, right? >> Do they support JSON? >> And graph; I think there's a graph database that's either coming or already there, I can't keep all that stuff in my head, but there's no reason they can't go in that direction. In speaking to the founders at Snowflake, they were like, look, we're kind of new, we're going to focus on simple. A lot of them came from Oracle, so they know databases, and they know how hard it is to do things like facilitate complex joins and complex workload management, and so they said, let's just simplify, put it in the cloud, and spin up a separate virtual data warehouse every time you want one. That's how they handle those things; a different philosophy. But again, coming back to some of the mission-critical work and some of the larger Oracle customers: Oracle said they have a thousand Autonomous Database customers, I think it was Autonomous Database broadly and not ADW specifically, but anyway, a few stood out: Aon, Lyft, and I think Deloitte, and obviously hundreds more. So I think people misunderstand Oracle. They've got a big install base, they invest in R&D, and sure, people talk about lock-in, but the CIOs that I talk to, and that you talk to, David, are looking for business value, and I would say that 75 to 80% of them will gravitate toward business value over the fear of lock-in. I think at the end of the day they feel like, you know what, if our business is performing, it's a better business decision, a better business case. >> I fully agree. They have been very difficult to do business with in the past. Everybody's in dread of the... >> The audit. >> The knock on the door from the auditor. >> Right. >> And that, from a purchasing point of view, has been a really bad experience for many, many customers. But the users of the database itself are very happy indeed; you talk to them, and they understand what they're paying for. They understand the value, in terms of availability and all of the tools, for complex, multi-dimensional types of applications. It's pretty well the only game in town; only Db2 and SQL Server had any hope of competing. >> Microsoft SQL Server, right. >> Okay, SQL Server. >> Which, okay, yeah, is definitely competitive, for sure. Db2? No. Look, IBM lost its dominant position in database; they kind of ceded that, and Oracle had to fight hard to win it. It wasn't obvious in the '80s who was going to be the database king, and they all had to fight. To me, I always tell people, the difference is that the chairman of Oracle is also the CTO. They spend money on R&D, and they throw off a ton of cash. I want to say something about... >> I was just going to make one extra point. The simplicity and capability of the cloud versions of all of this are incredibly good. They are better, in terms of paying for what you need or what you use, than AWS, for example, or anybody else. So they have really come full circle in terms of attractiveness in a cloud environment. >> You mean charging you for what you consume. Yeah, Mendelsohn talked about that.
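Since consumption pricing has just come up, and the granularity point follows next, here is a back-of-the-envelope sketch of why it matters; the rates, shapes, and usage numbers below are invented for illustration and are not any vendor's actual pricing.

```python
# Hedged sketch: why per-second, per-CPU consumption billing can beat fixed
# shapes when demand is spiky. All rates, hours, and CPU counts are invented.

RATE_PER_CPU_HOUR = 0.50          # hypothetical $/CPU-hour

def consumption_cost(cpu_hours_used: float) -> float:
    """Pay only for CPUs actually consumed, metered finely."""
    return cpu_hours_used * RATE_PER_CPU_HOUR

def fixed_shape_cost(peak_cpus_needed: int, hours: int) -> float:
    """Pay for the next fixed shape up from your peak (e.g., 33 CPUs -> 64)."""
    shapes = [8, 16, 32, 64, 128]
    shape = next(s for s in shapes if s >= peak_cpus_needed)
    return shape * hours * RATE_PER_CPU_HOUR

# A month with a 33-CPU peak but an average of about 10 CPUs actually in use:
hours_in_month = 730
avg_cpus = 10
print(consumption_cost(avg_cpus * hours_in_month))  # 10 * 730 * 0.5 = 3650.0
print(fixed_shape_cost(33, hours_in_month))         # 64 * 730 * 0.5 = 23360.0
```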
He made a big point about the granularity: you pay for only what you need. If you need 33 CPUs, you pay for 33, whereas with databases sold in fixed shapes, if you need 33, you've got to go to 64. I don't know if that's true for everyone, and I'm not sure whether it's true for Snowflake; it may be, I've got to dig into that a little bit. >> Yes, Snowflake has got a front end to hide behind. >> Right, but I did want to push on that a little bit, because I want to go look at their pricing strategies. I still think they make you buy a one-year or two-year or three-year term, I may be wrong, and I don't know if you can just turn it off at any time; I should hold off and do some more research on that. But I wanted to make a point about the audits you mentioned before. A big mistake that a lot of Oracle customers have made many times, and we've written about this: when negotiating with Oracle, you've got to bring your best and your brightest. One of the things people didn't pay attention to, and I think they've caught on to this now, is that Oracle's SOW adjudicates over the MSA. A lot of legal departments and procurement departments say, oh, do we have an MSA with Oracle? Yes, you do; okay, great, and they think that with the MSA in place they can just rubber-stamp the deal. But the SOW really dictates, and Oracle's got you there; they're really smart about that. So you've got to bring your best and brightest, and you've got to negotiate hard with Oracle, or you'll get in trouble. >> Sure. >> So it is what it is, but coming back to Oracle, let's wrap on this. Dominant position in mission-critical: we saw that in the Gartner research, especially for operational workloads. Giant customer base. A cloud-first notion. Investing in R&D. Open? Well, we'll put a question mark around that, but hey, they're doing some cool stuff with MySQL. >> Ecosystem; I'd put ecosystem there. They're promoting their ecosystem. >> Yeah. And look, for a lot of their customers, and we've talked to many, they say, look, at the end of the day this saves us money, and we don't have to migrate. >> Yeah. >> So, interesting. I'll give you the last word; we started by focusing on the announcement, so what do you want to leave us with? >> My last word is that there are platforms, with a certain key application or key parts of the infrastructure, which I think can differentiate themselves from the Azures or the AWSs of the world, and Oracle owns one of those; SAP might be another. There are certain platforms which are big enough and important enough that, in my opinion, they will succeed with that cloud strategy. >> Great. David, thanks so much, I appreciate your insights. >> Good to be here. >> Thank you for watching everybody, this is Dave Vellante for theCUBE. We'll see you next time. (upbeat music)

Published Date : Mar 17 2021


Marc Staimer, Dragon Slayer Consulting & David Floyer, Wikibon | December 2020


 

>> Announcer: From theCUBE studios in Palo Alto and in Boston, connecting with thought leaders all around the world, this is a CUBE conversation. >> Hi everyone, this is Dave Vellante, and welcome to this CUBE conversation, where we're going to dig into the area of cloud databases. Gartner just published a series of research in this space, and it's really a growing market, rapidly growing, with a lot of new players, obviously including the big three cloud players. And with me are two experts in the field, both long-time industry analysts. Marc Staimer is the founder, president, and key principal at Dragon Slayer Consulting, and he's joined by David Floyer, the CTO of Wikibon. Gentlemen, great to see you. Thanks for coming on theCUBE. >> Good to be here. >> Great to see you too, Dave. >> Marc, coming from the great Northwest, I think it's your first time on theCUBE, so it's really great to have you. Let me set this up. As I said, Gartner published these three giant tomes; they are publicly available documents on the web. I know you guys have been through them, several hours of reading. (Dave chuckles) Good nighttime reading. The three documents are where Gartner identifies critical capabilities for cloud database management systems. The first one we're going to talk about covers operational use cases, so we're talking about transaction-oriented workloads: ERP, financials. The second one covers analytical use cases, sort of an emerging space, the data warehouse space and the like. And of course the third is the famous Gartner Magic Quadrant, which we're also going to talk about. So Marc, let me start with you. You've dug into this research; just at a high level, what did you take away from it? >> Generally, if you look at all the players in the space, they all have some basic good capabilities. What I mean by that is, ultimately, when you have a transactional or analytical database in the cloud, the goal is not to have to manage the database. Now, they have different levels of how much you have to manage or what you have to manage, but ultimately they all handle the basic administrative, or pedantic, tasks that DBAs have to do: the patching, the tuning, the upgrading. All of that is done by the service provider. So that's the number one thing they all aim at. From that point on, every database has different capabilities; some automate a whole lot more than others, and they have different primary focuses. So it comes down to what you're looking for or what you need. And ultimately, what I've learned from end users is that what they think they need up front is not what they end up needing as they implement. >> David, anything you'd add to that, based on your reading of the Gartner work? >> Yes. It's a thorough piece of work. It takes on a huge number of different types of use and sizes of company, and I think those are the two parameters that really change how companies would look at it. If you're a Fortune 500 or Global 2000 type company, you're going to need a broader range of features, and you will need to deal with size and complexity in a much greater sense, along with probably higher levels of availability, reliability, and recoverability. Again, on the workload side, there are different types of workload.
As well as the two transactional and analytic workload types, I think there's an emerging type of workload which is going to be very important for future applications, where you want to combine transactional with analytic in real time, in order to automate business processes at a higher level and make those business processes synchronous as opposed to asynchronous. That degree of granularity, I think, is missed in a broad view of these companies and what they offer; in my view, it ends up, in some ways, not comparing like with like from a customer point of view. >> So, the very nuance you talked about, let's get into it; maybe it'll become clear to the audience. As I said, these are very detailed research notes. There were several, I'll say, analyst cooks in the kitchen, including Henry Cook, whom I don't know, plus four other contributing analysts, two of whom are CUBE alums, Don Feinberg and Merv Adrian, both really awesome researchers, and Rick Greenwald, along with Adam Ronthal. These are public documents; you can go on the web and search for them. So I wonder if we could just look at some of the data; guys, bring up slide one here. We'll first look at the operational side, which they broke into four use cases: traditional transactions, augmented transaction processing, stream/event processing, and operational intelligence. And so, we're going to show you, there's a lot of data here. What Gartner did is essentially evaluate critical capabilities, think of features and functions, giving each a weighting and then a rating. It was a weighting-and-rating methodology: the rating was on a scale of one to five, and then they weighted the importance of each feature based on their assessment and on conversations with the many customers they talk to.
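As an aside, that weighting-and-rating arithmetic is simple to reproduce. Here is a hedged sketch of how such a composite score is computed; the capability names, weights, and ratings are invented for illustration and are not Gartner's actual figures.

```python
# Hedged sketch of a Gartner-style weighted-score calculation.
# The capability names, weights, and 1-5 ratings below are invented
# for illustration; they are not Gartner's actual data.

weights = {            # importance of each critical capability (sums to 1.0)
    "automated_tuning": 0.30,
    "availability":     0.25,
    "scalability":      0.25,
    "ecosystem":        0.20,
}

vendor_ratings = {     # each capability rated on a 1-5 scale
    "Vendor A": {"automated_tuning": 5, "availability": 5, "scalability": 4, "ecosystem": 4},
    "Vendor B": {"automated_tuning": 4, "availability": 3, "scalability": 5, "ecosystem": 3},
}

for vendor, ratings in vendor_ratings.items():
    score = sum(weights[c] * ratings[c] for c in weights)
    print(f"{vendor}: {score:.2f}")   # Vendor A: 4.55, Vendor B: 3.80
```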
So you can see here on the first chart, we're showing both the traditional transactions and the augmented transactions, and the first thing that jumps out at you is that Oracle with Autonomous is off the charts, far ahead of anybody else. And actually, guys, if you just bring up slide number two, we'll take a look at the stream/event processing and operational intelligence use cases, and you can see, again, Oracle has a big lead. I don't want to necessarily go through every vendor here, but guys, if you don't mind, go back to the first slide, because I think this is really the core of transaction processing. So let's look at this: you've got Oracle, you've got SAP HANA right there, interestingly Amazon Web Services with Aurora, IBM Db2, which goes back to the good old days, and on down the list. So let me again start with Marc. Why is that? I guess this is no surprise: Oracle still owns mission-critical for the database space. They earned that years ago, winning over the likes of Db2, Informix, and Sybase, and they emerged as number one there. But what do you make of this data, Marc? >> If you look at this data in a vacuum, you're looking at specific functionality; I think you need to look at all the slides in total. The reason I bring that up is that I agree with what David said earlier, in that the use case that's becoming more prevalent is the integration of transaction and analytics. And more importantly, it's not just your traditional data warehouse; it's AI analytics, it's big data analytics. Users are finding that they need more than just simple reporting; they need more in-depth analytics so they can get more actionable insights into their data and react in real time. So if you look at a product just as a transaction system, that's great; just as a data warehouse, that's great; or just as analytics, that's fine, if you have a very narrow use case. But I think what we're looking at today is not so narrow. It's sort of like if you bought a streaming device and it only streams Netflix, and then you need to get another streaming device because you want to watch Amazon Prime: you're not going to do that, you want one that does all of it, and that's kind of what's missing from this data. So I agree that the data is good, but I don't think it looks at the market in a fully encompassing manner. >> Well, before we get the horses off the track, because I love to do that, (Dave chuckles) let's talk about that. Marc, you're putting forth the premise, and you guys seem to agree, that a database that can do more than just one thing appeals to customers, and I suppose that certainly makes sense from a cost standpoint. But guys, feel free to flip back and forth between slides one and two. You can see SAP HANA, and I'm not sure what cloud that's running on, probably a combination of clouds, scoring very strongly. I thought Aurora was interesting: given that AWS says it's one of the fastest-growing services in its history, they've got it ahead of Db2 just on functionality, which is pretty impressive. I love Google Spanner and what they're trying to accomplish there. You go down to Microsoft: they're always the good-enough database, and that's how they succeed, et cetera, et cetera. But David, it sounds like you agree with Marc. I would think, though, that Amazon kind of doesn't agree, because they're all about horses for courses. >> I agree. >> Yeah, yeah. >> So I wonder if you could comment on that. >> Well, I want to comment on two vectors. The first vector is the size of customer: a mid-sized customer versus a Global 2000 or Global 500 customer. The smaller customer is the heart of AWS; they are taking their applications and putting pretty well everything into the one cloud, and Aurora is a good choice there. But when you start to get to the requirements of larger companies, with very high levels of availability, the functionality is not there. You're not comparing apples with apples; they're two very different things. So from a tier-one functionality point of view, IBM Db2 and Oracle have far greater capability for recovery and all the features they've built in over the years. >> You mean because of the maturity, right? >> Because of their focus on transaction and recovery, et cetera. >> So SAP, though, with HANA... (David talks indistinctly) And then I wanted your comments on that, either of you or both of you. SAP, I think, has a stated goal of basically getting its customers off Oracle by 2024; there's always this one-upmanship between the two companies. >> Yes, yes. >> Larry has said that ain't going to happen.
Amazon, we know, still runs on Oracle. It's very hard to migrate mission-critical; David, you and I know this well, Marc, you as well. People often say, well, everybody wants to get off Oracle, it's too expensive, blah, blah, blah, but we've talked to a lot of Oracle customers, and they're very happy with the reliability, availability, and recoverability feature set. The core of Oracle seems pretty stable. >> Yes. >> But I wonder if you guys could comment on that; maybe Marc, you go first. >> Sure. I've recently done some in-depth cost comparisons of Oracle against Aurora and all the other RDS services, Snowflake, Google, and a variety of others. And ultimately, what surprised me, since you made the statement that it costs too much: Oracle actually comes in at half the cost of Aurora in most cases, and at less than half the cost of Snowflake in most cases, which surprised me. No matter how you configure it, it ultimately comes down to a couple of things: each vendor is focused on different aspects of what they do. Take Snowflake, for example: they're on the analytical side, they don't do any transaction processing. But... >> Yeah, sorry to interrupt. Guys, if you could bring up the next slide, that would be great; that would be slide three, because now we get into the analytical piece, Marc, that you're talking about, which is Snowflake's specialty. So please carry on. >> Yeah, and what they're focused on is sharing data among customers. So if, for example, you're an automobile manufacturer and you've got a huge supply chain, you can share the data, without copying it, with any of your suppliers that are on Snowflake. Now, can you do that with the other data warehouses? Yes, you can, but for Snowflake that's the focal point, that's where they're aiming, whereas, let's say, the focal point for Oracle is going to be performance. And performance affects cost, because the higher the performance, the less you pay on the performance part of the pricing scale: you're paying per second for the CPUs that you're using. Same thing on Snowflake, but when the performance is higher, you use less. There's a whole bunch of things that come into this, but at the end of the day, what I've found is that Oracle tends to be a lot less expensive than the prevailing wisdom suggests. >> So let's talk value for a second, because you said something there. Yes, the other databases can do what Snowflake is doing with sharing, but my understanding of what Snowflake is doing is that they've built this global data mesh across multiple clouds. So not only are they compatible with Google or AWS or Azure, but essentially you sign up for Snowflake and then you can share data with anybody else in the Snowflake cloud; that, I think, is unique. And I know... >> Marc: Yes. >> ...Redshift, for instance, just announced data sharing, and I believe it's just between clusters within a customer, as opposed to across an ecosystem, and I think that's where the network effect is pretty compelling for Snowflake. So independent of cost, and you and I can debate about cost and the lack of transparency, because with AWS you don't know what the bill is going to be at the end of the month, and it's the same thing with Snowflake... And by the way, guys, you can flip through slides three and four. Let me just take a quick break here: on slide three you have the data warehouse and logical data warehouse use cases.
And then on the next slide, four, you've got the data science, deep learning, and operational intelligence use cases. And you can see Teradata, which came up in the mid-1980s and dominated that space; Oracle does very well there. You can see Snowflake pop up, SAP with its data warehouse, Amazon with Redshift; Google with BigQuery gets a lot of high marks from people; Cloudera is in there, so you see some of those names. But so, Marc and David, to me that's a different strategy. They're not trying to be just a better or easier data warehouse; they're trying to create, Snowflake that is, an incremental opportunity, as opposed to necessarily going after, for example, Oracle. David, your thoughts? >> Yeah, I absolutely agree. I mean, ease of use is a primary benefit of Snowflake. It enables you to do stuff very easily; it enables you to take in data without ETL, without any of the complexity; it enables you to share a number of resources across many different users and bring in what a particular user or part of the company wants. So in terms of where they're focusing, they've got tremendous ease of use and a tremendous focus on what the customer wants. And you pointed out yourself the restrictions on doing that within both Oracle and AWS. So yes, they have focused very, very hard on that. As for the future, they are bringing in a lot of additional functions; they're bringing JSON into the database, and they can extend the database itself. Whether they go the whole hog and put in transactions as well, that's probably something they may be thinking about, but not at the moment. >> Well, but they obviously have to have TAM expansion designs, because Marc, if they just get 100% of the data warehouse market, they're probably at a third of their stock market valuation, so they'd better have a roadmap and plans to extend. But I want to come back, Marc, to this notion of the right tool for the right job, best of breed for a specific use, horses for courses, versus the all-in-one approach; they're two different ends of the spectrum. You see Oracle, obviously very successful based on these ratings and based on their track record, and Amazon, where I think I lost count of the number of data stores, (Dave chuckles) with Redshift and Aurora and Dynamo, and on and on. They clearly want to have those primitives, different APIs for each access pattern; completely different philosophies, it's like Democrats or Republicans. Marc, your thoughts as to who ultimately wins in the marketplace? >> Well, it's hard to say who is ultimately going to win, but if I look at Amazon, Amazon is an a-la-carte type of system. If you need time series, you go with their time-series database; if you need a data warehouse, you go with Redshift; if you need transactions, you go with one of the RDS databases; if you need JSON, you go with yet another database. Everything is a different, unique database. Moving data between these databases is far from simple, and if you need to do analytics on one database from another, you're going to use other services that cost money. So yeah, each one will do what they say it's going to do, but it's going to end up costing you a lot of money when you do any kind of integration.
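To make that multi-step movement concrete, here is a hedged sketch of the pattern being described, copying rows from an Aurora PostgreSQL database into Redshift by way of S3. The host names, bucket, table, and IAM role are hypothetical, and this is an illustration of the integration overhead rather than a recommended pipeline.

```python
# Hedged sketch: the multi-step "move data between purpose-built databases"
# pattern (Aurora PostgreSQL -> S3 -> Redshift). All names are hypothetical.
import psycopg2  # standard PostgreSQL driver; works for both endpoints here

# Steps 1-2: export from Aurora PostgreSQL to S3. Assumes the aws_s3
# extension is installed and an IAM role is attached to the cluster.
with psycopg2.connect("host=aurora.example.internal dbname=sales user=etl") as aurora:
    with aurora.cursor() as cur:
        cur.execute("""
            SELECT * FROM aws_s3.query_export_to_s3(
                'SELECT * FROM orders',
                aws_commons.create_s3_uri('example-bucket', 'orders/', 'us-east-1'),
                'format csv'
            )
        """)

# Steps 3-4: load from S3 into Redshift. Assumes an IAM role with S3 read.
with psycopg2.connect("host=redshift.example.internal dbname=dw user=etl") as rs:
    with rs.cursor() as cur:
        cur.execute("""
            COPY orders
            FROM 's3://example-bucket/orders/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
            FORMAT AS CSV
        """)

# Each hop adds cost (S3 storage, transfer, compute) and latency, which is
# exactly the integration overhead being discussed above.
```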
And you're going to add complexity and you're going to have errors; there are all sorts of issues there. So if you need more than one, that's probably not your best route to go; if you need just one, it's fine. And on Snowflake, you raised the issue of transactions: if they're going to add transactions, they're going to have to rewrite their database. They have no indexes whatsoever in Snowflake; part of the simplicity that David talked about is there because they cut corners, which makes sense. If you're focused on the data warehouse, you cut out the indexes, great, you don't need them. But if you're going to do transactions, you kind of need them, so they're going to have to do some more work there. >> Well, you know, I have a different take on that, guys. I'm not sure Snowflake will add transactions; I think maybe their hope is that the market they're creating is big enough. I have a different view in that I think the data architecture is going to change over the next 10 years. As opposed to having a monolithic system where everything goes through that big data platform, the data warehouse and the data lake, I actually see what Snowflake is trying to do, and I'm sure others will join them, as putting data in the hands of product builders, data product or data service builders. I think they're betting that that market is incremental, and maybe they don't try to take on Oracle; I think it would maybe be a mistake to try to take on Oracle. Oracle is just too strong. I wonder, David, if you could comment. It's interesting to see how strongly Gartner rated Oracle in cloud database, because you think cloud, you think Amazon, Microsoft, and Google; okay, Oracle has got OCI, but still. And if I have a transaction database running on Oracle, it's very risky to move it, right? And we've seen that; it's interesting. Amazon's a big customer of Oracle, Salesforce is a big customer of Oracle, and Larry is very outspoken about those companies. SAP's customers are many, and most are using Oracle; it's not likely they're going anywhere. My question to you, David, is, first of all, why would those customers want to go to the cloud? And if they do go to the cloud, is the least risky approach to stay with Oracle if you're an Oracle customer, or Db2 if you're an IBM customer, and then move the other workloads that can move, whether more data-warehouse-oriented work or incremental transaction work, to something like Aurora? >> I think the first point: why should Oracle go to the cloud? Why has it gone to the cloud? And if there is a... >> More so, why would customers of Oracle... >> Why would customers want to... >> That's really the question. >> Well, Oracle has got Oracle Cloud@Customer, and that is a very powerful way of doing it, where exactly the same Oracle system is running on premises or in the cloud. You can have it where you want, and you can have them joined together; that's unique, that's unique in the marketplace. So that gives them a very special place with large customers that have data in many different places. The second point is that moving data is very expensive; Marc was making that point earlier on. Moving data from one place to another place, between two different databases, is a very expensive architecture.
Having the data in one place, where you don't have to move it, where you can go directly to it, gives you enormous capabilities: a single database, a single database type. And I'm sure that from an analytic point of view, that's where Snowflake is going, to a large single database. But where Oracle is going is to where you combine both the transactional and the analytical. And as you say, the cost of migrating databases is incredibly high, especially transaction databases, especially large, complex transaction databases, and it takes a long time, at least two years. It took five years for Amazon to actually succeed in getting a lot of their stuff over, and in those five years they could have been doing an awful lot more with the people they used to bring it over. So it was a marketing decision as opposed to a rational business decision. >> It's the holy grail of the vendors: they all want your data in their database. That's why Amazon puts so much effort into it. Oracle is obviously in a very strong position: there's growth in its new stuff, and there's the old stuff. The problem Oracle has, like many of the legacy vendors, is that the install base is so large and the legacy stuff is shrinking. The new stuff is growing very, very fast, but it's not yet large enough to offset that decline; you see that in all the earnings. So very positive news on the cloud database front; they've just got to work through that transition. Let's bring up slide number five, because Marc, this to me is the most interesting part. We've just shown all this detailed analysis from Gartner, and then you look at the Magic Quadrant for cloud databases. And despite Amazon being behind Oracle, or Teradata, or whomever, in every one of these ratings, they're up and to the right. Now, of course, Gartner will caveat this and say it doesn't necessarily mean you're the best; of course, everybody wants to be in the upper right. We all know that, but it doesn't necessarily mean you should go buy that database, and I agree with what Gartner is saying. But look at it: Amazon, Microsoft, and Google are like one, two, and three, and then of course you've got Oracle up there, and then the others. So I found that very curious; it's like there's a dissonance between the hardcore ratings and the positions in the Magic Quadrant. Why do you think that is, Marc? >> You know, it didn't surprise me in the least, because of the way Gartner does its Magic Quadrants. How high up you go on the vertical axis is very much tied to the amount of revenue you get in the specific category for which they're doing the Magic Quadrant. It has nothing to do with revenue from anywhere else, just that specific market. When I look at it, a big chunk of Oracle's revenue still comes from on-prem, not the cloud, and here you're looking just at the cloud revenue. Now, moving to the right side of the quadrant is based on functionality, capabilities, resilience, things other than revenue; the visionary axis asks how far along you are on the visionary side. How they weight that, again, comes down to Gartner's experts, how they want to weight it, and what makes sense to them. But from my point of view, the right side is as important as the vertical side, because the vertical side doesn't measure the growth rate either.
And if we look at these vendors, some are growing much faster than others. For example, Snowflake is growing incredibly fast, and that doesn't show in these numbers, from my perspective. >> Dave: I agree. >> Oracle is growing incredibly fast in the cloud. As David pointed out earlier, it's not just in their cloud where they're growing; there's Cloud@Customer, which is basically an extension of their cloud, and I don't know whether that's included in these revenue numbers or not. So there are a number of factors... >> Should it be, in your opinion, Marc? Would you include that in your definition of cloud? >> Yeah. >> The things that are hybrid and on-prem, would those count as cloud? >> Yes. Well, again, it depends on the hybrid. For example, if you have your own license, on your own hardware, but it connects to the cloud, no, I wouldn't include that. If you have a subscription license and subscription hardware that you don't own, that's owned by the cloud provider, and it connects with the cloud as well, that I would. >> Interesting. Well, to your point about growth, you're right. Looking at revenue is looking backwards; for guys like Snowflake, it will be double by the next one of these. It's also interesting to me, on the horizontal axis, to see Cloudera and Databricks further to the right than Snowflake, because that's kind of the data lake cloud. >> It is. >> And then of course you've got the others. I mean, database used to be boring. (David laughs) It's such a hot market space now. David, your final thoughts on all this: what does the customer take away here? What should my cloud database management strategy be? >> Well, I was positive about Oracle; let's take some of the negatives of Oracle. First of all, they don't make it very easy to run on other platforms. They have put in terms and conditions which make it very difficult to run on AWS, for example; you get double counts on the licenses, et cetera. So they haven't played well... >> Those are negotiable, by the way. It's on the customer, but you can negotiate that one. >> They can be, yes, if you're big enough they are negotiable. But Oracle certainly hasn't made it easy to work with other clouds. What they did very... >> How about Microsoft? >> Well, that is exactly what I was going to say. For adjacent workloads, Oracle has been working very well with Microsoft: you can use Microsoft Azure with an Oracle database adjacent in the same data center, integrated very nicely indeed. And I think Oracle has got to do that with AWS, and it's got to do that with Google as well. It's got to provide a service for people to run things where they want to run them, not just on the Oracle cloud. If they did that, that would in my opinion be a very strong move and would make those capabilities available in many more places. >> Right. Awesome. Hey, Marc, thanks so much for coming to theCUBE. Thank you, David, as well, and thanks to Gartner for doing all this great research and making it public on the web. If you just search for critical capabilities for cloud database management systems for operational use cases, that's a mouthful, and then do the same for analytical use cases, and for the Magic Quadrant, there's a third doc for cloud database management systems.
You'll get about two hours of reading; I learned a lot from it, and I learned a lot here too. I appreciate the context, guys. Thanks so much. >> My pleasure. >> All right, thank you for watching everybody. This is Dave Vellante for theCUBE. We'll see you next time. (upbeat music)

Published Date : Dec 18 2020


3 3 Administering Analytics v4 TRT 20m 23s


 

>>Yeah. >>All right. Welcome back to our third session, which is all about administering analytics at global scale. We're going to be discussing how you can implement security, data compliance, and governance across the globe for large numbers of users, to ensure ThoughtSpot is open for everyone across your organization. So coming right up are Cheryl Zang, who is a senior director of product management at ThoughtSpot, and Kendrick, who is ThoughtSpot's director of Systems Engineering. So, Cheryl and Kendrick, the floor is yours. >>Thank you, Tina, for the introduction. So let's talk about analytics at scale, and let's understand what that is. It's really three components: it's the access to not only data but also technology, and where those intersect is the value that you get as an organization. When we think about analytics at scale, a lot of the time we look to the cloud as the avenue for it, and that's an accurate way to look at it, because people are moving towards the cloud for a variety of reasons. If you think about what's been driving it, it has been applications like Salesforce and MongoDB, among others, and it's part of where we're seeing our market go, with 64% of companies planning to move their analytics to the cloud. For ThoughtSpot specifically, we see that the vast majority of our customers are already in the cloud with one of the big four cloud data warehouses, or they're evaluating one. What we've found, though, is that even though companies are moving their analytics to the cloud, we have not solved the problem of accessing the data. As a matter of fact, our customers tell us that only 10 to 25% of the data warehouse they've moved to is actually being utilized, and more generally, Forrester says that 60 to 73% of the data you have is not being leveraged. If you think about why, you go through this process of taking enterprise data, moving it into cubes and aggregates, and building reports and dashboards, and there's typically a bottleneck at the BI team; at the end of the day, the people getting that data on the right-hand side are only anywhere from 20 to 30% of the population. When companies want to be data-driven, 20 to 30% of the population is not enough; what you're really looking for is something north of that. And if you bring a cloud data warehouse into that same framework, why invest and still not fix the problem? You can go directly against the warehouse, but you're still not solving the reports-and-dashboards bottleneck; why invest and still not scale? It comes back to the three pillars: technology, data, and accessibility. So if we look at analytics at scale, it truly is being able to get north of that 20 to 30%: to have the BI team become enablers for the organization, working with the data in the cloud data warehouse, while sales, marketing, finance, supply chain, and HR get direct access to it and ask their own questions. To be able to do that, you really have to look at your modern data architecture, figure out where you are in that maturity, and then build it out. So if you look at this from left to right, it's sources.
It's ingestion transformation. It's the storage that the technology brains e. It's the data from a historical predictive perspective. And then it's the accessibility. So it's technology. It's data accessibility. And how do you build that? Well, if you look at for a thought to spot perspective, it truly is taking and driving and leveraging the cloud data warehouse architectures, interrogated, essay behind it. And then the accessibility is the search answers pen boards and embedded analytics. If you take that and extend it where you want to augment it, it's adding our partners from E T L R E L t. Perspective like al tricks talent Matile Ian Streaming data from data brings or if you wanna leverage your cloud, data warehouses of Data Lake and then leverage the Martin capability of your child data warehouse. The augmentation leveraging out through its data bricks and data robot. And that's where your data side of that pillar gets stronger, the technologies are enabling it. And then the accessibility from the output. This thought spot. Now, if you look at the hot spots, why and how do we make this technology accessible? What's the user experience we are? We allow an organization to go from 20 to 30% population, having access to data to what it means to be truly data driven by our users. That user experience is enabled by our ability to lead a person through the search process. There are search index and rankings. This is built for search for corporate data on top of the Cloud Data Warehouse. On top of the data that you need to be able to allow a person who doesn't understand analytics to get access to the data and the questions they need to answer, Arcuri Engine makes it simple for customers to take. Ask those questions and what you might think are not complex business questions. But they turn into complex queries in the back end that someone who typically needs to know that's that power user needs to know are very engine. Isolate that from an end user and allows them to ask that question and drive that query. And it's built on an architecture that allows us to change and adapt to the types of things. It's micro services architecture, that we've not only gone from a non grim system to our cloud offering, in a matter of of really true these 23 years. And it's amazing the reason why we can do that, do that and in a sense, future proof your investment. It's because of the way we've developed this. It's wild. First, it's Michael Services. It's able to drive. So what this architecture ER that we've talked about. We've seen different conversations of beyond its thought spot everywhere, which allows us to take that spot. One. Our ability to for search for search data for auto analyzed the Monitor with that govern security in the background and being able to leverage that not only internally but externally and then being able to take thought spot modeling language for that analysts and that person who just really good at creating and let them create these models that it could be deployed anywhere very, very quickly and then taking advantage off the Cloud Data warehouse or the technology that you have and really give you accessibility the technology that you need as well as the data that you need. That's what you need to be able to administer, uh, to take analytics at scale. So what I'm gonna do now is I'm gonna turn it over to Cheryl and she's gonna talk about administration in thought spot. Cheryl, >>thank you very much Can take. Today. 
I'm going to show you how you can administer and manage ThoughtSpot for your organization, covering three main topics: user management, data management, and user adoption and performance monitoring. Let's jump into the demo. Within the ThoughtSpot application, the Admin Console provides all the core functions needed for system-level administration. Let's start with user management and authentication. With the Users tab, you can add or delete a user, or you can modify the settings for an existing user, for example username, password, or email, or you can add the user to a different group. With the Groups tab, you can add or delete a group, or you can manage the group's settings, for example the privileges associated with all the group members: "can administer ThoughtSpot," "can share data with all users," or "can manage data." This "can manage data" privilege is very important: it grants a user the privilege to add data sources, add tables and worksheets, and manage data for different organizations or use cases without being an admin. There is also a field called Default Pinboards: you can select a set of pinboards that will be shown to all of the users in that group on their homepage. In terms of authentication, currently we support three different methods: local, Active Directory, and SAML. By default, local authentication is enabled, and you can also choose to have SAML integration with an external identity provider; currently we support Okta, Ping Identity, SiteMinder, or ADFS. The third method is integration with Active Directory: you can configure integration with LDAP through Active Directory, allowing you to authenticate users against an LDAP server. Once the users and groups are added to the system, we can share pinboards with them, or they can search to ask and answer their own questions. To create searchable data, we first need to connect to our data warehouses with Embrace. You can directly query the data as it exists in the data warehouse, without having to move or transfer the data. In this page, you can add a connection to any of the six supported data warehouses. Today we will be focusing on the administrative aspect of data management, so I will close the tab here and we will use the connections that have already been set up. Under the Data Objects tab, we can see all of the tables from the connections. Sometimes there are a lot of tables, and it may be overwhelming for the administrator to manage the data. As a best practice, we recommend using stickers to organize your data sets. Here, we're going to select the Salesforce sticker. This refines the list to tables coming from Salesforce only. This helps with data lineage and traceability, because worksheets are curated data based on those tables. Let's take a look at this worksheet. Here we can see the joins between tables that create the schema. Once the data analyst has created the tables and worksheet, the data is searchable by end users. Let's go to Search. First, let's select the data source; here we can see all of the data that we have been granted access to see. Let's choose the Salesforce sticker, and we will see all of the tables and worksheets available to us as a data source. Let's choose this worksheet as a data source. Now we're ready to search. A search insight can be saved either into a pinboard or an answer. It's important to know that the sticker actually persists with pinboards and answers.
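To make the group-and-privilege model described in this part of the demo concrete, here is a minimal sketch of how group privileges and default pinboards could resolve for a user who belongs to several groups. This is illustrative Python, not ThoughtSpot's actual API; every name in it is hypothetical.

```python
# Illustrative sketch of group-based privilege resolution, in the spirit of
# the Admin Console model described above. Not ThoughtSpot code; all names
# here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Group:
    name: str
    privileges: set = field(default_factory=set)          # e.g. {"CAN_MANAGE_DATA"}
    default_pinboards: list = field(default_factory=list)

@dataclass
class User:
    name: str
    groups: list = field(default_factory=list)

    def effective_privileges(self) -> set:
        # A user's effective privileges are the union of their groups' privileges.
        return set().union(*(g.privileges for g in self.groups))

    def homepage_pinboards(self) -> list:
        # Default pinboards from every group appear on the user's homepage.
        return [pb for g in self.groups for pb in g.default_pinboards]

analysts = Group("analysts", {"CAN_MANAGE_DATA", "CAN_SHARE_WITH_ALL"}, ["Sales Overview"])
admins = Group("admins", {"CAN_ADMINISTER"})
cheryl = User("cheryl", [analysts, admins])

print(cheryl.effective_privileges())  # union of both groups' privileges
print(cheryl.homepage_pinboards())    # ['Sales Overview']
```

The design point this mirrors: privileges hang off groups, not individuals, so onboarding a user is just a matter of group membership.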
So when the user logs in, they will be able to see all of the content that's available to them. Let's go to the Admin Console and check out the User Adoption pinboard. The User Adoption pinboard contains essential information about your ThoughtSpot users and their adoption of the platform. Here you can see the daily active user, weekly active user, and monthly active user counts for the last 30 days. You can also see the total count of pinboards and answers saved in the system, and the unique count of users. You can also find the top 10 users in the last 30 days, the top 10 pinboard consumers, and the top 10 ad hoc searchers. Here you can see the trend of weekly active users, daily active users, and hourly active users over time, and you can get information about popular pinboards and user actions in the last month. Now let's zoom in on this chart. With this chart, you can see weekly active users and how they're using ThoughtSpot. In this example, you can see that 60% of the time people are doing ad hoc search. If you would like to see what people are searching, you can do a simple drill-down on query text. Here we find that the most popular query text being used is "number of opportunities." Lastly, I would like to show you the system performance tracking pinboard that's available to admins. This pinboard contains essential information about your ThoughtSpot instance's performance: use it to understand query latency, user traffic, how users are interacting with ThoughtSpot, the most frequently loaded tables, and so on. The last component to scaling to hundreds of users is a great onboarding experience. A new feature we call Search Assist helps automate onboarding while ensuring new users have the foundation they need to be successful on day one. When new users log in for the first time, they're presented with personalized sample searches that are specific to their data set. In this example, someone in a sales organization would see questions like "What were sales by product type in 2020?" From there, a guided step-by-step process helps introduce new users to search, ensuring a successful onboarding experience. The Search Assist coach is a customized in-product walkthrough that uses your own data and your own business vocabulary to take your business users from unfamiliar to near fluent in minutes. Instead of showing the entire end-user experience today, I will focus on the setup and administration side of Search Assist. Search Assist is easy to set up at the worksheet level, with flexible options for multiple guided lessons. Using a preset template, we help you create multiple learning paths based on department or on your business users' needs. To set up a learning path, you simply fill in the template with relevant search examples while previewing what the end user will see, and then increase the complexity with each additional question to help your users progress. In summary, it is easy to administer user management, data management, and user adoption at scale using the ThoughtSpot Admin Console. Back to you, Kendrick. >> Thank you, Cheryl. That was great; I appreciate the demo. It's awesome: it's real-life data, real-life software. In closing here, I want to talk a little bit about what we've seen out in the marketplace when we're talking with prospects and customers, and what they tell us: "Well, I'm not quite there yet."
"My data is not ready," or "I don't have a cloud data warehouse." For that process and that thinking, we have three different examples. We had a company that had never even thought about analytics at scale; we came in, we talked to them, and in less than a week they were able to move their data into ThoughtSpot and ask questions of a billion rows. We've also had customers in early adoption, sticking their toes in the water around the technology: they have a cloud data warehouse and they put some data in it, and within 11 minutes we were able to search on a billion rows of their data. Now they're adding more data to combine and work with. And then we have customers that are more mature in their process: they put in large volumes of data, and within nine minutes we were asking questions of their data, and their business users were understanding what's going on. A second question we sometimes get is, "My data is not clean." Well, ThoughtSpot is very, very good at finding that type of data. If you start moving, it becomes an iterative process, and we can help with that. Again, within a week we can take your data, get it into the system, start asking business questions of it, and be ready to go. With that, I'm going to turn it back to you, and thank you for your time. >> Kendrick and Cheryl, thank you for joining us today and bringing all of that amazing insight to our audience at home. Let's do a couple of stretches, and then join us in a few minutes for our last session of the track, Insights for All, about how Canadian Tire is delivering business outcomes with Search and AI. See you there.
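The User Adoption pinboard from Cheryl's demo boils down to unique-user counts per period. A minimal pandas sketch, assuming a hypothetical usage log with `user_id` and `ts` columns; this is illustrative only, not how ThoughtSpot computes the pinboard:

```python
import pandas as pd

# Hypothetical usage log: one row per user action (search, pinboard view, ...).
events = pd.DataFrame({
    "user_id": ["u1", "u2", "u1", "u3", "u2", "u1"],
    "ts": pd.to_datetime([
        "2020-11-02", "2020-11-02", "2020-11-03",
        "2020-11-09", "2020-11-20", "2020-12-01",
    ]),
})

# Daily / weekly / monthly active users: unique users per period.
dau = events.groupby(events["ts"].dt.date)["user_id"].nunique()
wau = events.groupby(events["ts"].dt.to_period("W"))["user_id"].nunique()
mau = events.groupby(events["ts"].dt.to_period("M"))["user_id"].nunique()

print(dau)  # e.g. 2020-11-02 -> 2 active users
print(wau)
print(mau)
```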

Published Date : Dec 10 2020


Unleash the Power of Your Cloud Data | Beyond.2020 Digital


 

>> Yeah, yeah. Welcome back to the third session in our Building a Vibrant Data Ecosystem track. This session is Unleash the Power of Your Cloud Data Warehouse: what comes after you've moved your data to the cloud? In this session we'll explore why enterprise analytics is finally ready for the cloud, and we'll discuss how you can consume enterprise analytics in the very same way you would cloud services. We'll also explore where analytics meets cloud, and see firsthand how ThoughtSpot is open for everyone. Let's get going. I'm happy to say we'll be hearing from two folks from ThoughtSpot today: Michael, VP of strategic partnerships, and Vika Valentina, senior product marketing manager. And I'm very excited to welcome, from our partner AWS, Gal Bar Mia, product engineering manager with Redshift. We'll also be sharing a live demo of ThoughtSpot for B2C marketing analytics, directly on Redshift data. Gal, please kick us off. >> Thank you, Mallory, and thanks to the ThoughtSpot team and everyone attending today for joining us. When we talk about data-driven organizations, we hear that 85% of businesses want to be data driven; however, only 37% have been successful. We ask ourselves why that is, and, believe it or not, a lot of customers tell us that they struggle even with defining what being data driven means, and in particular with aligning that definition between the business and the technology stakeholders. Let's look at our own definition: a data-driven organization is an organization that harnesses data as an asset to drive sustained innovation and create actionable insights that supercharge the experience of their customers, so that they demand more. Let's focus on a few things here. One is data as an asset: data is very much like a product; it needs to evolve. Sustained innovation: it's not just innovation, it's sustained; we need to continuously innovate when it comes to data. Actionable insights: not just interesting insights, but insights the business can take and act upon. And obviously the customer experience: whether the customers are internal or external, we want them to request more insights and, as such, drive more innovation. We call this the data flywheel. We use the flywheel metaphor here: we create a data set, our first product, focused on a specific use case; we build an initial MVP around it; we provide it to our customers, internal or external; they provide feedback, request more features, and want more insights; that enables us to learn, bring in more data, and enrich that data; and again we create more insights. As the flywheel spins faster, we improve operational efficiency, supporting greater data richness, and we reduce the cost of experimentation. Legacy environments were never built for this kind of agility, and in many cases customers have struggled to keep momentum in their flywheel, in particular around operational efficiency and experimentation. This is where Redshift fits in and helps customers make the transition to a true data-driven organization. Redshift is the most widely used data warehouse, with tens of thousands of customers, and it allows you to analyze all your data. It is the only cloud data warehouse that allows you to analyze data that sits in your data lake on Amazon S3, with no loading, duplication, or ETL required.
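Gal's point about analyzing data in place on S3 is what Amazon Redshift Spectrum provides. A hedged sketch of the moving parts, using psycopg2 against a Redshift endpoint; the endpoint, credentials, IAM role, and table names are all placeholders:

```python
import psycopg2

# Placeholders: supply your own cluster endpoint, credentials, and IAM role.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="admin", password="...",
)
conn.autocommit = True
cur = conn.cursor()

# Register an external schema backed by the Glue Data Catalog, so files in
# S3 can be queried in place: no loading, duplication, or ETL.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
    FROM DATA CATALOG DATABASE 'clickstream'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;
""")

# Join cold data in S3 with hot data in local Redshift tables in one query.
cur.execute("""
    SELECT s.campaign, COUNT(*) AS impressions
    FROM spectrum.raw_impressions s
    JOIN campaigns c ON c.id = s.campaign_id
    GROUP BY s.campaign;
""")
print(cur.fetchall())
```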
It also allows you to scale with the business: with its hybrid architecture it also accelerates performance, where the shared storage provides the ability to scale to unlimited concurrency while the on-instance storage provides low-latency access to data. And it delivers the three key asks that customers consistently tell us matter the most when it comes to cost. One is usage-based pricing instead of license-based pricing: great value as you scale your data warehouse; using, for example, reserved instances, you can save up to 75% compared to on-demand prices. Second, as your data grows, infrequently accessed data can be stored cost-effectively in S3 and queried through Amazon Redshift Spectrum. And the third aspect is predictable month-to-month spend, with no hidden charges or surprises: unlike other cloud data warehouses, where you need premium versions for additional enterprise capabilities, Redshift pricing includes built-in security, compression, and data transfer. >> Great, thanks, Gal. So, as you can see, everybody wins with cloud data warehouses. There's this evolution, a movement of users, data, and organizations to get value from these cloud data warehouses. And the key is that the data has to be accessible by the users, along with the ability to make business decisions on that data; that ranges from users on the front line all the way up to the boardroom. So while we've seen this evolution to the cloud data warehouse, as you can see from the statistic from Forrester, we're still struggling with how much of that data actually gets used for analytics. So what is holding us back? One of the main reasons is old technology trying to work with today's modern cloud data warehouses; it wasn't built for them. You run into issues of doing data replication, getting the data out of the cloud data warehouse so you can do analysis, and then maintaining these middle layers of data so that you can access it quickly and get the answers you need. Another issue holding us back is the idea that you have to have your data in perfect shape, with the perfect pipeline, based on the exact dashboard you need. This isn't true: with the cloud data warehouse, and the speed at which important business data lands in it, you need a solution that allows you to access the data right away, without everything having to be perfect from the start. And I think this is a great opportunity for Gal and me to have a little further discussion on what we're seeing in the marketplace. One of the primary questions: what are the limiting factors you see, Gal, with legacy technologies in the market when it comes to this cloud transformation we're talking about here? >> It's a great question, Michael, and there's a variety of aspects when it comes to legacy data warehouses that are slowing down innovation for companies and businesses. I'll focus on two. One is performance, right? We want faster insights; companies want the ability to analyze more data, faster. When it comes to on-prem or legacy data warehouses, that's hard to achieve, because the second aspect comes into play, which is the lack of flexibility: if you want to increase the capacity of your warehouse, you need to issue a request, and someone needs to go and bring an actual machine and install it to expand your data warehouse.
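The flexibility gap Gal describes, racking a machine versus a button click, reduces in the cloud to a single API call. A sketch using boto3's Redshift client; the cluster identifier and node count are placeholders, and resize behavior should be checked for your cluster type:

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Elastic resize: grow the cluster from the console, CLI, or API.
# No one has to rack a physical machine.
redshift.resize_cluster(
    ClusterIdentifier="analytics-prod",  # placeholder name
    NumberOfNodes=8,                     # scale out for a heavy quarter-end
    Classic=False,                       # elastic resize: minutes, not hours
)

# Poll until the cluster is available again.
waiter = redshift.get_waiter("cluster_available")
waiter.wait(ClusterIdentifier="analytics-prod")
```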
When it comes to the cloud, it's literally a click of a button, which allows you to increase the capacity of your data warehouse and enable your internal and external users to perform analytics at scale, and much faster. >> It falls right in line with the explanation you provided there: as the data warehouses get faster and the data scales, older solutions aren't built to leverage that. They end up making technical cuts, either looking at smaller amounts of data so they can get to it quicker, or taking longer to get to the data once the warehouse is ready, when they could just live-query to get the answers they need. That's definitely an issue we're seeing in the marketplace. I think the other one worth looking at is things like governance, lineage, and regulatory requirements. How is the cloud making that easier? >> That's again an area where I think the cloud shines, because AWS's scale allows significantly more investment in security policies and compliance. For example, Amazon Redshift comes by default with SOC 1 through 3, PCI, ISO, FedRAMP, and HIPAA compliance, all of them out of the box. At our scale, we have the capacity to implement those by default for all of our customers and allow them to focus their very expensive, valuable IT resources on the actual applications that differentiate their business and transform the customer experience. >> That's a great point, Gal. So we've talked about the limiting factors technology-wise, and we've mentioned things like governance. But what about the cultural aspect? Where do you see teams struggling in meeting their cloud data warehouse strategy today? >> That's true: one of the biggest challenges for large organizations when they move to the cloud is not the technology; it's people, process, and culture. And we see differences between organizations that talk about moving to the cloud and ones that actually do it. First of all, you want senior leadership to drive, and to be aligned and committed to making the move. But it's not just that. We see organizations sometimes get paralyzed if they can't figure out how to move each and every last workload. There's no need to boil the ocean, so we often work with organizations to find that iterative motion, that iterative process of identifying the use cases and workloads and migrating them one at a time, and through that allow the organization to grow its knowledge from a cloud perspective, as well as adopt new tooling and learn about the new capabilities. >> And from an analytics perspective, we see the same, right? You don't need a pixel-perfect dashboard every single time to get value from your data. You don't need to wait until the data warehouse, or the pipeline into it, is perfect. With today's technology, you should be able to look at the data in your cloud data warehouse immediately and get value from it. And that's the change that we're pushing, and starting to see, today. Thanks, Gal, that was really interesting. You know, as we look at this transformation we're seeing in analytics, it isn't really that old.
Twenty years ago, data warehouses were primarily on-prem, and the applications, the BI tools used for analytics around them, were on-prem as well. Then you saw applications like Salesforce that live in the cloud, and you started having to pull data from the cloud to on-prem in order to do analytics with it. Then we saw the shift, about 10 years ago, with the explosion of the cloud data warehouse: because of their scale, cost reduction, and speed, cloud data warehouses like Amazon Redshift really took hold of the marketplace, and they are the predominant way of storing data going forward. What we haven't seen is the BI tools catch up. When you have this new cloud data warehouse technology, you really need tools that were custom-built for it to take advantage of it, tools that can query the cloud data warehouse directly and get results very quickly, without having to worry about creating a middle layer of data, or pipelines, in order to manage it. One company captures that really well. Chick-fil-A, which I'm sure everybody has heard of, is one of the largest food chains in America. They made a huge investment in Redshift, and one of the purposes of that investment was to get access to the data more quickly; they really wanted to give their business users the ability to do ad hoc analysis on the data they were capturing. With their older tools, the problem they found was that all the data for this kind of analysis stayed at the analyst level: somebody needed to create a dashboard in order to share the data with a user, and if the user's requirements changed, the analysts became burdened with requests for changes and the time it took to reflect them. So they moved to ThoughtSpot with Embrace to connect to Redshift, so they could give business users the ability to query the database right away. With this, they were able to find very common things in supply chain analysis, like the ability to figure out which store should get which product that was selling better. The other part was that they didn't have to wait for the data to settle into some sort of repository or second-level database; they were able to query it quickly. And with that, they were able to make changes right in the Redshift database that were then reflected to customers and business users right away. What they found is that by adopting ThoughtSpot, they were able to arm business users with the ability to make decisions very quickly, they cleared up the backlog and the delay with their analysts, and they put their analysts to work on different projects where they could add better value. So when you look at the way we work with a cloud data warehouse, you have to think of ThoughtSpot Embrace as the tool that accesses that layer, the perfect analytics partner for the cloud data warehouse. We will do the live query for the business user: you don't need to know how to script in SQL to access Redshift; you can type the question you want answered, and ThoughtSpot will take care of the query. We will do the indexing, so that results come back faster for you, and we will also do the analysis. That is one of the things I wanted to cover, which is our SpotIQ.
This is new: with our ability to use this with Embrace and our partners at Redshift, we can now give you auto-analysis, looking at things like leading indicators, trends, and anomalies. To put this in perspective, imagine somebody doing forecasting for Q3 in the western region. They look at how their stores are doing and see that one store is performing well. SpotIQ might look at that analysis and see whether there's a leading product that is underperforming, based on perhaps the last few quarters of data, and bring that up to the business user right away; they don't have to figure it out, or slice and dice to find that issue, on their own. And then finally, all the work you do in data management and governance in your cloud data warehouse gets reflected in the results in Embrace right away. So I've done a lot of talking about Embrace, and I could do more, but I think it would be far better to have Vika actually show you how the product works. Vika? >> Thanks, Michael. We learned a lot today about the power of leveraging your Redshift data in ThoughtSpot; now let me show you how it works. The coronavirus pandemic has presented extraordinary challenges for many businesses, and some industries have fared better than others. One industry that seems to have weathered the storm pretty well is streaming media, with companies like Netflix and Hulu. In this demo, we're going to be looking at data from the B2C marketing efforts of a streaming media company in 2020. Lately, we've been running campaigns for comedy, drama, kids and family, and reality content. Each of our campaigns lasts four weeks, and they're staggered on a weekly basis; therefore, we always have four campaigns running, and we can focus on one campaign launch per week. Today we'll be digging into how our campaigns are performing, looking at things like impressions, conversions, and user demographic data. So let's go ahead and look at that data; we'll see what we can learn from what's happened this year so far, and how we can apply those learnings to future decision-making. As you can already see on the ThoughtSpot homepage, I've created a few pinboards that I use for reporting purposes. The homepage also includes what others on my team and I have been looking at most recently. Now, before we dive into a search, we'll first take a look at how to make a direct connection to the customer database in Redshift. To save time, I've already pre-built the connection to Redshift, but I'll show you how easy it is to make that connection in just three steps. First, we give the connection a name and select our connection type, which is Amazon Redshift. Then we enter our Redshift credentials. And finally, we select the tables that we want to use. Great, now we're ready to start searching. So let's dig into this data to get a better idea of how our marketing efforts have been affected, either positively or negatively, by this really challenging situation. When we think of ad-based online marketing campaigns, we think of impressions, clicks, and conversions, so let's look at those on a daily basis. All this data is available to us in ThoughtSpot, and we can easily use search to create a nice line chart that shows us trends over the last few months. Based on experience, we understand that we're going to have more impressions than clicks, and more clicks than conversions.
If we study the chart for a minute, we can see that while impressions appear pretty steady over the course of the year, clicks, and especially conversions, both get a nice boost in mid to late March, right around the time that pandemic-related policies were being implemented. So right off the bat we've found something interesting, and we can come back to this. Now, there are a few metrics we're going to focus on as we analyze our marketing data. Our overall goal is obviously to drive conversions, meaning that we bring new users into our streaming service. In order to get a visitor to sign up in the first place, we need them to get to our sign-up page, and a compelling campaign is going to generate clicks: if someone is interested in our ad, they're more likely to click on it. So we'll search for click-through rate, and we'll look at this by campaign name. Now we can compare all the campaigns we've launched this year to see which have been most effective at bringing visitors to our site. I mentioned earlier that we have four different types of campaign content, each one aligned with one of our most popular genres. So by adding campaign content, and since I just want to see the top 10, I can limit my search to just these top 10 campaigns, automatically sorted by click-through rate and assigned a color for each category. We can see right away that comedy and drama each have three of the top 10 campaigns by click-through rate; reality has two, including the top spot; and kids and family makes one appearance as well. With ThoughtSpot, we know that any non-technical user can ask a question and get an answer; they can explore the answer and ask another question. When you get an answer that you want to share, or keep an eye on moving forward, you pin the answer to a pinboard. So the B2C Marketing Campaign Statistics pinboard gives us a solid overview of our campaign-related activities and metrics throughout 2020. The visuals here keep us up to date on click-through rate and cost per click, but also on other really important metrics like conversions and cost per acquisition. Now, it's important to our business that we evaluate the effectiveness of our spending, so let's do another search. We're going to look at how many new customers we're getting, the conversions, and the cost per acquisition we're spending to get each of them, by campaign content category. This is a really telling chart: we can basically see how much each new user is costing us, based on the content they saw prior to signing up for the service. Drama and reality users are actually relatively expensive compared to those who joined based on comedy and kids and family content, and of all the genres, kids and family is actually giving us the best bang for our marketing buck. And that's good news, because the genres providing the best value are also providing the most customers. We mentioned earlier that we saw a sizable uptick in conversions as stay-at-home policies were implemented across much of the country, so we're going to remove cost per acquisition and take a daily look at how our campaign content has trended over the year so far. By doing this, we can see a comparison of the different genres daily. Some campaigns have been more successful than others, obviously; for example, kids and family content has always fared pretty well, as has comedy. But as we moved into the stay-at-home area of the line chart, we really saw these two genres begin to separate from the rest.
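The two ratios driving this part of the demo are click-through rate (clicks divided by impressions) and cost per acquisition (spend divided by conversions). A small pandas sketch with made-up campaign numbers shows the same comparison:

```python
import pandas as pd

# Hypothetical campaign results by content category.
campaigns = pd.DataFrame({
    "content":     ["comedy", "drama", "kids_family", "reality"],
    "impressions": [500_000, 480_000, 300_000, 450_000],
    "clicks":      [25_000, 19_000, 15_500, 24_000],
    "conversions": [2_100, 1_300, 1_800, 1_900],
    "spend":       [42_000, 39_000, 27_000, 45_000],
})

campaigns["ctr"] = campaigns["clicks"] / campaigns["impressions"]
campaigns["cpa"] = campaigns["spend"] / campaigns["conversions"]

# Best "bang for the buck" means the most conversions per dollar, i.e. lowest CPA.
print(campaigns.sort_values("cpa")[["content", "ctr", "cpa"]])
# kids_family comes out cheapest per new user, matching the demo's takeaway.
```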
And even here in June, as some states started to reopen, we're seeing that they're still trending up, and we're also seeing reality start to catch up around that time. And while the first pinboard we looked at included all sorts of campaign metrics, this is another pinboard we've created solely to focus on conversions. So not only can we see which campaigns drove significant conversions, we can also dig into the demographics of new users, like which campaigns and what content brought users from different parts of the country or from different age groups. And all of this is just a quick search away, with ThoughtSpot searching directly on Redshift data. All right, thank you, and back to you, Michael. >> Great, thanks, Vika. That was excellent. So as you can see, you can very quickly go from zero to search with ThoughtSpot connected to any cloud data warehouse. And I think it's important to understand, as we mentioned before, that not everything has to be perfect in your cloud data warehouse. You can use ThoughtSpot as your initial tool for investigatory purposes, as you can see here with Sargento, IMAX, and Anthem, and in a lot of these cases we were looking at billions of rows of data within minutes. As your data warehouse maturity grows, you can add more and more ThoughtSpot users to leverage the data and get better analysis from it. So we hope that you've enjoyed what you've seen today, and that you take the step to do one of two things. We have a free trial of ThoughtSpot Cloud: if you go to the website you see below and register, we can get you access to ThoughtSpot so you can start searching today. The other option, by contacting our team, is to do a zero-to-search workshop, where in 90 minutes we'll work with you to connect your data source and start to build some insights around exactly what you're trying to find for your business. Thanks, everybody. I would especially like to thank Gal from AWS for joining us today; we appreciate your participation, and I hope everybody enjoyed what they saw. I think we have a few questions now. >> Thank you, Vika, Gal, and Michael. It's always exciting to see a live demo; I know that I'm one of those comedy viewers. We have just a few minutes left, but I would love to ask a couple of last questions before we go. Michael, we'll give you the first question: do I need to have all of my data cleaned and ready in my cloud data warehouse before I begin with ThoughtSpot? >> That's a great question, Mallory. No, you don't. You can really start using ThoughtSpot for search right away and start getting analysis and understanding the data, through the automatic search analysis and the way that we query the data. We've seen customers do that: in the Chick-fil-A example we talked about earlier, they were able to use ThoughtSpot to notice an anomaly in the cloud data warehouse, in the linking between product and store. They were able to fix that very quickly, and that gets reflected across all of the users, because our product queries the cloud data warehouse directly; you can get started right away without it having to be perfect. >> That's awesome. And Gal, we'll leave a fun one for you: what can we look forward to from Amazon Redshift next year? >> That's a great question. The team has been innovating extremely fast; we released more than 200 features in the last year and a half, and we continue innovating.
One thing that stands out is AQUA, which is an innovative new technology; in fact, it stands for Advanced Query Accelerator, and it allows customers to achieve performance that is up to 10 times faster than what they've seen before. Really outstanding. The way we've achieved that is through a paradigm shift in the actual technological implementation. AQUA is a new distributed and hardware-accelerated processing layer, which effectively allows us to push analytics operations like compression, encryption, filtering, and aggregation down to the storage layer, and allow the AQUA nodes, built with custom AWS-designed analytics processors, to perform these operations faster than traditional CPUs. We no longer need to scan the data and bring it all the way to the computational nodes; we're able to apply these predicates, the filtering, encryption, compression, and aggregations, at the storage level. And AQUA is going to be available for every RA3 customer out of the box, with no changes to code. So I apologize for geeking out a little bit, but this is really exciting. >> No, that's why we invited you, Gal. Thank you, and thank you also to Michael and Vika; that was excellent, and we really appreciate it. For all of you tuning in at home, the final session of this track is coming up shortly, and you aren't going to want to miss it. We're going to end strong: come back and hear directly from our customer T-Mobile on how T-Mobile is building a data-driven organization with ThoughtSpot. It's up next; see you then.
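To ground Gal's AQUA description: the gain comes from applying filters and partial aggregations at the storage layer instead of shipping raw blocks to the compute nodes. A toy sketch of that pushdown idea, with nothing AWS-specific in it:

```python
# Toy model of predicate/aggregation pushdown. Each "storage shard" filters
# and partially aggregates locally; the compute node only merges partials.
shards = [
    [("comedy", 10), ("drama", 4), ("comedy", 7)],   # shard 1: (genre, sales)
    [("reality", 9), ("comedy", 2), ("drama", 11)],  # shard 2
]

def scan_shard(rows, genre):
    # Runs "at the storage layer": filter + partial SUM before any transfer.
    return sum(sales for g, sales in rows if g == genre)

# The compute node merges tiny partial results instead of raw rows.
total = sum(scan_shard(rows, "comedy") for rows in shards)
print(total)  # 19
```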

Published Date : Dec 10 2020

Ajeet Singh, ThoughtSpot | CUBE Conversation, November 2020


 

>> Narrator: From theCUBE studios in Palo Alto and in Boston, connecting with thought leaders all around the world, this is a CUBE Conversation. >> Everyone, welcome to this special CUBE Conversation. I'm John Furrier, host of theCUBE, here in our Palo Alto studios. During this time of the pandemic, we're doing a lot of remote interviews, supporting a lot of events. theCUBE Virtual is our new brand, because there are no events to go to, but we certainly want to talk to the best people and get the most important stories. And today I have a great segment with a world-class entrepreneur: Ajeet Singh, co-founder and executive chairman of ThoughtSpot. They've got an event coming up on December 9th and 10th. But this interview is really about what it takes to be a world-class leader, what it takes to see the future and be a visionary, and then execute on an opportunity. Because in the time we're in right now, there's a lot of change in data and technology; a sea change is happening, it's upon us, and leadership around technology and how to capture opportunities is really what we need right now. And so, Ajeet, I want to thank you for coming on to theCUBE Conversation. >> Thanks for having me, John. Pleasure to be here. >> For the folks watching: the startup that you've been doing for many, many years now is ThoughtSpot, where you're the co-founder and executive chairman, but you were also involved in Nutanix as the co-founder of that company as well. You know a little about unicorns and creating value and doing things early, but you're a visionary, and you're a technologist and a leader. I want to go in and explore that, because now more than ever, the role of data, the role of the truth, is super important, and as the co-founder, your company is well positioned for that. I mean, your tagline today on the website says "insight at the speed of thought," but going back to the beginning, that probably wasn't the tagline; it was probably more like "we've got to leverage data." Take us through the vision initially, when you founded the company in 2012. What was the thinking? What was on your mind? Take us through the journey. >> Yeah. So as an entrepreneur, I think visionary is a very big term; I don't know if I qualify for that or not. But what I'm really passionate about is identifying very large markets with very, very big problems, and then going to the whiteboard and, from scratch, building a solution that is perfectly designed for the big problem the market is facing. Just an absolutely honest way of approaching the problem and finding the best possible solution. So when we were starting ThoughtSpot, the market that we identified was analytics, analytics software. And the big problem that we saw was that while, on one hand, companies were building very big data lakes and data warehouses, and there was a lot of money being spent on capturing and storing data, how that data was consumed by the end users, the non-technical people, the sales, marketing, and HR people, the doctors, the nurses, that process was not changing. That process was still stuck in old times, where you have to ask an analyst to go and build a dashboard for you. And at the same time, we saw that in the consumer space, when anyone had a question and wanted to learn about something, they would just go to Google and ask that question. So we said: why can't analytics be as easy as Google?
If I have a question, why do I have to wait three weeks for some data expert to bring some insights to me for the most simple questions? If I'm doing very deep analysis, trying to come up with fraud algorithms, it's understood that you need data experts. But if I'm just trying to understand how my business is doing, how my customers are doing, I shouldn't have to wait. So that's how we identified the market and the problem. And then we built a solution designed for that non-technical user, with a very design-thinking, UX-first approach, to make it super easy for anyone to ask a question. That was the genesis of the company. >> You know, I just love the thinking, because you're solving a problem with a clean sheet of paper; you're looking at what can be done. And you bring up Google: Google's motto was "find what you're looking for," and they had little gimmicky buttons like "I'm feeling lucky," which just took you to a random webpage, at a time when everyone else was trying to build walled gardens and all this structural apparatus. Google wanted you in and out with your results, fast. And that mindset just never came over to the enterprise, with all that legacy structure and the baggage associated with it. So I totally love the vision, but I've got to ask: how did you get to a beachhead? How did you get that first success milestone? When did you see results from your thinking? >> Yeah. I believe that once you've identified a big market and a big problem, it comes down to the people. So I went on a recruiting mission, and I recruited perhaps the best technology and business team that you can find in any enterprise segment, not only in analytics. Some of the early engineers and my co-founder, Amit Prakash, came from Google, and before that he was at Microsoft working on Bing. It took a lot of very deliberate effort to find the right kind of people, who have a builder's mentality, are deep experts in areas like search and large-scale distributed systems, and are very passionate about user experience. And then you start building the product. It took us, I would say, one and a half to two years to get the initial working version of the product, and we were lucky enough to engage with some of the largest companies in the world, such as Walmart, who were very interested in our solution because they were facing these kinds of problems. We almost co-developed this technology with our early customers, focusing on ease of use, scale, security, and governance, all of that. Because it's one thing to have a concept where you want to make access to data as easy as Google, where you have a certain interface and people can type and get an answer; but when you are talking about enterprise data and enterprise needs, they are nowhere near similar to what you have in the consumer space. The consumer space is a free-for-all: all the information is there, you can crawl it, and then you can access it. In the enterprise, for you to take this idea of search and make it production-grade, make it real and not just a concept car, you need to invest a lot in building deep technology, and then enable security and scalability and all of that. So it took us almost, I would say, two and a half to three years to get to the initial version of the product. And in the problem we are solving and the area of technology, search, that we are working on, once we brought it to the market, it's almost an infinite game.
You can keep making things easier and easier, and we've seen how Google has continued to evolve their search over time; it is still evolving. We just feel so lucky to be in this market, taking the direction that we have taken. >> Yeah. It's easy to talk a big game in this area because, like you said, it's a hard technical problem, whether it's schemas, databases, or whatever legacy baggage, and making it easy is hard. And I like what you guys are going after: find the right information and put it in the right place at the right time. It's a really hard problem. And the beautiful thing is you're building a category while there's spend in the market that needs the problem solved today; that's category creation within an existing market that needs it. So I've got to ask: could you do me a favor and define for the audience what search-driven analytics is? What does that mean from your standpoint? >> Yeah. What it means is that for the end user it looks like search, but under the hood it is driving large-scale analytics. I like to say that our product looks like a search engine on the surface, but under the hood it's a massive number-crunching machine: search- and AI-driven analytics. There are two goals there. One: if a user has a question, any user, and we're talking about non-technical users here, not necessarily data experts, they should be able to get an answer instantly. They shouldn't have to wait. That is what we achieve with search. And with SpotIQ, our AI engine, we help surface insights where people may not even know those are the questions they should be asking; data has become so complex that people often don't even know what question to ask. We give them a tool that's very easy to use, and it helps surface insights to them. So there is both a pull model, which we enable through search, and a push model, which we enable through SpotIQ. >> So I have to ask you: you guys are pioneering this segment, you're in first, and sometimes when you're first you have arrows in your back. As you know, not all the pioneers survive, and they get competition and copies, but you guys have had a lead, and you've had success. What's different today, as you have competition coming in saying, "Oh, we've got search too"? What's different today with ThoughtSpot? How are you guys differentiated? >> Yeah. I mean, that's always a sign of success: if others are saying "we have it too," you have done something that is valuable, and that happens in every industry. I think the best example is Tesla. They were the first to really go after a very well-known problem. I mean, we didn't have a unique take on the existence of the problem itself either; everybody knows that there is a problem with access to data. But the technology that we have built is so deep that it's very, very hard to really copy it and make it work in the real world. With Tesla in the automotive industry, there are obviously so many other companies that have launched battery-powered, electric cars, but there is Tesla, and then there are all the other electric cars, which are a bit of an afterthought. Because if you want to build an analytics product where search is at the core, search cannot be added on top; search has to be the core, and then you build around it. And that requires you to build a fundamental architecture from the ground up.
And you can't take an existing BI product that is built for dashboarding and add a search bar. I have always said that adding a search bar to a UI is perhaps 10 to 20 lines of JavaScript code. Anyone can add it, and there is so much open-source stuff out there that you can just take and plug in, and many people have tried to do that. But taking off-the-shelf search technology that is built for unstructured data and sticking it onto a product that is required to do analytics on enterprise data, that doesn't work. We built a search technology that understands enterprise data at a very deep level, so that when our customers take our product and bring it into their environment, they don't have to fundamentally change how they manage their data. Our goal is to add value to their existing enterprise data and cloud data warehouses and deliver this amazing search experience, where our search engine is able to understand what's in their data lake, what's in their cloud data warehouse: the schemas, the tables, the joins, the cardinality, the data types, the security requirements. All of these things have to be understood by the technology for you to deliver the experience. Now, that said, we pride ourselves on not resting on our laurels. We have this motto in the company: we say we are only 2% done. So we are on our own continuous journey of innovation, and we have been working on taking our search technology to the next level. That is something really powerful that we are going to unveil at our upcoming conference, Beyond, in December, and it is going to create even more distance between us and the competition. And it's all driven by what we have seen with our customers: how they're using our product, our learnings, what they like and what they don't like, where we see gaps, and where we see opportunities to make it even easier to deliver value to our customers and our users. >> I think that's a really profound insight you just shared, because if you look at what you just said, thinking about search as embedded in the architecture, foundational, that's different than bolting on a feature with some JavaScript code or an open-source library. We see it in the security market: people bolted on security and had huge problems, and now all you hear is, "Oh, you've got to bake security in from the beginning." You actually have baked search into everything from the beginning, and it's not just a utility, it's a mindset. And it's also technology: metadata, data about data, software, all kinds of tech involved. Am I getting that right? Because I think this is what I heard you say: you've got to have the data. >> That is totally right. If I can use an analogy, there is Google search, and obviously Yahoo also tried to bring their own search. Yahoo versus Google is actually a perfect analogy to compare with ThoughtSpot versus other BI products. Yahoo was built for predefined content consumption: you had a homepage somebody defined, you could make some customizations, and there was predefined content you could consume. Now, they did also add search, but that didn't really go very far. Google, on the other hand, built from scratch the ability to crawl all the data and the ability to index all the data, and then built a serving infrastructure that delivered this amazing performance, interactivity, and relevance for the user. Relevance is where Google really shined.
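The crawl-index-serve pattern in that analogy maps onto search-driven analytics in a simple way: tokens typed by a user resolve against an index of the data model and compile into a query. A toy sketch of that pull model follows; ThoughtSpot's real engine handles schemas, joins, cardinality, and security, so treat every name here as hypothetical.

```python
# Toy "search bar" that maps tokens to a SQL query over a known schema.
# A real engine resolves synonyms, joins, and security; this is concept only.
MEASURES = {"revenue": "SUM(revenue)", "conversions": "SUM(conversions)"}
DIMENSIONS = {"region", "product", "month"}

def search_to_sql(query: str, table: str = "sales") -> str:
    tokens = query.lower().split()
    measures = [MEASURES[t] for t in tokens if t in MEASURES]
    dims = [t for t in tokens if t in DIMENSIONS]
    select = ", ".join(dims + measures) or "*"
    sql = f"SELECT {select} FROM {table}"
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql

print(search_to_sql("revenue by region"))
# SELECT region, SUM(revenue) FROM sales GROUP BY region
```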
And you can't do those things until you think about the architecture from the ground up. >> Ajeet, I'm looking forward to having more deep-dive conversations on that one topic. But for the folks who might not be old enough, like me, to remember Google back at that time: Yahoo was the best search engine, and it was basically a directory with keyword search. It was trivial, technically speaking, but they got big. And then the portal wars came: "we've got to have a portal." Google was very much not looked at as an innovator, but they had great technical chops and they just stayed the course. They had a mission to provide the best search engine, to help users find what they're looking for, and they never wavered. And it was not fashionable at that time, to your point. Yahoo was number one, and then Google just became Google, and the rest is history. So I really think that's super notable, because companies face the same problem: what looks like fashionable tech today might not be the right one. I think that's... >> Yeah, and I totally agree. I think a lot of times in our space there's a lot of hype around AI and machine learning. We as a company have tried to stay close to our customers and users and build things that will work for them. And a lot of the stuff we are doing has never been done before, so it's not to say that along the way we don't have our own failures. We do have failures, and we learn from them. >> Yeah, yeah. Just don't make the same mistake twice. >> Yeah. I think if you have a process of learning quickly and improving quickly, those are the companies that will have a competitive advantage. In today's world, nobody gets it right the first time if they're trying to do something fundamentally different; and if you're copying somebody else, then you're too late already. >> I totally agree. >> If you do something new, it's about how fast you penetrate. And that's... >> That's a great mindset. That's a great mindset, and I think that's worth capturing and calling out. But I've got to ask you, because, first of all, distinguished history, and I love your mindset, just solving problems, big problems. All great. I want to ask you something about the industry and where you guys were in 2012, when you started the company. You were literally in what I call the before-cloud phase, because there was before cloud companies, then during cloud companies, and then after cloud; Amazon clearly took advantage of that for a lot of startups. So right around 2012 through 2016, I'd call those the Amazon growing-up years. How did the cloud impact your thinking around the product and how you were executing? Because you were right on that wave; you were probably in the sweet spot of your development. >> Yeah. >> You were in the pre-business-planning mode, and in comes Amazon. I'm sure you were probably using Amazon, because all startups sort of use Amazon at first. But I just think about: did we all have to go on-premises with a data center? How did that impact you guys, and how does that change today?
That's becoming more popular now, with a company like Google having BigQuery, and then Snowflake — really amazing concepts and things like that. So when we started, we looked at where our customers are, where their data is, and what kind of infrastructure was available to us. At the time, there wasn't enough compute to drive the search engine that we wanted to build, and there wasn't any significant Cloud Data Warehousing yet either. But our engineering team, our co-founders — they came from companies like Google, where building a cloud-based, elastic, service-oriented architecture is in their DNA. So we architected the product to run on infrastructure that is very elastic and can run practically anywhere. Our initial customer base, though, was the Global 2000, and they had their data on-prem. So we started with on-prem as a go-to-market strategy. Then, about four and a half years ago, once cloud infrastructure — I'm talking about the compute infrastructure — started to become more mature, we certified our software to run on all three clouds. So today, more than 75 to 80% of our customers are already running our software in the Cloud. And now, because we connect to our primary data sources — Cloud Data Warehouses and Cloud Data Lakes, with Snowflake and BigQuery and Synapse and Redshift — we have enough customers who have deployed Cloud Data Warehouses that we are able to integrate with them directly. That's why we launched our own hosted SaaS offering about a month ago. So I would say our journey in this area has been similar to companies like Splunk or Elastic, which started with a software model deployed more on-prem, but then evolved with their customers to the Cloud. We have a lot of focus and momentum here, and a lot of our customers, as they move their data to the Cloud, are asking us to be in the Cloud as well and provide a hosted offering. That is what we have built over the last year, and we launched it a month ago. >> It's nice to be on the right side of history — I've got to say, you're on your way to being there. And that also makes integrations easy, too. I love the Cloud play. Let's get to the final segment here. I want to get your thoughts on your customers, your advice. There's a huge untapped opportunity for companies when it comes to data. A lot of them are realizing that the pandemic is highlighting areas where they have to go faster: they need to go to Cloud, they're going to build modern apps, and more data is coming in than ever before. Where are these untapped opportunities for customers to take advantage of data? And what's your opinion on where they should look and what they should do? >> Yeah, I really think the pandemic has shown, for the first time, the value of data to society at large. There are probably more than a billion people in the world who have seen a chart for the first time in their life — COVID has done some strange magic there — everybody was looking at charts of infections and so on and so forth. So there is a lot more broad awareness of what data can do in improving our society at large. For businesses, of course, in the last six, seven months you have heard it enough from a lot of leaders: digital transformation is accelerating. Everybody is realizing that the way to interact with the world is becoming more and more digital. Expecting your customers to come to your branch to do banking is not really an option.
And people are also seeing how all the SaaS companies and digital businesses have really taken off. If a company like Zoom can suddenly have a hundred-billion, $150 billion valuation because you are able to do everything remotely, then all the enterprises are looking to touch their customers and partners in a much more digital way than they could before. And COVID has also really created almost, you know, two buckets of organizations. There are a lot of companies that have tremendously benefited from it, and there are a lot of companies that have been badly affected and are really in a difficult place. And I think for both of them — the first category is looking at how to maintain this revenue even after COVID, because, you know, hopefully early next year we have a vaccine and things can start to look better again sometime next year. But we have learned so much, we have attracted so many new customers — how do we retain and grow them further? And that means I need to invest more and more in my technology. Now, companies that are not doing well really want to figure out how to become more operationally efficient, and they are really under pressure to get more value from their data. In both categories — improving your revenue, retaining customers — you need to understand customer behavior. You need to understand which products they are buying at a fine-grained level, not with the law of averages, not by looking at a dashboard and saying our average customer likes this kind of product. That doesn't really work. You have to offer people personalized services, and that personalization is just not possible at scale without really using data on the front lines. You can't have just managers sitting in their offices, looking at dashboards and charts, saying these are the kinds of campaigns I need to run because my average customer seems to like these kinds of offers. I need to really empower my salespeople, my individual frontline workers who are interfacing with the customer, to make customized offers of services and products to them. And that is only possible with data. So we see a lot more focus on getting value from data, on delivering value quickly, and on digital transformation broadly — but definitely on leveraging data in businesses. There is tremendous acceleration happening, and, you know, the next five years are all going to be about being able to monetize data on the front lines, when you are interfacing with your customers and partners. >> Ajeet, that's great insight, and I really appreciate what you're saying. You know, I wrote a blog post in 2007 — I said data will be the new development kit. Back then we used to have development kits, software development kits. >> John, you are the real visionary. It took me until 2012 to be able to do this. >> Well, it wasn't clear then, but you could see that data was going to have to be part of the programming. And I think what you're getting at here is so profound, because in 2020 people can see the value of data at the right time. It changes the conversations, it changes what's going on in the real-time communications of our world. With real-time access to information — whether that's machine to machine or machine to human — having data in the right place changes the context. >> Yep. >> And that is true not just as a tech thing; that's just life, right?
I think this year we're going to look back and say this was the year that everyone realized that real-time communications — a real-time society — needs real-time data. And I think it's going to be more important than ever. So it's a really big problem, and an important one. Thank you for sharing that. >> Yeah. And actually, you bring up a very good point — programming, development, data as a development kit. We are also going to announce a new product at Beyond, which will be about bringing ThoughtSpot everywhere — where a lot of business users are: in their business applications. By using the ThoughtSpot product, using our full experience, they can obviously do enterprise-wide analytics and look at all the data. But if they're looking for insights and nuggets, and they want to ask questions in their business workflows, we are also launching a product capability that will allow software developers to inject data into their business applications and empower their own business users to ask any questions they might have, without having to go to yet another BI product. >> It's data as code. I mean, you almost think in software metaphors: where's the compiler, where's the source code, where's the data code? You start to get into this new mindset of thinking about data as code, because you've got to have data about the data. Is it clean data or dirty data? Is it real time? Is it useful? There's a lot of intelligence needed to manage this. This is a pretty big deal, and it's fairly new on the science side. Yeah, machine learning has been around for a while, and there are tracks for that, but thinking of this as an operating-system mindset — it's not just being a data geek, you know what I'm saying? So I think you're on the right track, Ajeet. I really appreciate your thoughts here. Thank you. >> Thank you, John. >> Okay. This is a CUBE Conversation, unpacking the data. The data is the future. We're living in a real-time world, and in real time, data can change the outcomes of all kinds of contexts. And with truth, you need data. Ajeet Singh, co-founder and executive chairman of ThoughtSpot, shared his thoughts here in theCUBE. I'm John Furrier. Thanks for watching. (soft upbeat music)

Published Date : Nov 23 2020


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
2012 | DATE | 0.99+
Walmart | ORGANIZATION | 0.99+
2007 | DATE | 0.99+
John Furrier | PERSON | 0.99+
Yahoo | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Tesla | ORGANIZATION | 0.99+
Palo Alto | LOCATION | 0.99+
10 | QUANTITY | 0.99+
November 2020 | DATE | 0.99+
Google | ORGANIZATION | 0.99+
December | DATE | 0.99+
Amit Prakash | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
December 9th | DATE | 0.99+
two goals | QUANTITY | 0.99+
2016 | DATE | 0.99+
Java | TITLE | 0.99+
both categories | QUANTITY | 0.99+
three weeks | QUANTITY | 0.99+
first time | QUANTITY | 0.99+
three years | QUANTITY | 0.99+
next year | DATE | 0.99+
first category | QUANTITY | 0.99+
10th | DATE | 0.99+
both | QUANTITY | 0.99+
first | QUANTITY | 0.99+
Ajeet Singh | PERSON | 0.99+
One | QUANTITY | 0.99+
Boston | LOCATION | 0.99+
today | DATE | 0.99+
twice | QUANTITY | 0.99+
ThoughtSpot | ORGANIZATION | 0.99+
early next year | DATE | 0.99+
a month ago | DATE | 0.99+
Nutanix | ORGANIZATION | 0.99+
20 lines | QUANTITY | 0.99+
theCUBE | ORGANIZATION | 0.98+
more than a billion people | QUANTITY | 0.98+
one and a half three years | QUANTITY | 0.98+
one thing | QUANTITY | 0.98+
Bing | ORGANIZATION | 0.97+
Zoom | ORGANIZATION | 0.97+
pandemics | EVENT | 0.97+
JavaScript | TITLE | 0.97+
COVID | ORGANIZATION | 0.97+
one | QUANTITY | 0.97+
Cloud Data Warehouse | TITLE | 0.97+
CUBE | ORGANIZATION | 0.97+
2% | QUANTITY | 0.96+

Benoit & Christian Live


 

>> Okay, we're now going into the technical deep dive — we're going to geek out here a little bit. Benoit Dageville is here. He's co-founder of Snowflake and president of products. And also joining us is Christian Kleinerman, who's the senior vice president of products. Gentlemen, welcome. Good to see you. >> Yeah, you as well. >> Great to be here — thanks for having us. >> Very welcome. So, Benoit, we've heard a lot this morning about the data cloud, and it's becoming, in my view anyway, the linchpin of your strategy. I'm interested in what technical decisions you made early on that led you to this point and even enabled the data cloud. >> Yes. So I would say that the data cloud was built in three phases, really. The initial phase was really about one region, and what was important in that region was to make it infinitely scalable. That's our architecture, which we call the multi-cluster, shared data architecture, so that you can plug as many workloads into that region as you want, without any limits. The limit is really the underlying cloud provider's resources, and within a cloud region there are really no limits. So that region architecture, I think, was really the building block of the Snowflake data cloud. But it didn't stop there. The second aspect was really data sharing: how to share data within the region, between the different customers and tenants of that region. That was also enabled by the architecture, because we decoupled compute and storage, so compute clusters can access any storage within the region. That's the second phase of the data cloud. And then phase three, which is critical, is the expansion — the global expansion — how we made our cloud-agnostic layer so that we could port the Snowflake vision onto different clouds. Now we are running on top of three cloud providers: we started with AWS in US West, we moved to Azure, and then Google GCP. We started with one cloud region, as I said, in AWS US West, and then we created many different regions — we have 22 regions today, all over the world and across the different cloud providers. And what's more important is that these regions are not isolated. Snowflake is one single system for the world: we created this global data mesh, which connects every region, such that not only is the Snowflake system as a whole aware of all these regions, but customers can replicate data across regions and share data across the planet if need be. So this is one single — I call it the World Wide Web of data. That's the vision of the data cloud, and it really started with this building block, which is the cloud region.
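As a concrete illustration of the data sharing Benoit describes, here is a hedged sketch using Snowflake's share primitives, issued from Python via the snowflake-connector-python package. The account, database, and share names are made up, and the credentials are placeholders.

```python
# Hedged sketch: sharing a dataset with another account using Snowflake's
# secure data sharing primitives. All object names are illustrative.
import snowflake.connector

conn = snowflake.connector.connect(
    user="ANALYST", password="...", account="myorg-myaccount"  # placeholders
)
cur = conn.cursor()

# Create a share and expose one database/schema/table through it.
cur.execute("CREATE SHARE IF NOT EXISTS sales_share")
cur.execute("GRANT USAGE ON DATABASE sales_db TO SHARE sales_share")
cur.execute("GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share")
cur.execute("GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share")

# Add a consumer account: it can now query the data in place -- no copies,
# no FTP, which is the architectural point Benoit is making.
cur.execute("ALTER SHARE sales_share ADD ACCOUNTS = myorg.partner_account")
```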
>> Thank you for that, Benoit. Christian, you and I have talked about this — that notion of stripping away the complexity, which is kind of what the data cloud does. If you think about data architectures, historically they really had no domain knowledge. They've been focused on the technology to ingest, analyze, and prepare data, and then push it out to the business. You're really flipping that model, allowing the domain leaders to be first-class citizens, if you will, because they're the ones creating data value — and they're worrying less about infrastructure. But I wonder, do you feel like customers are ready for that change? >> I love the observation, Dave. So much energy in enterprises and organizations today goes into just dealing with infrastructure — pipes and plumbing and things like that. Something that was insightful from Benoit and our founders from day one was: this is a managed service. We want our customers to focus on the data — getting the insights, getting the decisions in time — not just managing pipes and plumbing and patches and upgrades. The other piece, and it's an interesting reality, is that there is this belief that the cloud is simplifying all this, that all of a sudden there's no problem. But actually, understanding each of the public cloud providers is a large undertaking, right? Each of them has 100-plus services, sending upgrades and updates on a constant basis, and that just distracts from the time it takes to go and say: here's my data, here's my data model, here's how I make better decisions. So at the heart of everything we do, we want to abstract the infrastructure — we don't want our customers dealing with the nuance of each of the cloud providers. And as you said, we want companies to focus on the domain expertise, the knowledge of their industry. Are all companies ready for it? I think it's a mixed bag. We talk to customers on a regular basis — every week, every day — and some of them are full-on; they've burned the bridges and said, I'm going to the cloud, I'm going to embrace a new model. With some others, you can see the complete shock-and-awe expression: what do you mean I don't have all these knobs to turn? But I think the future is very clear on how we get companies to be more competitive through data. >> Well, Benoit, it's interesting that Christian mentioned managed service — that used to mean hosting: guys running around in lab coats, plugging things in. You're looking at this differently, with high degrees of automation. One of those areas is workload management, and I wonder how you think about workload management and how that changes with the data cloud. >> Yeah, this is a great question. Workload management used to be a nightmare — a nightmare for the DBAs, who had to spend a lot of their time just managing workloads. And why is that? Because all these workloads were running on a single system, a single cluster, competing for resources. I always explain workload management as playing Tetris: you had to figure out when to run this workload, and make sure two big workloads were not overlapping. Maybe the ETL is pushed to run at night, in this nightly window, which is not efficient — for your ETL, of course, because you have delays because of that — but you had no choice, right? You had a fixed amount of resources, and you had to get the best out of those fixed resources.
And for sure, you don't want your ETL to impact your dashboarding workloads or your reports, or to interfere with your data science. This became a true nightmare, because everyone wants to be data-driven — meaning the entire company wants to run new workloads on the system — and these systems were completely overwhelmed. So workload management was a nightmare before Snowflake, and Snowflake made it really easy. The reason is that Snowflake leverages the cloud to dedicate compute resources to each workload. In Snowflake terminology, this is called a virtual warehouse. Each workload can run in its own virtual warehouse, and each virtual warehouse has its own dedicated compute resources and its own I/O bandwidth. You can really control how much resource each workload gets by sizing these warehouses — adjusting the compute resources they can use. When a workload starts to execute, the warehouse's compute resources are turned on automatically — Snowflake resumes the warehouse — and when it's done, they are turned off. And you can dynamically resize a warehouse: it can be done by the system automatically, if the concurrency of the workload increases, or it can be done manually by the administrator, adjusting the compute power for each workload. The beauty of that model is that not only does it give you very fine-grained control over the resources each workload gets — workloads are not competing with or impacting one another — but you can add as many workloads as you want. That's really critical because, as I said, everyone in the organization wants to use data to make decisions, so you have more and more workloads running. That Tetris game would have been impossible on a centralized, single-compute-cluster system. The flip side is that, as an administrator of the system, you have to justify that a workload is worth running for your organization. It's so easy — literally, in seconds you can stand up a new warehouse and start to run your queries on that new compute cluster. And of course, there is a cost: Snowflake charges by the second of compute. It's so easy now to run new workloads and do new things with Snowflake that you have to look at the trade-off — the cost, of course, and managing that cost.
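A hedged sketch of the isolation pattern Benoit describes: one virtual warehouse per workload, sized independently, with auto-suspend and auto-resume so compute is billed by the second only while queries run. The DDL is standard Snowflake syntax; the warehouse names, sizes, and connection details are illustrative.

```python
# Hedged sketch: isolating workloads in separate virtual warehouses so ETL,
# dashboards, and data science never compete for compute. Names are made up.
import snowflake.connector

conn = snowflake.connector.connect(
    user="ADMIN", password="...", account="myorg-myaccount"  # placeholders
)
cur = conn.cursor()

# One warehouse per workload; each suspends itself after 60 idle seconds
# and resumes automatically when a query arrives (pay-per-second compute).
for name, size in [("ETL_WH", "LARGE"), ("BI_WH", "SMALL"), ("DS_WH", "MEDIUM")]:
    cur.execute(
        f"CREATE WAREHOUSE IF NOT EXISTS {name} WITH "
        f"WAREHOUSE_SIZE = '{size}' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE"
    )

# Resizing is a one-liner, so capacity follows the workload, not a schedule.
cur.execute("ALTER WAREHOUSE ETL_WH SET WAREHOUSE_SIZE = 'XLARGE'")

# Point a session at a given warehouse; its queries run isolated there.
cur.execute("USE WAREHOUSE BI_WH")
```

Each workload's queries then run on their own cluster, which is why the Tetris problem Benoit describes disappears.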
>> So, Christian, Benoit used the term nightmare, and I'm thinking about the previous days of workload management. I talk to a lot of customers who are trying to reduce the elapsed time from data to insights, and their nightmare is this complicated data life cycle. I'm wondering how you guys think about that — that notion of compressing the elapsed time to data value, from raw data to insights. >> Yeah. So we obsess — we think a lot about this time to insight: from the moment an event happens to the point that it shows up in a dashboard or a report, or some decision or action happens based on it. There are three parts to how we reduce that life cycle. The first one, which ties to our previous conversation, is: where is there muscle memory around processes or ways of doing things that don't actually make much sense? My favorite example: ask any organization, do you run pipelines and ingestion and transformation at two and three in the morning? And the answer is, oh yeah, we do that. And if you go in and ask why, the answer is typically, well, that's when the resources are available — back to Benoit's Tetris, right? But then you ask: would you really want to run it at two and three in the morning if you could do it sooner, more in real time with when the event happened? So the first part is back to removing the constraints of the infrastructure: how about running transformations and ingestion when the business needs it best — when it gives the lowest time to insight, the lowest latency — not when the technology lets you do it? So that's the easy one out the door. The second one is: instead of just fully optimizing a process, where can you remove steps of the process? This is where all of our data sharing and the Snowflake Data Marketplace come into play. Say you need to get data from a SaaS application vendor, or maybe from a commercial data provider — imagine the dream: you wouldn't have to be running constant integrations and FTPs and cracking CSV files and things like that. What if it's always available in your environment, always up to date? That, in our mind, is a lot more revolutionary. It's not "let's take a process of ingesting and copying data and optimize it"; it's "how about not copying in the first place?" So that's number two. And then number three is what we do day in and day out: making sure our platform delivers the best performance. Make it faster. The combination of those three things has led many of our customers — and you'll see it through many of the customer testimonials today — to get insights and decisions and actions way faster: in part by removing steps, in part by doing away with old habits, and in part because we deliver exceptional performance. >> Thank you, Christian. Now, Benoit, you know we're big proponents of this idea of domain-driven design in data architecture — for example, customers building entire applications, what I call data products or data services, on their data platform. I wonder if you could talk about the types of applications and services that you're seeing built on top of Snowflake. >> Yeah. And I have to say that this is a critical aspect of Snowflake: to create this platform and really help applications be built on top of it. The more applications we have, the better the platform will be. It's like the analogy with your iPhone: if your iPhone had no applications, it would be useless — an empty platform. So we are really encouraging applications to be built on top of Snowflake, and indeed many of our customers are building applications on Snowflake. We estimate that about 30% are already running applications on top of our platform. The reason, of course, is that it's so easy to get compute resources, and there is no limit in scale or availability.
So all these characteristics are critical for an application, and we deliver that from day one. Now we have increased the scope of the platform by adding Java computation with Snowpark, which was announced today — that's also an enabler. In terms of the types of applications, it's really all over the map, and what I like, actually, is to be surprised. I don't know everything that will be built on top of Snowflake and how it will change the world. But with data sharing, we are also opening the door to a new type of application, delivered via the data marketplace, where one can get these applications directly inside the platform — the platform is distributing the application. And today there was a presentation in Christian's keynote about an application that does machine learning: providing to any user of Snowflake the ability to apply machine-learning models on your data and enrich your data. So data enrichment, I think, will be a huge aspect of Snowflake, and data enrichment with machine learning will be a big use case for these applications — as will getting data from inside the platform; a lot of applications will help you do that. Machine learning, data engineering, enrichment: these are the applications that will run on the platform. >> Great. Hey, we've just got a minute or so left. Earlier today we ran a video — we saw that you guys announced the startup competition, which is awesome. Benoit, you're a judge in this competition. What can you tell us about it? >> Yeah. You know, for me, we are still a startup — I haven't yet realized that we're not a startup anymore. I really feel for new startups, and that's very important for Snowflake: we were a startup yesterday, and we want to have new startups. So that's the idea of this program. The other aspect of the program is to help startups build on top of Snowflake and enrich this rich ecosystem that Snowflake — the data cloud — is, and we want to add to and boost that excitement for the platform. So it's a win-win: it's a win for new startups, and it's a win, of course, for us, because it will make the platform even better. >> Yeah, and startups are where innovation happens. So registration is open — I've heard several startups have signed up. You go to snowflake.com, slash startup challenge, and you can learn more. It's an exciting program and initiative. So thank you for doing that on behalf of the startups out there, and thanks, Benoit and Christian — I really appreciate you guys coming on. Great conversation. >> Thanks, David. >> You're welcome. And when we talk to go-to-market pros, they always tell us that one of the key tenets is to stay close to the customer. Well, we want to find out how data helps to do that. Our next segment brings in two chief revenue officers to give us their perspective on how data is helping their customers transform their businesses digitally. Let's watch.

Published Date : Nov 20 2020


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
David | PERSON | 0.99+
Christian Kleinerman | PERSON | 0.99+
Ben Wallace | PERSON | 0.99+
Ben White | PERSON | 0.99+
Ben Wa | PERSON | 0.99+
three parts | QUANTITY | 0.99+
Ben Ben | PERSON | 0.99+
Each | QUANTITY | 0.99+
Ben | PERSON | 0.99+
iPhone | COMMERCIAL_ITEM | 0.99+
Ben Wa Dodgeville | PERSON | 0.99+
Snowflake | ORGANIZATION | 0.99+
Christian | PERSON | 0.99+
Benoit | PERSON | 0.99+
today | DATE | 0.99+
Thio | PERSON | 0.99+
yesterday | DATE | 0.99+
first part | QUANTITY | 0.99+
first | QUANTITY | 0.99+
each | QUANTITY | 0.99+
three things | QUANTITY | 0.99+
22 regions | QUANTITY | 0.98+
second aspect | QUANTITY | 0.98+
Java | TITLE | 0.98+
about 20 minutes | QUANTITY | 0.98+
first one | QUANTITY | 0.98+
10 | QUANTITY | 0.98+
each work | QUANTITY | 0.98+
about 30% | QUANTITY | 0.97+
Ben Juan | PERSON | 0.97+
second one | QUANTITY | 0.97+
nine man | QUANTITY | 0.97+
one | QUANTITY | 0.97+
90 window | QUANTITY | 0.97+
single | QUANTITY | 0.97+
each virtual warehouse | QUANTITY | 0.96+
two | QUANTITY | 0.96+
each workload | QUANTITY | 0.96+
DWI | ORGANIZATION | 0.95+
100 plus servi | QUANTITY | 0.94+
20 finds | QUANTITY | 0.94+
one single | QUANTITY | 0.91+
3 | QUANTITY | 0.91+
three | DATE | 0.91+
three phases | QUANTITY | 0.91+
this morning | DATE | 0.91+
Google | ORGANIZATION | 0.89+
three | QUANTITY | 0.89+
Tetris | TITLE | 0.89+
Snow Park | TITLE | 0.88+
US West | LOCATION | 0.87+
Christian T | PERSON | 0.87+
Patriots | ORGANIZATION | 0.87+
this year | DATE | 0.86+
single cluster | QUANTITY | 0.84+
day One | QUANTITY | 0.82+
two | DATE | 0.8+
SAS | ORGANIZATION | 0.79+
one single computer | QUANTITY | 0.78+
Snowflake | TITLE | 0.78+
one crowd region | QUANTITY | 0.76+
three cloud providers | QUANTITY | 0.76+
W S U S West | LOCATION | 0.74+
One regions | QUANTITY | 0.73+
Christian | ORGANIZATION | 0.73+
Day one | QUANTITY | 0.71+
Earlier today | DATE | 0.68+
ws | ORGANIZATION | 0.61+
number three | QUANTITY | 0.58+
g c p | TITLE | 0.57+
2 | QUANTITY | 0.53+
Snowflake | EVENT | 0.45+
Tetris | ORGANIZATION | 0.35+

Breaking Analysis: How Snowflake Plans to Change a Flawed Data Warehouse Model


 

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> Snowflake is not going to grow into its valuation by stealing the croissant from the breakfast table of the on-prem data warehouse vendors. Look, even if Snowflake got 100% of the data warehouse business, it wouldn't come close to justifying its market cap. Rather, Snowflake has to create an entirely new market based on completely changing the way organizations think about monetizing data. Every organization I talk to says it wants to be — or many say they already are — data-driven. Why wouldn't you aspire to that goal? There's probably nothing more strategic than leveraging data to power your digital business and create competitive advantage. But many businesses are failing, or I predict will fail, to create a true data-driven culture, because they're relying on a flawed architectural model formed by decades of building centralized data platforms. Welcome, everyone, to this week's Wikibon Cube Insights powered by ETR. In this Breaking Analysis, I want to share some new thoughts and fresh ETR data on how organizations can transform their businesses through data by reinventing their data architectures. And I want to share our thoughts on why we think Snowflake is currently in a very strong position to lead this effort. Now, on November 17th, theCUBE is hosting the Snowflake Data Cloud Summit. Snowflake's ascendancy and its blockbuster IPO have been widely covered by us and many others. Since Snowflake went public, we've been inundated with outreach from investors, customers, and competitors that wanted to either better understand the opportunities or explain why their approach is better or different. And in this segment, ahead of Snowflake's big event, we want to share some of what we learned and how we see it. Now, theCUBE is getting paid to host this event, so I need you to know that, and you can draw your own conclusions from my remarks. But neither Snowflake nor any other sponsor of theCUBE or client of SiliconANGLE Media has editorial influence over Breaking Analysis. The opinions here are mine, and I would encourage you to read my ethics statement in this regard. I want to talk about the failed data model. The problem is complex — I'm not debating that. Organizations have to integrate data and platforms with existing operational systems, many of which were developed decades ago, and there's a culture and a set of processes that have been built around these systems and hardened over the years. This chart here tries to depict the progression of the monolithic data source, which, for me, began in the 1980s, when Decision Support Systems, or DSS, promised to solve our data problems. The data warehouse became very popular, and data marts sprung up all over the place. This created more proprietary stovepipes with data locked inside. The Enron collapse led to Sarbanes-Oxley, which tightened up reporting, and the requirements associated with that breathed new life into the data warehouse model. But it remained expensive and cumbersome — I've talked about that a lot — like a snake swallowing a basketball. The 2010s ushered in the big data movement, and data lakes emerged. With Hadoop, we saw the idea of schema-on-read, where you put structured and unstructured data into a repository and figure it all out on the read.
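To illustrate the schema-on-read idea, here is a minimal sketch: raw, loosely structured records land in the repository as-is, and a schema is imposed only when the data is read — the opposite of a warehouse, which validates on write. The records and field names are made up.

```python
# Hedged sketch of schema-on-read: dump raw records first, impose a schema
# only at query time. Contrast with schema-on-write, which validates at ingest.
import json
from dataclasses import dataclass

# The "lake": raw, heterogeneous events exactly as they arrived (illustrative).
RAW = [
    '{"user": "a17", "amount": "42.50", "ts": "2020-11-01"}',
    '{"user": "b02", "amount": 19, "ts": "2020-11-02", "extra": "ignored"}',
]

@dataclass
class Order:  # the schema, applied on read, not on write
    user: str
    amount: float
    ts: str

def read_orders(raw_lines):
    """Parse and coerce records at read time; malformed rows surface here,
    not at ingest -- the trade-off schema-on-read makes."""
    for line in raw_lines:
        rec = json.loads(line)
        yield Order(user=rec["user"], amount=float(rec["amount"]), ts=rec["ts"])

for order in read_orders(RAW):
    print(order)
```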
What emerged was a fairly complex data pipeline that involved ingesting, cleaning, processing, analyzing, preparing, and ultimately serving data to the lines of business. And this is where we are today, with very hyper-specialized roles around data engineering, data quality, and data science. There's lots of batch processing going on, and Spark has emerged to address the complexity associated with MapReduce — it definitely helped improve the situation. We're also seeing attempts to blend in real-time stream processing with the emergence of tools like Kafka and others. But I'll argue that, in a strange way, these innovations actually compound the problem, and I want to discuss that, because what they do is heighten the need for more specialization, more fragmentation, and more stovepipes within the data life cycle. Now, in reality — and it pains me to say this — the outcome of the big data movement, as we sit here in 2020, is that we've created thousands of complicated science projects that have once again failed to live up to the promise of rapid, cost-effective time to insights. So, what will the 2020s bring? What's the next silver bullet? You hear terms like the lakehouse, which Databricks is trying to popularize, and I'm going to talk today about data mesh. These are efforts that look to modernize data lakes and sometimes merge the best of the data warehouse and second-generation systems into a new paradigm that might unify batch and stream frameworks. And this definitely addresses some of the gaps, but in our view it still suffers from some of the underlying problems of previous-generation data architectures. In other words, if the next-gen data architecture is incremental, centralized, rigid, and primarily focused on making the technology that gets data in and out of the pipeline work, we predict it's going to fail to live up to expectations again. Rather, what we're envisioning is an architecture based on the principles of distributed data, where domain knowledge is the primary target citizen, and data is not seen as a by-product — i.e., the exhaust of an operational system — but rather as a service that can be delivered in multiple forms and use cases across an ecosystem. This is why we often say data is not the new oil. We don't like that phrase: a specific gallon of oil can either fuel my home or lubricate my car engine, but it can't do both. Data does not follow the same laws of scarcity as natural resources. Again, what we're envisioning is a rethinking of the data pipeline and the associated cultures, to put the data needs of the domain owner at the core, and to provide automated, governed, and secure access to data as a service, at scale. Now, how is this different? Let's take a look and unpack the data pipeline today, and look deeper into the situation. You all know this picture that I'm showing — there's nothing really new here. The data comes from inside and outside the enterprise. It gets processed, cleansed, or augmented so that it can be trusted and made useful — nobody wants to use data they can't trust. Then we can add machine intelligence and do more analysis, and finally deliver the data so that domain-specific consumers can build data products and services, or reports and dashboards, or content services — for instance, an insurance policy, a financial product, a loan. These are packaged and made available for someone to make decisions on, or to make a purchase. And all the metadata associated with this data is packaged along with the dataset.
Now, we've broken down these steps into atomic components over time, so we can optimize each one and make them as efficient as possible. And down below, you have these happy stick figures — sometimes they're happy. They're highly specialized individuals, and they each do their job, and do it well, to make sure the data gets in, gets processed, and gets delivered in a timely manner. Now, while these individual pieces seemingly are autonomous and can be optimized and scaled, they're all encompassed within the centralized big data platform, and it's generally accepted that this platform is domain agnostic — meaning the platform is the data owner, not the domain-specific experts. There are a number of problems with this model. The first: while it's fine for organizations with a smaller number of domains, organizations with a large number of data sources and complex domain structures struggle to create a common data parlance — for example, a data culture. Another problem is that, as the number of data sources grows, organizing and harmonizing them in a centralized platform becomes increasingly difficult, because the context of the domain and the line of business gets lost. Moreover, as ecosystems grow and you add more data, the processes associated with the centralized platform tend to get further genericized — they again lose that domain-specific context. Wait (chuckling), there are more problems. While in theory organizations are optimizing the piece parts of the pipeline, the reality is that when the domain requires a change — for example, a new data source, or an ecosystem partnership requiring a change in access or processes that could benefit a domain consumer — the change is subservient to the dependencies and the need to synchronize across these discrete parts of the pipeline, or is actually orthogonal to each of those parts. In other words, in actuality, the monolithic data platform itself remains the most granular part of the system. Now, when I complain about this faulty structure, some folks tell me this problem has been solved — that there are services that allow new data sources to be added really easily. A good example of this is Databricks Ingest, an auto loader that simplifies ingestion into the company's Delta Lake offering. Rather than centralizing in a data warehouse, which struggles to efficiently incorporate things like machine learning frameworks, this feature allows you to put all the data into a centralized data lake — or so the argument goes. The problem I see with this is that while the approach definitely minimizes the complexity of adding new data sources, it still relies on a linear, end-to-end process that slows down the introduction of data sources from the domain consumer's side of the pipeline. In other words, the domain expert still has to elbow her way to the front of the line — or the pipeline, in this case — to get stuff done. And finally, the way we're organizing teams is a point of contention, and I believe it's going to continue to cause problems down the road. Specifically, we've again optimized on technology expertise, where, for example, data engineers — while really good at what they do — are often removed from the operations of the business. Essentially, we created more silos and organized around technical expertise versus domain knowledge.
As an example, a data team has to work with data that is delivered with very little domain specificity and serves a variety of highly specialized consumption use cases. All right, I want to step back for a minute and talk about some of the problems that people bring up with Snowflake, and then I'll relate it back to the basic premise here. As I said earlier, we've been hammered by dozens and dozens of data points, opinions, and criticisms of Snowflake. I'll share a few here, but I'll post a deeper technical analysis from a software engineer that I found to be fairly balanced. There are five Snowflake criticisms that I'll highlight — there are many more, but here are some I want to call out. Price transparency: I've had more than a few customers tell me they chose an alternative database because of the unpredictable nature of Snowflake's pricing model. Snowflake, as you probably know, prices based on consumption, just like AWS and other cloud providers. So, just like AWS, the bill at the end of the month is sometimes unpredictable. Is this a problem? Yes. But like AWS, I would say, "Kill me with that problem." Look, if users are creating value by using Snowflake, then that's good for the business. But clearly this is a sore point for some users, especially procurement and finance, which don't like unpredictability, and Snowflake needs to do a better job communicating and managing this issue, with tooling that can predict and help better manage costs. Next, workload management — or lack thereof. Look, if you want to isolate higher-performance workloads with Snowflake, you just spin up a separate virtual warehouse. It's kind of a brute-force approach; it works, generally, but it will add expense. I'm kind of reminded of Pure Storage and its approach to storage management — the engineers at Pure always design for simplicity, and this is the approach Snowflake is taking. The difference between Pure and Snowflake, as I'll discuss in a moment, is that Pure's ascendancy was based largely on stealing share from legacy EMC systems; Snowflake, in my view, has a much, much larger incremental market opportunity. Next is caching architecture — you hear this a lot. At the end of the day, Snowflake is based on a caching architecture, and a caching architecture has to be working for some time to optimize performance. Caches work well when the size of the working set is small; caches generally don't work well when the working set is very, very large. In general, transactional databases have pretty small datasets, and in general, analytics datasets are potentially much larger. Is Snowflake in the analytics business? Yes. But the good thing Snowflake has done is enable data sharing, and its caching architecture serves its customers well because it allows domain experts — you're going to hear this a lot from me today — to isolate and analyze problems, or go after opportunities, based on tactical needs. That said, very big queries across whole datasets, or badly written queries that scan the entire database, are not the sweet spot for Snowflake. Another good example would be a large audit where you need to analyze a huge, huge dataset — Snowflake's probably not the best solution. Complex joins — you hear this a lot too. The working set of complex joins, by definition, is larger, so see my previous explanation. Read only: Snowflake is pretty much optimized for read-only data. Maybe stateless data is a better way of thinking about this.
Heavily write-intensive workloads are not the wheelhouse of Snowflake, so where this may be an issue is real-time decision-making and AI inferencing. I've talked about this a number of times: Snowflake might be able to develop products or acquire technology to address this opportunity. Now, I want to explain: these issues would be problematic if Snowflake were just a data warehouse vendor. If that were the case, this company, in my opinion, would hit a wall, just as the MPP vendors that preceded them — building a better mousetrap for certain use cases — hit a wall. Rather, my premise in this episode is that the future of data architectures will be to move away from large centralized warehouse or data lake models to a highly distributed data sharing system that puts power in the hands of domain experts at the line of business. Snowflake is less computationally efficient and less optimized for classic data warehouse work, but it's designed to serve the domain user much more effectively, in our view. We believe Snowflake is optimizing for business effectiveness, essentially. And as I said before, the company can probably do a better job at keeping passionate end users from breaking the bank; but as long as these end users are making money for their companies, I don't think this is going to be a problem. Let's look at the attributes of what we're proposing around this new architecture. We believe we'll see the emergence of a total flip of the centralized and monolithic big data systems that we've known for decades. In this architecture, data is owned by domain-specific business leaders, not technologists. Today, it's not much different in most organizations than it was 20 years ago: if I want to create something of value that requires data, I need to cajole, beg, or bribe the technology and data teams to accommodate me. The data consumers are subservient to the data pipeline. Whereas in the future, we see the pipeline as a second-class citizen, with the domain expert elevated. In other words, getting the technology and the components of the pipeline to be more efficient is not the key outcome; rather, the time it takes to envision, create, and monetize a data service is the primary measure. The data teams are cross-functional and live inside the domain, versus today's structure, where the data team is largely disconnected from the domain consumer. Data in this model, as I said, is not the exhaust coming out of an operational system or an external source, treated as generic and stuffed into a big data platform. Rather, it's a key ingredient of a service that is domain-driven and monetizable. And the target system is not a warehouse or a lake; it's a collection of connected, domain-specific datasets that live in a global mesh. What is a distributed global data mesh? A data mesh is a decentralized architecture that is domain aware. The datasets in the system are purposely designed to support a data service — or data product, if you prefer. The ownership of the data resides with the domain experts, because they have the most detailed knowledge of the data requirements and its end use. Data in this global mesh is governed and secured, and every user in the mesh can have access to any dataset, as long as it's governed according to the edicts of the organization. Now, in this model, the domain expert has access to a self-service and abstracted infrastructure layer that is supported by a cross-functional technology team.
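As a toy illustration of what a domain-owned, governed dataset in such a mesh might look like, here is a sketch of a data product descriptor. Every field and name is hypothetical — no standard schema is implied; the point is that ownership, purpose, schema, and access policy travel with the data product itself rather than living in a central platform team.

```python
# Hedged sketch: a domain-owned data product descriptor for a data mesh.
# All fields are illustrative; no standard or real-world schema is implied.
from dataclasses import dataclass

@dataclass
class DataProduct:
    name: str            # addressable identity in the mesh
    domain: str          # owning business domain, not a central data team
    owner: str           # accountable domain expert
    description: str     # self-describing, for catalog discovery
    schema: dict         # column -> type, published with the product
    access_policy: str   # governance edict, applied uniformly across the mesh
    slo_freshness: str   # the product's promise to its consumers

claims = DataProduct(
    name="insurance.claims.approved_v2",
    domain="claims",
    owner="claims-analytics@acme.example",
    description="Approved claims, deduplicated, joined to policy holders.",
    schema={"claim_id": "string", "amount": "decimal", "approved_at": "timestamp"},
    access_policy="pii-masked; readable by any governed mesh user",
    slo_freshness="updated within 15 minutes of source event",
)
print(claims.name, "->", claims.owner)
```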
Again, the primary measure of success is the time it takes to conceive and deliver a data service that can be monetized. Now, by monetize we mean a data product or data service that either cuts cost, drives revenue, or saves lives — whatever the mission of the organization is. The power of this model is that it accelerates the creation of value by putting authority in the hands of the individuals who are closest to the customer and have the most intimate knowledge of how to monetize data. It reduces the diseconomies of scale of having a centralized, monolithic data architecture, and it scales much better than legacy approaches, because the atomic unit is a data domain, not a monolithic warehouse or lake. Zhamak Dehghani is a software engineer who is attempting to popularize the concept of a global mesh. Her work is outstanding, and it's strengthened our belief that practitioners see this the same way we do. To paraphrase her view: a domain-centric system must be secure and governed, with standard policies across domains. It has to be trusted — as I said, nobody's going to use data they don't trust. It's got to be discoverable via a data catalog with rich metadata. The datasets have to be self-describing and designed for self-service. Accessibility for all users is crucial, as is interoperability, without which distributed systems, as we know, fail. So what does this all have to do with Snowflake? As I said, Snowflake is not just a data warehouse; in our view, it's always had the potential to be more. Our assessment is that attacking the data warehouse use case gave Snowflake a straightforward, easy-to-understand narrative that allowed it to get a foothold in the market. Data warehouses are notoriously expensive, cumbersome, and resource intensive, but they're critical to reporting and analytics, so it was logical for Snowflake to target on-premises legacy data warehouses — and their smaller cousins, the data lakes — as early use cases. By putting forth and demonstrating a simple data warehouse alternative that can be spun up quickly, Snowflake was able to gain traction, demonstrate repeatability, and attract the capital necessary to scale to its vision. This chart shows the three layers of Snowflake's architecture that have been well documented: the separation of compute and storage, and the outer layer of cloud services. But I want to call your attention to the bottom part of the chart, the so-called cloud-agnostic layer that Snowflake introduced in 2018. This layer is somewhat misunderstood. Not only did Snowflake make its cloud-native database compatible to run on AWS, then Azure, and in 2020 GCP — what Snowflake has done is abstract cloud infrastructure complexity and create what it calls the data cloud. What's the data cloud? We don't believe the data cloud is just a marketing term without substance. Just as SaaS simplified application software, and iOS made it possible to eliminate the value drain associated with provisioning infrastructure, a data cloud, in concept, can simplify data access, break down fragmentation, and enable shared data across the globe. Snowflake has a first-mover advantage in this space, and we see a number of fundamental aspects that comprise a data cloud. First, massive scale, with virtually unlimited compute and storage resources enabled by the public cloud — we talk about this a lot. Second is a data or database architecture that's built to take advantage of native public cloud services.
This is why Frank Slootman says, "We've burned the boats. We're not ever doing on-prem. We're all in on cloud and cloud native." Third is an abstraction layer that hides the complexity of infrastructure, and fourth is a governed and secured shared-access system, where any user in the system, if allowed, can get access to any data in the cloud. So a key enabler of the data cloud is this thing called the global data mesh. Earlier this year, Snowflake introduced its global data mesh. Over the course of its recent history, Snowflake has been building out its data cloud by creating data regions, strategically tapping key locations of AWS regions and then adding Azure and GCP. The complexity of the underlying cloud infrastructure has been stripped away to enable self-service, and any Snowflake user becomes part of this global mesh, independent of the cloud they're on. Okay, so now let's go back to what we were talking about earlier. Users in this mesh will be our domain owners. They're building monetizable services and products around data. They're most likely dealing with relatively small, read-only datasets. They can ingest data from any source very easily, and quickly set up security and governance to enable data sharing across different parts of an organization — or, very importantly, an ecosystem. Access control and governance are automated. The datasets are addressable. The data owners have clearly defined missions, and they own the data through the life cycle — data that is specific and purposely shaped for their missions. Now, you're probably asking: "What happens to the technical team, the underlying infrastructure, and the clusters it runs on? How do I get the compute close to the data? And what about data sovereignty, the physical storage layer, and the costs?" These are all good questions, and I'm not saying they're trivial. But the answer is: these are implementation details that are pushed to a self-service layer managed by a group of engineers that serves the data owners. And as long as the domain expert/data owner is driving monetization, this piece of the puzzle becomes self-funding. As I said before, Snowflake has to help these users optimize their spend with predictive tooling that aligns spend with value and shows ROI. While there may not be a strong motivation for Snowflake to do this, my belief is that they'd better get good at it, or someone else will do it for them and steal their ideas. All right, let me end with some ETR data to show you just how Snowflake is getting a foothold in the market. Followers of this program know that ETR uses a consistent methodology to go to its practitioner base — its buyer base — each quarter and ask a series of questions. Respondents focus on the areas the technology buyer is most familiar with, and the questions are designed to determine the spending momentum around a company within a specific domain. This chart shows one of my favorite examples. It shows data from the October ETR survey of 1,438 respondents, and it isolates the data warehouse and database sector. I know I just got through telling you that the world is going to change and Snowflake's not a data warehouse vendor, but there's no construct today in the ETR dataset to cut on a data cloud or a globally distributed data mesh, so you're going to have to deal with this. What this chart shows is net score on the y-axis. That's a measure of spending velocity, and it's calculated by asking customers, "Are you spending more or less on a particular platform?" and then subtracting the lesses from the mores. It's more granular than that, but that's the basic concept.
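A hedged sketch of the net score arithmetic as just described. The real ETR methodology is more granular — it distinguishes adoption, increased spend, flat, decreased spend, and replacement, as the wheel chart below shows — but the core idea is the share of "more" responses minus the share of "less" responses. The survey data here is made up.

```python
# Hedged sketch of ETR-style net score: percentage of customers spending
# more minus percentage spending less. The responses below are fabricated.
from collections import Counter

responses = ["more", "more", "flat", "less", "more", "flat", "more", "less"]

def net_score(responses: list[str]) -> float:
    """Net score = %more - %less, expressed as a percentage."""
    counts = Counter(responses)
    n = len(responses)
    return 100 * (counts["more"] - counts["less"]) / n

print(f"net score: {net_score(responses):.0f}")  # 4 more, 2 less, 8 total -> 25
```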
Now, on the x-axis is market share, which is ETR's measure of pervasiveness in the survey. You can see, superimposed in the upper right-hand corner, a table that shows the net score and the shared N for each company. Shared N is the number of mentions in the dataset within — in this case — the data warehousing sector. Snowflake, once again, leads all players with a 75% net score. This is a very elevated number, higher than that of all other players, including the big cloud companies. We've been tracking this for a while, and Snowflake is holding firm on both dimensions. When Snowflake first hit the dataset, it was in the single digits along the horizontal axis, and it continues to creep to the right as it adds more customers. Now, here's another chart — I call it the wheel chart — that breaks down the components of Snowflake's net score, or spending momentum. The lime green is new adoption, the forest green is customers spending more than 5%, the gray is flat spend, the pink is declining by more than 5%, and the bright red is retiring the platform. So you can see the trend: it's all momentum for this company. Now, what Snowflake has done is grab hold of the market by simplifying the data warehouse. But the strategic aspect of that is that it enables the data cloud, leveraging the global mesh concept, and the company has introduced a data marketplace to facilitate data sharing across ecosystems. This is all about network effects. In the mid-to-late 1990s, as the internet was being built out, I worked at IDG with Bob Metcalfe, who was the publisher of InfoWorld. During that time we'd go on speaking tours all over the world, and I would listen very carefully as he applied Metcalfe's law to the internet. Metcalfe's law states that the value of a network is proportional to the square of the number of connected nodes, or users, on that system. Said another way: while the cost of adding new nodes to a network scales linearly, the consequent value scales exponentially. Now, apply that to the data cloud. The marginal cost of adding a user is negligible — practically zero — but the value of being able to access any dataset in the cloud... well, let me just say this: there's no limitation to the magnitude of the market. My prediction is that this idea of a global mesh will completely change the way leading companies structure their businesses and, particularly, their data architectures. It will be the technologists that serve the domain specialists, as it should be. Okay, well, what do you think? DM me @dvellante, or email me at david.vellante@siliconangle.com, or comment on my LinkedIn. Remember, these episodes are all available as podcasts, so please subscribe wherever you listen. I publish weekly on wikibon.com and siliconangle.com, and don't forget to check out etr.plus for all the survey analysis. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching. Be well, and we'll see you next time. (upbeat music)

Published Date : Nov 14 2020


Frank Slootman, Snowflake | CUBE Conversation


 

>> Announcer: From theCUBE Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.

>> Hi everybody, this is Dave Vellante. And as you know, we've been tracking the next generation of clouds. Sometimes we call it Cloud 2.0. Frank Slootman is here to really unpack this with me. Frank, great to see you. Thanks for coming on.

>> Yeah, you as well, Dave. Good to see you.

>> So, obviously hot off your IPO, a lot of buzz around that. That's fine; we could talk about that, but I really want to talk about the future. Before we get off the IPO, though, there was something you told me when you were CEO of ServiceNow. You said, "Hey, we're priced to perfection." So it looks like Snowflake's going to be priced to perfection. It's a marathon, though; you made that clear. I presume it's not any different here for you.

>> Yeah, well, I think the ServiceNow journey was different in the sense that we were kind of the underdogs, and people sort of discovered over the years the full potential of the company. I think with Snowflake, it was pretty much discovered on day one. It's a little bit different. Sometimes it's nice to be an underdog; we're a bit of an overdog in this particular scenario. But, you know, it is what it is. It's all about execution, delivering the results, delivering on our vision, being great with our customers, and hopefully the chips will fall where they may at that point.

>> Yeah, you're a poorly kept secret at this point, Frank, after a while. You know, I've got some excerpts of your book that I've been reading, and of course I've been following your career since the two thousands. You were off sailing. You mentioned in your book that you were kind of retired, you were done, and then you got sucked back in. Now, why? I mean, are you in this for the sport? What's the story here?

>> Actually, that's not a bad way of characterizing it. I think I am in it, you know, for the sport. The only way to become the best version of yourself is to be under the gun every single day, and that's certainly what we are. It sort of has its own rewards. Building great products, building great companies, regardless of what the spoils may be, has its own rewards. It's hard for people like us to get off the field and hang it up. So here we are.

>> You know, you're putting forth this vision now, the data cloud, which obviously is good marketing, but I'm really happy, because I don't like the term enterprise data warehouse. I don't think it reflects what you're trying to accomplish. EDW is slow, only a few people really know how to use it, and the time value of data is gone by the time it's in there; your business is moving faster than the data in the EDW. And it really became a savior because of Sarbanes-Oxley; that's really what it became, a reporting mechanism. So I've never seen what you guys are doing as EDW. So I want you to talk about the data cloud. I want to get into the vision a little bit and maybe challenge you on a couple things so our audience can better understand it.

>> Yes. So the notion of a data cloud is actually a type of cloud that we haven't had. I mean, data has been fragmented and locked up in a million different places, in different clouds,
different cloud regions, and obviously on premise. And for data science teams, they're trying to drive analysis across datasets, which is incredibly hard, which is why a lot of this resorts to, you know, programming and things of that sort. It's hardly scalable because the data is not optimized, the economics are not optimized, there's no governance model, and so on. But a data cloud is actually the ability to loosely couple and lightly federate data, regardless of where it is. So it doesn't have the scale limitations or performance limitations that traditional data warehouses have had. So we really have a fighting chance of killing the silos and unlocking the bunkers, and allowing the full promise of data science and ML and AI to really happen.

I mean, a lot of the analysis that happens on data is on a single dataset, because it's just too damn hard to drive analysis across multiple datasets. And when we talk to our customers, they have very precise designs on what they're trying to do. They say, "Look, we are trying to discover, through deep learning, what the patterns are that lead to transactions." If you're a streaming company, maybe it's signing up for a channel, or buying a movie, or whatever it is. What is the pattern of data points that leads us to that desired outcome? Once you have a very accurate description of the data relationships that result in that outcome, you can then search for it and scale it tens of millions of times over. That's what digital enterprises do, right? So in order to discover these patterns, you enrich the data to the point where the patterns become incredibly predictive. That's what Snowflake is for. But it requires a completely federated data model, because you're not going to find a data pattern in a single dataset per se, right? So that's what it's all about. I mean, the outcomes of a data cloud are very, very closely related to the business outcomes that the user is seeking. It's not some infrastructure process that has a very remote relationship with the business outcome. This is very, very closely related.

>> So it doesn't take a brain surgeon to look at the trillion-dollar club and see that the trillion-dollar, two-trillion-dollar market cap companies, like Apple, have data at the core, whereas most companies, most incumbents, it might be a bottling plant at the core, some manufacturing or some other process, and they put data around it in these silos. It seems like you're trying to really bring that innovation in and put data at the core, and you've got an architecture to do that. You talk about your multi-cluster shared storage architecture, and you mentioned data sharing. Will this, in your opinion, enable, for instance, incumbents to do what a lot of the startups were able to do in the early cloud days? I mean, they got access to data centers which they couldn't have had before the cloud; you're trying to do something similar with data.

>> Yeah. So, you know, obviously there's no doubt that the cloud is a critical enabler; this wouldn't be happening without it. At the same time, the trails have been blazed by the likes of Facebook and Google. The reason those enterprises are so extraordinarily valuable is because of what they know
through data, and how they can monetize what they know through data. But that power is now becoming available to every single enterprise out there, right? Because the data platform, the underlying cloud capabilities: we are now delivering that to anybody who wants it. Now, you still need to have strong data engineering and data science capabilities; it's not like falling off a log. But fundamentally, those capabilities are now broadly accessible in the marketplace.

>> So we were talking upfront about some of the differences between what you've done earlier in your career. Like I said, you're the worst kept secret. You know, Data Domain, I would say, was somewhat of a niche market. You blew it up until it was very disruptive, but it was somewhat limited in what could be done, and maybe some of that limitation wouldn't have occurred if you'd stayed an independent company. At ServiceNow, you mopped the table up because you really had no competition there. That's not the case here; you've got some of the biggest competitors in the world. So talk about that, and what gives you confidence that you can continue to dominate.

>> Well, you know, it's actually interesting that you bring up these companies. I mean, Data Domain was a scenario where we were constrained on market, and literally we were a data backup company. As you recall, we needed to move into backup software, needed to move into primary storage. While we knew it, we couldn't execute on it, because it took tremendous resources, which back in the day was much harder than it is right now. So we ended up selling the company to EMC, and it's now part of Dell. But we were left with some trauma from that experience: why couldn't we execute on that transformation? So coming to ServiceNow, we were extremely, certainly me personally, extremely attuned to the challenges that we had endured in our prior company. One of the reasons why you saw ServiceNow break out at scale, at tremendous growth rates, is because of what we had learned from the prior journey. We were not ever going to get caught again in a situation where we could not sustain our markets and sustain our growth. So at ServiceNow, the execution model was very much a reaction to what we had encountered in the prior company.

Now, coming into Snowflake, it's a totally different deal. Because not only is there a large market, this is a developing market. I think you've pointed out in some of your broadcasting that this market is very much in flux, and the reason is that technology is now capable of doing things for people and enterprises that they could never do before. So people are spending way more resources than they ever thought possible on these new capabilities. So you can't think in terms of static markets and static data definitions; it means nothing, okay? These things are so in transition right now that it's very difficult for people to scope the scale of this opportunity.

>> Yeah. I want to understand your thinking around the TAM. You know, I've written about the TAM, and whether Snowflake can grow into its valuation. The way I drew it, I said, okay, you've got data lakes and you've got enterprise data warehouse; that's pretty well understood. But I called it data as a service to cover the closest analogy to your data cloud.
And then even beyond that, when you start bringing in the edge and real-time data, talk about how you're thinking about that TAM, and what you have to do to participate. Do you have to bring adjacent capabilities? Is it this data sharing that will get you there? In other words, you're not a transaction system. You hear people talking about converged databases; you hear talk about real-time inference at the edge. That, today anyway, isn't what Snowflake is about. Does that vision of data sharing and the data cloud allow you to participate in that massive, multi-hundred-billion-dollar TAM that I laid out, and that probably others have as well?

>> Yeah, well, it's always difficult to define markets based on historical concepts that are probably not going to apply a whole lot longer. I mean, the way we think of it is that data is the beating heart of the digital enterprise. You know, digital enterprises today, what do you look at? Companies like DoorDash and so on. They were built from the ground up to be digital enterprises, and data is the beating heart of their operation. Data operations is their manufacturing, if you will. Every other enterprise out there is working very hard to become digital, or part digital, and is going to have to develop data platforms like what we're talking about here, the data cloud, as well as the expertise in terms of data engineers and data scientists, to really fully become a digital enterprise.

So we view data as driving the operations of the digital enterprise. That's really what it is. It's completely data-driven, and there are no people involved. People are developing and supporting the processes, but in the execution it is end-to-end data-driven, in the sense that data is the signal that initiates the processes. Signals are detected, and then the entire programmatic machinery, if you will, of the processes that have been designed executes fully. For example, I may fit a certain pattern that leads to some transactional context, but I've not fully completed that pattern until I click on some link, and all of a sudden, poof, I have become a prime prospect. The system detects that in real time and then unleashes its outreach and capabilities to get me to transact. You and I are experiencing this every day when we're online; you just may not fully realize that's what's happening behind the scenes. That's really what this is all about. So to me, this is sort of the new online transaction processing: an end-to-end, data-driven process that is continually acquiring, analyzing, and acting on data.

>> Well, you've talked about the time value of data; it loses value over time. And to the extent that you can actually affect decisions, maybe before you lose the customer, before you lose the patient, or, even more importantly, before you lose the battle. There are all kinds of mental models that you can apply to this. So automation is a key part of that. And then again, I think a lot of people, like you said, if you just try to look at historical markets, you can't really squint through those and apply them. You really have to open up your mind and think about the new possibilities. And so I can see the component of automation. I see what's happening in the RPA space,
and I can see these massive opportunities to really change society, change business. Your last thoughts?

>> There's just no scenario that I can envision where data is not completely core and central to a digital enterprise, period.

>> Yeah, I really do think, Frank, your vision is misunderstood somewhat. I think people say, "Okay, hey, we'll bet on Slootman, Scarpelli, the team." That's great to do that. But I think this is going to unfold in a way that people maybe haven't predicted, that maybe you guys yourselves and your founders aren't able to predict as well. But you've got that good, strong architectural philosophy that you're pursuing, and it just kind of feels right, doesn't it?

>> You know, one of the reasons why we also wrote our book, "Rise of the Data Cloud," is to convey to the marketplace that this is not an incremental evolution, that this is not sort of building on the past. There is a real step function here. The way to think about it is that typically enterprises and institutions will look at a platform like Snowflake from a workload context. In other words: I have this business, I have this workload (this is very much historically defined, by the way), and then they benchmark us against what they're already doing on some legacy platform, and they decide, "Yeah, this is a good fit. We're going to put Snowflake here, maybe there." But it's still very workload-centric, which means that we are essentially perpetuating the mentality of the past. We're doing it one workload at a time, and we're creating the new silos and the new bunkers of data in the process. And we're really not approaching this with the level of vision that data science really requires to drive maximum benefit from data.

So our argument, and this is not an easy argument, is to say to CIOs, and any other C-level person that wants to listen: look, just thinking about operational context and operational excellence is not enough. We have to have a platform that allows us unfettered access to the data we may need, so we can bring the analytical power to it. If you have to bring analytical power to a diversity of datasets, how are we going to do that when the data lives in, like, 500 different places? It's just not possible, other than with insane amounts of programming and complexity, and then we don't have the performance, and we don't have the economics, and we don't have the governance, and so on. So you really want to set yourself up with a data cloud so that you can unleash your data science capabilities, your machine learning, your deep learning capabilities, and then you really get the full-throttle advantage of what the technology can do. If you're going to perpetuate the siloing and bunkering of data by doing it one workload at a time, then five, ten years from now we're having the same conversation we've been having over the last 40 years.

>> Yeah. Operationalizing your data is going to require busting down those silos, and it's going to require something like the data cloud to really power that to the next decade and beyond. Frank Slootman, thanks so much for coming on theCUBE and helping us do a preview here of what's to come.

>> You bet, Dave. Thanks.

>> All right. Thank you for watching.
Everybody, this is Dave Vellante for theCUBE. We'll see you next time.

Published Date : Oct 16 2020


Victoria Stasiewicz, Harley-Davidson Motor Company | IBM DataOps 2020


 

>> Announcer: From theCUBE Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.

>> Hi everybody, this is Dave Vellante, and welcome to this special digital CUBE presentation sponsored by IBM. We're going to focus in on DataOps, data ops in action. A lot of practitioners tell us that they really have challenges operationalizing and infusing AI into the data pipeline. We're going to talk to some practitioners and really understand how they're solving this problem. And I'm really pleased to bring in Victoria Stasiewicz, who's the Global Information Systems Manager for Information Management at Harley-Davidson. Vik, thanks for coming on theCUBE. Great to see you. Wish we were face to face, but I really appreciate your coming on in this manner.

>> That's okay. That's why technology's great, right?

>> So you are steeped in a data role at Harley-Davidson. Can you describe a little bit about what you're doing and what that role is like?

>> Definitely. So, obviously, I'm a manager of information management and governance at Harley-Davidson, and what my team is charged with is building out data governance at an enterprise level, as well as supporting the AI and machine learning technologies within my function. So I have a portfolio that really includes data and AI and governance, and also our master data and reference data and data quality function, if you're familiar with the DAMA wheel.

What I can tell you is that my team did an excellent job within this last year, in 2019, standing up the infrastructure: those technologies specific to governance, as well as the newer, more modern warehouse-on-cloud technologies and cloud object storage, which also included Watson Studio and Watson Explorer. Many of the IBMers of the world might hear about these tools or work on them directly; we stood them up in the cloud, along with Db2 Warehouse on Cloud and, like I said, cloud object storage. We spent about the first five months of last year standing that infrastructure up, working on the workflow, and ensuring that access and security management was all set up within the platform.

What we did in the last half of the year was really start to collect that metadata, as well as the data itself, bringing the metadata into our metadata repository and the data into our Db2 Warehouse on Cloud environment. So we were able to start with what we would consider our dealer domain for Harley-Davidson and bring those dimensions into Db2 Warehouse on Cloud, which was never done before. A lot of the information that we were collecting and bringing together for the analytics team lived in disparate data sources throughout the enterprise. So the goal was to stop with redundant data across the enterprise, eliminate some of those disparate source data resources, and bring it into a centralized repository for reporting.

>> Okay, wow, we've got a lot to unpack here, Victoria. But let me start with sort of the macro picture. I mean, years ago, data was this thing that had to be managed, and it still does, but it was a cost, largely a liability. Governance was sort of front and center; sometimes it was the tail that wagged the value dog. And then the whole big data movement comes in, and everybody wants to be data-driven, and so you saw some pretty big changes in just the way in which people looked at data. They wanted to mine that data and make it an asset versus just a straight liability. So what are the changes that you discerned in data and in your organization over, let's say, the last half decade?
>> To tell you the truth, we started looking at access management and the ability to allow some of our users to do some rapid prototyping that they could never do before. What more and more we're seeing from data citizens or data scientists, or even analysts throughout most enterprises, is that they want access to the information, and they want it now. They want speed to insight at this moment, using pretty much a minimum viable product. They may not need the entire dataset, and they don't want to have to go through leaps and bounds just to get access to that information, or to bring that information into a centralized location.

So while I talk about our Db2 Warehouse on Cloud, that's an excellent example of data we actually need to model: we know this is data that we trust, that's going to be called upon many, many times by many, many analysts. But there's other information out there that people are collecting, because there's so much big data, so many ways to enrich your data within your organization for your customer reporting, and people are really trying to tap into those third-party datasets. So what my team has done, and what we're seeing change throughout the industry, is that a lot of teams and enterprises are looking to us technologists and asking: how can we enable our scientists and our analysts to access data virtually?

So instead of creating more redundant data sources, we're actually enabling data virtualization at Harley-Davidson, and we've been doing that first working with our Db2 Warehouse on Cloud and connecting to some of the other trusted data warehouses that we have throughout the enterprise, that being our dealer warehouse as well, to enable analysts to do some quick reporting without having to bring all that data together. That is a big change, and the fact that we were able to tackle it is what's allowed technology to get back ahead. Most organizations have given IT a bad rap: it takes too long to get what we need; my technologists cannot give me my data at my fingertips in a timely manner to allow for speed to insight and to answer business questions at the point of delivery. When we've supplied data to our analysts the old way, they were able to calculate and aggregate the reporting metrics to get those answers back to the business, but they were a week, two weeks too late, and the information was no longer relevant. Data virtualization through DataOps is one of the ways we've been able to speed that up and act as a catalyst for data delivery. (A toy sketch of the virtualization idea appears after this answer.)

We've also done something else, though, and I see this quite a bit: that's excellent, but we still need to start classifying our information and labeling it at the system level. We've seen most enterprises struggle here; I worked at Blue Cross as well, with IBM tooling, and we had the same struggle. They were trying to eliminate their technology debt, reduce their spend, and reduce the time it takes for resources to maintain technologies. They want to reduce the IT portfolio of assets and capabilities that they license today. So what do you do to get there? It's time to start taking a look at what systems should be classified as essential systems versus those systems that are disparate and could be eliminated. And that starts with data governance.
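As an editorial aside, here is a toy sketch of the data virtualization idea described above: a thin catalog layer routes queries by table name to whichever system owns the data, so an analyst queries one interface without the data being copied into a new store. Two in-memory SQLite databases stand in for separate warehouses; the catalog and routing function are invented for illustration and are not Harley-Davidson's or IBM's actual implementation.

    import sqlite3

    # Two independent stores standing in for, say, a dealer warehouse
    # and Db2 Warehouse on Cloud.
    dealer_db = sqlite3.connect(":memory:")
    dealer_db.execute("CREATE TABLE dealers (id INTEGER, region TEXT)")
    dealer_db.execute("INSERT INTO dealers VALUES (1, 'EMEA'), (2, 'APAC')")

    sales_db = sqlite3.connect(":memory:")
    sales_db.execute("CREATE TABLE sales (dealer_id INTEGER, amount REAL)")
    sales_db.execute("INSERT INTO sales VALUES (1, 250.0), (2, 410.0)")

    # The "virtual layer": a catalog mapping each table to the system that
    # owns it, so queries are routed instead of data being replicated.
    catalog = {"dealers": dealer_db, "sales": sales_db}

    def query(table: str, sql: str):
        return catalog[table].execute(sql).fetchall()

    print(query("dealers", "SELECT id FROM dealers WHERE region = 'EMEA'"))
    print(query("sales", "SELECT SUM(amount) FROM sales"))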
>> Okay, so your main focus is on governance, and you talked about how people want answers now: they don't want to have to wait, they don't want a big waterfall process. So what would you say were some of the top challenges in terms of just operationalizing your data pipeline, getting to the point you're at today?

>> You know, I have to be quite honest: standing up the governance framework, the methodology behind it, getting data owners and data stewards and a catalog established, that was not necessarily the heavy lifting. The heavy lifting really came with setting up a brand-new infrastructure in the cloud. We partnered with IBM and said, "You know what? We're going to the cloud," and these tools had never been implemented in the cloud before; we were kind of the first to do it. So some of the struggles that we took on were actually in standing up the infrastructure: security and access management, network pipeline access, VPN issues, things of that nature. Those, I would say, were some of the initial roadblocks we went through, but after we overcame those challenges, with the help of IBM and the patience of both the Harley and IBM teams, it became quite easy to roll out these technologies to other users.

The nice thing is that we at Harley-Davidson have been taking the time to educate our users. Today, for example, we had what we call Data Bytes, a lunch-and-learn, and in that lunch-and-learn we took our entire GIS team, our Global Information Services team, which is all of IT, through these new technologies. It was a forum of over 250 people, with our CIO and CTO on, taking them through: how do we use these tools, what is the purpose of these tools, why do we need governance to maintain these tools, why is metadata management important to the organization? That piece of it seems to be much easier than the initial standing-up, so it's good enough to start letting users in.

>> Well, it sounds like you had real sponsorship and input from leadership, and they were kind of leaning into the whole process. First of all, is that true? And how important is that for success?

>> Oh, it's essential. When we were first standing up the tools, we often asked, to be quite honest: does our CIO really understand what it is that we're standing up? Does our CIO really understand governance? Because we didn't have the time to really get that face-to-face interaction with our leadership. So I myself made it a mandate, having done this previously at Blue Cross, to get in front of my CIO and my CTO and educate them on what exactly we were standing up. And once we did that, it was very easy to get an executive steering committee as well as an executive membership council on board with our governance council, and now they're the champions of it.

It's never easy, though. Selling governance to leadership, and the ROI, is never easy, because it's not something that you can easily calculate; it's something that has to show its return on investment over time. And that means that you're bringing dashboards, you're educating your CIO and CTO on how you're bringing people together, how groups are now talking about solutions and technologies in a domain-like environment where you have people at an international level. We have people from Asia, from Europe, from China that join calls every Thursday to talk about the data quality issues specific to dealer, for example: what systems we're using and what solutions are on the horizon to solve them.
So now, instead of having people from other countries that work for Harley, as well as people within the US, creating one-off solutions that answer the same business questions using the same data, creating multiple solutions to solve the same problem, we're bringing them together and we're solving together, and we're prioritizing those solutions as well. So for the return on investment down the line, you can show that instead of splitting this into five projects, we've now turned it into one; and instead of implementing four systems, we've now implemented one. And guess what: we have the business rules and the classifications tied to that system, so the CIO or CTO can now go in and reference that information in a glossary, a user interface, something that a C-level can read, interpret, and quickly dissect for their own needs, without having to take the long, lengthy time to talk to a technologist about what the information means and how to use it.

>> You know, what's interesting, my takeaway based on what you just said, is that Harley-Davidson is an iconic brand, a cool company with iconic motorcycles, right? But you came out of an insurance background, which is a regulated industry where governance is sort of de rigueur; it's table stakes. So how were you able, at Harley, to balance the tension between governance and business flexibility?

>> So there are different levers, I would call them. Obviously, within healthcare and insurance, the importance is compliance and risk and regulatory. The big push there is: gosh, I don't want to pay millions of dollars in fines; start classifying this information, enabling security, reducing risk, all that good stuff. For Harley-Davidson it was much different. It was more or less: we have a mission. We want to invest in our technologies, yet we want to save money. How do we cut down the technologies that we have today and reduce our technology spend, yet enable our users to have access to more information in a timely manner? That's not an easy task.

So what I did is marry governance to our TIME model, and the TIME model specifically asks: are we going to tolerate an application, invest in an application, migrate an application, or eliminate it? Talking to my CIO, I said: you know, we can use governance, which classifies systems, to act as a catalyst when we start to implement what we're doing with our technologies. Which technologies are we going to eliminate tomorrow? We in IT cannot decide that unless we can discuss the business impact, unless you can look at a system and say: how many users are using it, what reports are essential, do the business teams need this system, is it critical for users today, is it duplicative? We have many systems solving the same capability.

That is how I sold it to my CIO, and that made it important to the rest of the organization. They knew we had a mandate in front of us: we had to reduce technology spend. And that really, for me, made it quite easy when talking to other technologists, as well as business users, about why governance is important and why it's going to help Harley-Davidson in its mission to save money going forward.

I will tell you, though, that the business's biggest value is the fact that they now own the data. They're more likely to use your master data management systems.
Like I said, I'm the owner of our MDM services today, as well as our customer knowledge center, and the business is more likely to access and reference those systems if they feel that they built the rules and they own the rules in those systems. That's another big value-add, because many business users will say: okay, you think I need access to this system? I don't know; I'm not sure. I don't know what the data looks like within it. Is it easily accessible? Is it going to give me the reporting metrics that I need? That's where governance will help them.

For example, our data scientist team, using a catalog, can browse the metadata: you can look at your server, your database, your tables, your fields, and understand what those mean, including the classifications and the formulas within them; they're all documented in a glossary. That's versus having to go and ask for access to six different systems throughout the enterprise, hoping that Sally next to you, who told you you needed access to those systems, was right, just to find out that you don't need the access, and hence it took you three days to get the access anyway. That's why a glossary really acts as a catalyst for a lot of that.

>> It's really interesting what you just said, because you went through essentially an application rationalization exercise, which saved your organization money. That's not always easy, because even though IT may be spending money on these systems, businesses don't want to give them up. But you were able to use data to actually inform which applications you should invest in versus sunset. And it sounds like you gave the business a real incentive to go through this exercise, because they ended up, as you said, owning the data.

>> Well, who wants to keep driving the old car if they can truly own a new car for a cheaper price? Nobody wants to do that. I've even looked at Teslas: I can buy a Tesla for the same price as a minivan these days; I think I might buy the Tesla.

But what I will say is that we also built out a capabilities model with our enterprise architecture team, and in building that capabilities model, we started to bucket our technologies within those capability groups: AI and machine learning, warehouse-on-cloud and warehousing technologies, governance technologies, integration technologies, reporting technologies, those types of classifications. By grouping all those into a capabilities matrix, it was easy for us to then start identifying: all right, who are the system owners for these, and who are the business users for these? Based on that, let's go talk to this team, the dealer management team, about access to this new profiling capability within IBM, or this new catalog within IBM, that they can use today, versus the SharePoint Excel spreadsheets they were using for their metadata management, or the profiling tools that were old, ten-year-old SAP tools they were using before. Let's sell them on the new tools and start migrating them. That becomes pretty easy, because unless you're buying some really old technology, when you give people a purview into those new tools and those new capabilities, especially with some of IBM's new tools we have today,
the buy-in is pretty quick. It's pretty easy to sell somebody on something shiny that's much easier to use than some of the older technologies.

>> Let's talk about the business impact. My understanding is you were trying to improve the effectiveness of the dealers, not just go out and brute-force sign up more dealers. Were you able to achieve that outcome, and what has it meant for your business?

>> Yes, actually, we were. So what we did is we built something called CDR, our Consumer Dealer and Development Repository, which is where a lot of our dealer information resides today; it's effectively our dealer warehouse. We had some other systems collecting that information as well, and we were able to bring all that reporting into one location, sunset some of those other technologies, and also enable a centralized reporting layer, where we've used data virtualization to marry that information to Db2 Warehouse on Cloud for users. So we're allowing those that want to access CDR and our Db2 Warehouse on Cloud dealer information to do that within one reporting layer.

In doing so, we were able to create something called a dealer harmonized ID. We have so many dealers today: some of those dealers actually sell bikes, some sell just apparel, some sell just parts. With harmonized IDs, kind of a golden record of mastered information, if you will, brought back into reporting, we can accurately assess dealer performance. Up to two years ago, it was really hard to do that. We had information spread out all over, and it was really hard to get a good handle on which dealers were performing and which weren't, because it was tough for our analysts to wrangle that information and bring it together. It took time, and many times you would get multiple answers to one business question, which is never good; one question should have one answer if it's accurate.

That is what we worked on this last year, and that's where our CEO sees the value: now we can act on which dealers are performing at an optimal level versus which dealers are struggling, and that's allowed our account reps and our field staff to go work with those struggling dealers and share with them what some of our stronger-performing dealers are doing today to be more effective at selling bikes: here are some of the best practices you can implement. That's where we make our field staff smarter and our dealers smarter. We're not looking to shut down dealers; we just want to educate them on how to do better.

>> Well, and to your point about a single version of the truth, the lines of business kind of owning their own data, that's critical, because you're not spending all your time pointing fingers trying to understand the data; if the users own it, then they own it. So how does self-service fit in? Were you able to achieve some level of self-service, and how far can you go there?

>> We were. We did use some other tools, to be quite honest, aside from just the IBM tools, that enabled some of that self-service analytics. SAP SAC was one of them; Alteryx is another big one that our analyst team likes to use today to wrangle and bring that data together.
That really allowed our analysts and our reporting teams to start to build their own derivations and transformations for reporting themselves, because those tools are more user-interface-based, versus going into the back-end systems and having to write straight SQL queries, things of that nature, which usually takes time and requires a deeper level of knowledge than we'd like our analysts to need.

I can say the same thing for the data scientist team. They use a lot of R and Python coding today. What we've tried to do is make sure that the tools are available so that they can do everything they need to do without us really having to touch anything. And I will be quite honest, we have not had to touch much of anything; we have a very skilled data scientist team. The tools that we put in place today, Watson Explorer and some of the other tools as well, have enabled the data scientists to really quickly do what they need to do for reporting. And even in cases where maybe Watson Studio or Explorer may not be the optimal technology for them to use, we've also allowed them to use some of our other open-source resources to build the models they were looking to build.

>> Well, I'm glad you brought that up, Victoria, because IBM makes a big deal out of being open. So you're kind of confirming that you can use third-party tools, and if you like tool vendor ABC, you can use them as part of this framework?

>> Yeah, it's really about TCO. Take a look at what you have today: if it's giving you at least 80% of what you need for the business, or for your data scientists or reporting analysts to do what they need to do, then to me it's good enough; it's giving you what you need. It's pretty hard to find anything that's exactly 100%. It's about being open, though, to when your scientists or your analysts find another reporting tool that requires minimal maintenance, or, let's say, a data science flow that requires minimal maintenance and is free because it's open source. IBM can integrate with that, and we can enable that as a quicker way for them to do what they need to do, versus telling them, "No, you can't use the other technologies or the other open-source resources out there." That would be pretty tough to do, and I think it would shut most IT shops down pretty quickly within larger enterprises, because it would really act as a roadblock to our teams doing the reporting they need to do.

>> Last question. A big part of DataOps, you know, borrowing from DevOps, is this continuous integration, continuous improvement, kind of ongoing raising of the bar, if you will. What do you see going from here?

>> Oh, I definitely see a world where we're allowing for that rapid prototyping, like I was talking about earlier. I see a very big change in the data industry. You said it yourself: we are on the brink of big data, and it's only going to get bigger. There are organizations right now that have truly understood how much of an asset their data really is, and they're starting to sell their data to similar vendors within their industry, within similar spaces, so they can make money off of it, because data truly is an asset. Now, the key to that is
obviously making sure that it's curated, that it's cleansed, that it's trusted, so that when you sell it, you can really make money off of it. But what I really see on the horizon is the ability to vet that data. Because what have we been doing for the past decade? We've just been buying big datasets, trusting that it's good information; we're not doing a lot of profiling at most organizations. You pay top dollar, you receive this third-party dataset, and then you find you can't use it the way you need to.

What I see on the horizon is us being able to do that vetting. We're building data lakehouses, if you will; we're building those Hadoop-like environments, those data lakes, where we can land information, quickly access it, and quickly profile it with tools, where it would otherwise take hours for an analyst to write a bunch of queries just to understand what the profile of that data looks like. We did that recently at Harley-Davidson: we bought some third-party data and evaluated it quickly through our agile scrum team. Within a week, we determined that the data was not as good as the vendor had sold it to be, so we told the vendor we wanted our money back; the data was not what we thought it would be, please take the datasets back. Now, that's just one use case, but to me that was golden: it's a way to save money and start vetting the data that we're buying. Otherwise, what I've seen in the past is many organizations just buying up big third-party datasets and saying, "Okay, it's good enough; we think that just because it comes from the Motorcycle Industry Council, it's good enough." It may not be. It's up to us to start vetting that, and that's where technology is going to change: data is going to change, and analytics is going to change.

>> It's a great example. You're really on the cutting edge of this whole DataOps trend. I really appreciate you coming on theCUBE and sharing your insights, and there's more in the CrowdChat. Thank you, Victoria, for coming on theCUBE.

>> Well, thank you, Dave. Nice to meet you; it was a pleasure speaking with you.

>> Yeah, really, the pleasure was all ours. And thank you for watching, everybody. As I say, go to the CrowdChat for more detail and more Q&A. This is Dave Vellante for theCUBE. Keep it right there; we'll be right back after this short break. [Music]
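On the quick-profiling point above, here is a minimal sketch of the kind of column-level checks involved, using pandas. The dataset and the acceptance threshold are invented for illustration, not Harley-Davidson's actual criteria.

    import pandas as pd

    # Invented stand-in for a purchased third-party dataset.
    df = pd.DataFrame({
        "customer_id": [1, 2, 2, None, 5],
        "email": ["a@x.com", None, None, None, "b@y.com"],
        "spend": [100.0, 250.0, 250.0, 90.0, None],
    })

    profile = pd.DataFrame({
        "null_rate": df.isna().mean(),     # share of missing values per column
        "distinct": df.nunique(),          # cardinality per column
        "dtype": df.dtypes.astype(str),
    })
    print(profile)
    print(df.describe())                   # basic stats for numeric columns

    # A simple acceptance gate: flag the dataset if a key field is too sparse.
    if profile.loc["email", "null_rate"] > 0.5:
        print("Email coverage below threshold; flag for vendor review")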

Published Date : May 28 2020


Ajay Vohora & Lester Waters, Io-Tahoe | AWS re:Invent 2019


 

>> Announcer: Live from Las Vegas, it's theCUBE, covering AWS re:Invent 2019, brought to you by Amazon Web Services, along with its ecosystem partners.
I'm able to garner insights into that data using the, the nine patent pending algorithms that we have, uh, to, to find that, uh, to do intelligent tagging, if you will. So, uh, from my perspective, I'm very interested in making sure that I'm adhering to compliance rules. So the really cool thing about the stuff is that we go and tag data, we look at it and we actually tie it to lines of regulations. So you could go CC CCPA. This bit of text here applies to this. And that's really helpful for me as an information security professional because I'm not necessarily versed on every line of regulation, but when I can go and look at it handily like that, it makes it easier for me to go, Oh, okay, that's great. I know how to treat that in terms of control. So that for, that's the important bit for me. So if you don't know where your data is, you can't control it. You can't monitor it. >>Governance. Yeah. The, the knowing where stuff is, I'm familiar with a framework that was developed at Telstra back in Australia called the five no's, which is about exactly that. Knowing where your data is, what is it, who has access to it? Cause I actually being able to cattle on the data then like knowing what it is that you have. This is a mammoth task. I mean that's, that's hard enough 12 years ago. But like today with the amount of data that's actually actively being created every single day, so how, how does your system help CSOs tackle this, this kind of issue and maybe less listed. You can, you can start off and then, then you can tell us a bit more of yourself. >>Yeah, I mean I'll start off on that. It's a, a place to kind of see the feedback from our enterprise customers is as that veracity and volume of data increases. The, the challenge is definitely there to keep on top of governing that. So continually discovering that new data created, how is it different? How's it adding to the existing data? Uh, using machine learning and the models that we create, whether it's anomaly detection or classifying the data based on certain features in the data that allows us to tag it, load that in our catalog. So I've discovered it now we've made it accessible. Now any BI developer data engineer can search for that data in a catalog and make something from it. So if there were 10 steps in that data mile, we definitely sold the first four or five to of bring that momentum to getting value from that data. So discovering it, catalog it, tagging the data to make it searchable, and then it's free to pick up for whatever use case is out there, whether it's migration, security, compliance, um, security is a big one for you. >>And I would also add too, for the data scientists, you know, knowing all the assets they have available to them in order to, to drive those business value insights that they're so important these days. For companies because you know, a lot of companies compete on very thin margins and, and, and having insights into their data and to the way customers can use their data really can make, make or break a company these days. So that's, that's critical. And as Aja pointed out, being able to automate that through, through data ops if you will, uh, and drive those insights automatically is great. Like for example, from an information security standpoint, I want to fingerprint my data and I want to feed it into a DLP system. And so that, you know, I can really sort of keep an eye out if this data is actually going out. 
And it really is my data, versus a standard regex kind of matching, which isn't the best technique. >> Yeah, so walk us through that in a bit more detail. You've mentioned tagging a couple of times, so let's go into the details a little bit about what that actually means for customers. My understanding is that you're looking for things like a social security number that could be sitting somewhere in this data — so finding out where all these social security numbers are that I may not be aware of, which could be being shared with someone who shouldn't have access. Is that what it is, or are there other kinds of data that you're able to tag that traditional approaches miss? >> Yeah. Straight out of the box you've got your PII, or personally identifiable information — the kind of data that is covered under CCPA and GDPR. So there are those standard, regulatory-driven definitions that a social security number, name, or address would fall under. Beyond that, in a large enterprise you've got clever data scientists and data engineers who, through the nature of their work, can combine sets of data that could include work patterns, IDs, lots of activity. You bring that together, and it suddenly comes under that umbrella of sensitive. So it's being able to tag and classify data under those regulatory policies, but then also what could be an operational risk to an organization, whether it's a bank, insurance, utility, or health care in particular — we work across all of those verticals, agnostic to any vertical. >> Okay, all right. >> And the nature of being able to do that is having that machine learning set up a baseline around what is sensitive, and then honing that to what is particular to that organization. So, you know, lots of people will use what you've seen here at AWS — S3, Aurora, Postgres or MySQL, Redshift — and also, in different ways, the underlying sources of that data, whether it's a CRM system or IoT. All of those sources have nuances that make every enterprise data landscape just slightly different. So trying to make a rules-based, one-size-fits-all approach is going to be limiting, and it increases your manual overhead. Customers like GE and Comcast have moved way beyond throwing people at the problem; that's no longer possible. So it's being smart about how to approach this: classifying the data using features in the data, curating that metadata as an asset — just as a data warehouse would be — which allows you to enable the rest of the organization. >> So, I mean, you've talked about deriving value and identifying value. Ultimately, once you catalog and tag the data, what does this mean to the bottom line in terms of ROI? How does AWS play into that? Why am I, as a company — what value am I getting out of your abilities with AWS and having that kind of capability? >> Yeah, we did a great study with Forrester. They calculated the ROI, and it's a mixture of things. It's that manual personnel overhead — people who are locked into that pretty unpleasant, low-productivity role of wrangling data, for want of a better word, to make something of it. They'd much rather be creating the dashboards, the BI, or the insights.
So moving, you know, dozens of people from the back-office manual wrangling onto what's going to make a difference to the chief marketing officer, or for your CFO bringing down the cost to serve your customer by getting those operational insights — that is how they want to get to working with that data. So that automation to take out the manual overhead of the upfront tasks allows that resource to be better deployed onto the more interesting, productive work. That's one part of the ROI. >> The other is with AWS. What we've found here, engaging with the AWS ecosystem, is just that speed of migration to AWS. We can take months out of that by cataloging what's on premise and understanding the data estate. Say our data engineering team wants to create products for their own customers using SageMaker, using Redshift, Athena — well, what is the exact data that we need to push into the cloud to use those services? Is it the 20 petabytes that we've accumulated over the last 20 years? That's probably not going to be the case. So tiering the on-prem and cloud base of that data is really helpful to a data officer and an information architect to set themselves up to accelerate that migration to AWS. >> So for people who've used this kind of system, who've run through the tagging and seen the power of the platform that you've got there — what are some of the things they're now able to do once they've got these high-quality tagged data sets? >> So it's not just tagging, too. We also do fuzzy matching, so we can find relationships in the data, or even relationships within the data in terms of duplicates. So, for example, somebody got married, and they're really the same person, you know, but now their surname has changed. We can help companies find those bits of matching. And I think we had one customer where we saved them about a hundred thousand a year in mailing costs, because they were sending mail to, you know, a Miss So-and-so who wasn't there anymore under that name. Being able to deduplicate that kind of data really helps people save money. >> Yep. And that's kind of the next phase in our journey: moving beyond the tagging and the classification. Our roadmap working with AWS is very much machine-learning driven. So what our engineering team are excited about is what's the next model, what's the next problem we can solve with AI and machine learning to throw at the large-scale data problem. So we'll continually be curating and creating that metadata catalog asset, allowing it to be used as a resource to enable the rest of the data landscape. >> And I think what's interesting about our product is we really have multiple audiences for it. We've got the chief data officer, who wants to make sure that we're completely compliant, because they don't want that potential 4% fine. You know, being able to evidence that they've had due diligence in their data management will go a long way if there is a breach, because zero-days do happen. But if you can evidence that you've really had good discipline, then you won't get that fine — or hopefully you won't get a big fine. The second audience is going to be the information security professionals, who want to secure that perimeter. The third is going to be the data architects, who are trying to, you know, manage and create new solutions with that data.
And the fourth, of course, is the data scientists trying to drive >> new business value. >> Alright, well, before we let y'all take off, I want to know about an offering that you've launched this week, apparently to great success — you're pretty excited about it, beyond just your space alone here, your presence here. Tell us a little bit about that before you take off. >> Yeah. So we're here also sponsoring the jam lounge, and everybody's welcome to sign up. A number of our friends are there to competitively take on some challenges: come into the jam lounge, use our products, and kind of understand what it means to accelerate that journey onto AWS. >> What can I do if I stop by? Give me an idea. >> You can take some challenges to discover data and understand what data is there and isn't there, find relationships, and intuitively, through our UI, start exploring that and joining the dots — knowing your data, and then creating policies to drive that data into use. >> Cool. Good. And maybe pick up a football along the way, so I know. Thanks for being with us — thank you both for the time. And again, the jam lounge, right here at AWS re:Invent. We are live, and you're watching this right here on theCUBE.
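The tag-and-classify workflow Ajay and Lester describe is easier to picture with a small sketch. The Python below flags a column as PII when enough sampled values match a pattern, and maps hits to regulation labels. It is a deliberately naive stand-in for IO Tahoe's nine patent-pending algorithms, which the interview doesn't detail; the patterns, threshold, and regulation mapping here are all invented for illustration.

import re

# Toy patterns; real classifiers combine many signals, not one regex each.
PATTERNS = {
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}
# Assumed mapping from data type to the regulations that cover it.
REGULATIONS = {"ssn": ["CCPA", "GDPR"], "email": ["CCPA", "GDPR"]}

def tag_column(values, sample_size=100, threshold=0.5):
    """Tag a column as PII if enough sampled values match a pattern."""
    sample = values[:sample_size]
    tags = set()
    for name, pattern in PATTERNS.items():
        hits = sum(1 for v in sample if pattern.search(str(v)))
        if sample and hits / len(sample) >= threshold:
            tags.update(REGULATIONS[name])
    return sorted(tags)

print(tag_column(["123-45-6789", "987-65-4321"]))  # ['CCPA', 'GDPR']
print(tag_column(["blue", "green"]))               # []

The sampling-plus-threshold shape is the point: a column gets tagged from the statistics of its values rather than by a person reading it, which is what lets cataloging scale past manual review.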

Published Date : Dec 4 2019


Tony Higham, IBM | IBM Data and AI Forum


 

>> Live from Miami, Florida, it's theCUBE, covering IBM's Data and AI Forum, brought to you by IBM. >> We're back in Miami, and you're watching theCUBE's coverage of the IBM Data and AI Forum. Tony Higham is here; he's a distinguished engineer for Digital and Cloud Business Analytics at IBM. Tony, first of all, congratulations on being a distinguished engineer — that doesn't happen often. Thank you for coming on theCUBE. >> Thank you. >> So your area of focus is on the BI and the enterprise performance management space, and if I understand it correctly, a big mission of yours is to try to modernize those: make them self-service, make them cloud-ready. How's that going? >> It's going really well. I mean, you know, with things like BI and enterprise performance management, when you really boil it down, there's analysis of data — what do we do with the data that's useful, that makes a difference in the world — and then there's planning and forecasting and budgeting, which everyone has to do, whether you are, you know, a single household or whether you're an Amazon or a Boeing, which are also some of our clients. So it's interesting that we're going from really enterprise use cases, democratizing it all the way down to the single user on the cloud with a credit-card swipe at 70 bucks a month. >> So, you used to work for Lotus, but Cognos is one of IBM's largest acquisitions in the software space ever. Steve Mills and his team architected a complete transformation of IBM's business and really got heavily into it. I think it was a $5 billion acquisition — don't hold me to that, but massive at the time — and it's really paid dividends. Now, when the 2010s came in and said, oh, Hadoop's gonna kill all the traditional BI, all the traditional EDW — that didn't happen. These traditional platforms were a fundamental component of people's data strategies, so that created the imperative to modernize and made sure that there could be things like self-service and cloud-ready, didn't it? >> Yeah, that's absolutely true. I mean, the workloads that we run are really sticky workloads, right? When you're doing your reporting, your consolidation, or your planning of your yearly cycle, your budget cycle, on these technologies, you don't rip them out so easily. So yes, of course there's competitive disruption in the space, and of course cloud creates an opportunity for workloads to be run cheaper, without your own IT people. And of course the era of digital software — I find it myself, I try it myself, I buy it without ever talking to a salesperson — creates a democratization process for these really powerful tools that's never been invented before in that space. >> Now, when I started in the business a long, long time ago, it was called DSS, decision support systems, and at the time they promised a 360-degree view of the business. That never really happened. You saw a whole new raft of players come in, and then the whole BI and enterprise data warehouse wave was gonna deliver on that promise. That kind of didn't happen, either. Sarbanes-Oxley brought a big wave of imperative around these systems, because compliance became huge, so that was a real tailwind for it. Then Hadoop was gonna solve all these problems — that really didn't happen. And now you've got AI, and it feels like the combination of those systems of record, those data warehouse systems, the traditional business intelligence systems, and all this new emerging tech together is actually going to be a game changer.
I wonder if you could comment. >> Well, so they can be a game changer, but you're touching on a couple of subjects here that are connected, right? Number one is obviously the mass of data, because data has accelerated at a phenomenal pace, and then you're talking about how do I then visualize or use that data in a useful manner. And that really drives the use case for AI, right? Because AI in and of itself — or augmented intelligence, as we talk about it — is almost only useful when it's invisible to the user, 'cause the user needs to feel like it's doing something for them that's super intuitive. A bit like the transition between the electric car and the normal car: that only really happens when the electric car can do what the normal car can do. So with things like — imagine you bring a Hadoop cluster into a BI solution and you're looking at that data. Well, if I can correlate, for example, time, profit, and cost, then I can create KPIs automatically, I can create visualizations, I know which ones you like to see from that, or I can give you related ones — I can even automatically create dashboards. I've got the intelligence about the data and the knowledge to know how you might want to visualize it, versus you having to manually construct everything. >> And when you bring these disparate data sets together, isn't AI also going to give you an indication of the confidence level in those various data sets? So, for example, your BI data set might be part of the general ledger, you know, of the income statement, and be corporate fact — very high confidence level — whereas some of the unstructured data you mention, maybe not as high a confidence level. How are customers dealing with that and applying it? First of all, is that an accurate premise? And how is that manifesting itself in terms of business? >> Yeah, so it is an accurate premise, because in the world of data there's the known knowns and the unknown knowns, right? Known knowns are what you know about your data. What's interesting about really good BI solutions and planning solutions, especially when they're brought together — because planning and analysis naturally go hand in hand, from, you know, one user at 70 bucks a month to the enterprise client — is things like: what are your key drivers? Those are gonna be the drivers that you know drive your profit. But when you've got massive amounts of data and you've got AI around that, especially if it's AI that's got an ontology around your particular industry, it can start telling you about drivers that you don't know about. And that's really the next step: tell me what the drivers are around things that I don't know, so that when I'm exploring the data, I can see a key driver that I never even knew existed. >> So when I talk to customers — I've been doing this for a while — one of the concerns, the criticisms, they had of the traditional systems was just that the process is too hard. I've got to go to, like, the few guys I can go to, I've gotta line up, you know, submit a request, and by the time I get it back, I'm on to something else. I want self-serve beyond just reporting. How are AI and IBM changing that dynamic? Can you put these tools in the hands of users? >> Right. So this is about democratizing the cleverness, right? So if you're a big, broad organization, you can afford to hire a bunch of people to do that stuff.
But if you're a startup or an SMB — and that's where the big market opportunity is for us — you need those abilities built in, and we're building this into the software already today. Say I bring in a spreadsheet. Spreadsheets, by definition, aren't just rows and columns, right? Anyone can take a row-and-column spreadsheet and turn it into a set of data, because it looks like a database. But when you've got different tabs and different sets of data that may or may not be obviously relatable to each other, that AI ability to introspect a spreadsheet and turn it, from a planning point of view, into cubes, dimensions, and rules — which turns your spreadsheet into a three-dimensional in-memory cube, or a planning application — you know, our ability to go way, way further than you could ever do with that planning process over thousands of people is all possible now, because we've taken all the hard work, all the heavy lifting, out. >> So, that three-dimensional in-memory cube — I like the sound of that. So there's a performance implication, absolutely, and what else? Accessibility, more apps, more users, is that it? >> Well, it's the ability to process what-ifs on huge amounts of data. Imagine you're Boeing, right? How many parts does Boeing have? I don't know, three trillion — I'm just guessing, right? If you've got three trillion parts and you need to figure out, based on the latest hurricane report, how many parts you need to go ship to where that hurricane report says, you need to do a what-if scenario on massive amounts of data in a second or two. So you know that capability requires an OLAP solution. However, the rest of the planet — other than OLAP people, bless them, who are very special people — don't know OLAP from a Pop-Tart. So it's democratizing it, right down to the person who says, I've got a set of data and I still need to do what-if analysis on things, and probably on large data, 'cause even if you're a small company you can have massive amounts of data coming through — people clickstreaming through your website, just for example. You know, what-if analysis on putting a 5% discount on this product based on previous sales: how is that going to affect my future sales? Again, I think it's the democratizing as well as the ability to hit scale. >> You talk about cloud and analytics, how they've come together — what specifically has IBM done to modernize that platform? And I'm interested in what customers are saying. What's the adoption like? >> So I manage the global cloud team. We have nigh on 1,000 clients that are using the cloud implementations of our software — growing, actually, to more than 2,500 if you include the multi-tenant version. There's two steps in this process, right? When you've got an enterprise software solution, your clients have a certain expectation that your software runs on cloud just the way it does on premise, which means, in practical terms, you have to build a single-tenant, well-managed cloud instance. And that's just the first step, right? Because getting clients to see the value of running the workload on cloud, where they don't need people to install it, configure it, update it, troubleshoot it, and all that other sort of IT stuff that subtracts you from running your business value — we do all that for you. But the future really is in multi-tenant, and how we can get vast, vast scale and also greatly lower costs. But the adoption's been great. Clients love it.
Or is that all confidential or what kind of metrics do you look at it? >>So obviously we look, we look a growth. We look a user adoption, and we look at how busy the service. I mean, let me give you the best way I can give you is a is a number of servers, volume numbers, right. So we have 8000 virtual machines running on soft layer or IBM cloud for our clients business Analytics is actually the largest client for IBM Cloud running those workloads for our clients. So it's, you know, that the adoption has been really super hard on the growth continues. Interestingly enough, I'll give you another factoid. So we just launched last October. Cognos Alex. Multi tenant. So it is truly multi infrastructure. You try, you buy, you give you credit card and away you go. And you would think, because we don't have software sellers out there selling it per se that it might not adopt as much as people are out there selling software. Okay, well, in one year, it's growing 10% month on month cigarette Ally's 10% month on month, and we're nearly 1400 users now without huge amounts of effort on our part. So clearly this market interest in running those softwares and then they're not want Tuesdays easer. Six people pretending some of people have 150 people pretending on a multi tenant software. So I believe that the future is dedicated is the first step to grow confidence that my own premise investments will lift and shift the cloud, but multi tenant will take us a lot >>for him. So that's a proof point of existing customer saying okay, I want to modernize. I'm buying in. Take 1/2 step of the man dedicated. And then obviously multi tenant for scale. And just way more cost efficient. Yes, very much. All right. Um, last question. Show us a little leg. What? What can you tell us about the road map? What gets you excited about the future? >>So I think the future historically, Planning Analytics and Carlos analytics have been separate products, right? And when they came together under the B I logo in about about a year ago, we've been spending a lot of our time bringing them together because, you know, you can fight in the B I space and you can fight in the planning space. And there's a lot of competitors here, not so many here. But when you bring the two things together, the connected value chain is where we really gonna win. But it's not only just doing is the connected value chain it and it could be being being vice because I'm the the former Lotus guy who believes in democratization of technology. Right? But the market showing us when we create a piece of software that starts at 15 bucks for a single user. For the same power mind you write little less less of the capabilities and 70 bucks for a single user. For all of it, people buy it. So I'm in. >>Tony, thanks so much for coming on. The kid was great to have you. Brilliant. Thank you. Keep it right there, everybody. We'll be back with our next guest. You watching the Cube live from the IBM data and a I form in Miami. We'll be right back.

Published Date : Oct 23 2019


Show Wrap | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts, it's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back. We're here to wrap up the MIT Chief Data Officer and Information Quality — it's hashtag MITCDOIQ — conference. You're watching theCUBE. I'm Dave Vellante, and Paul Gillin is my co-host. This is two days of coverage, and we're wrapping up with our analysis of what's going on here. Paul, let me kick it off. When we first started here, we talked, at our open, about how we saw the chief data officer role emerge from the back office, the information quality role. In 2013, when we asked the CDOs that we talked to what their scope was, we heard things like, oh, it's very wide — it involves analytics, data science. Some CDOs even said, oh yes, security is actually part of our purview, because of all the cyber data. So, very, very wide scope. In some cases, even some of the digital initiatives were sort of being claimed; the CDOs were staking their claim. The reality was, the CDO also emerged out of highly regulated industries — financial services, healthcare, government — and it really was this kind of wonky back-office role, and that's what it's become again. We're seeing that CDOs largely are not involved in a lot of the emerging AI initiatives; that's what we heard, sort of anecdotally, talking to various folks. At the same time, I feel as though the CDO role has been more solidified than it was before. We used to ask, is this role going to be around anymore? We had CIOs tell us that the CDO role was going to disappear, so you had both ends of the spectrum. But I feel as though, whatever it's called — CDO, chief data and analytics officer, head of data and analytics and governance — that role is here to stay, at least for a fair amount of time, and increasingly, issues of privacy and governance, and at least the periphery of security, are gonna be supported by that CDO role. So that's kind of takeaway number one. Let me get your thoughts. >> I think there's a maturity process going on here. What we saw really in 2016 through 2018 was sort of a celebration of the arrival of the CDO: we're here, you know, we've got power now, we've got an agenda. And that was a natural outcome of all this growth, and of 90% of organizations putting CDOs in place. I think what you're seeing now is a realization that, oh my God, this is a mess. What I heard this year was a lot less of this sort of crowing about the ascendance of CDOs, and more about: we've got a big integration problem, a big data-cleansing problem, and we've got to get our hands down to the nitty-gritty. And whereas we heard so much in past years about strategic initiatives — about artificial intelligence, about getting involved in digital business or customer experience transformation — what we heard this year was about cleaning up data: finding the data that you've got, organizing it, applying metadata to it, getting it in shape to do something with it. There's nothing wrong with that; I just think it's part of the natural maturation process. Organizations now have to go through the dirty process of cleaning up this data before they can get to the next stage — which is a couple, three years out for most of them.
>> The second big theme, of course — and we heard this from the former head of analytics at GSK in the opening keynote — is that the traditional methods have failed: the enterprise data warehouse, and we've actually studied this a lot. You know, my analogy is often a snake swallowing a basketball — having to build cubes. EDW practitioners always used to call it chasing the chips: until we come up with a new chip, oh, we need that, because we gotta run faster, because it's taking us hours and hours, weeks, days to run these analytics. So it really was not agile; it was a rear-view-mirror-looking thing. And Sarbanes-Oxley saved the EDW business, because reporting became part of compliance. The master data management piece, we heard consistently — and we heard Mike Stonebraker, who's obviously a technology visionary, was right on — doesn't scale: this notion of deduping everything just doesn't work, and manually creating rules is just not the right approach. We also heard that the top-down enterprise data model doesn't work — it's too complicated, you can't operationalize it — so what do they do? They kick the can to governance. Hadoop was kind of a sidecar, the big data that failed to live up to its promises. And so it's a big question as to whether or not AI will bring that level of automation. We heard from KPMG, certainly from Mike Stonebraker again, and as well from Andy Palmer: they're using technology to automate and scale that big number-one data science problem, which is that data scientists spend all their time wrangling data. We'll see if that actually lives up >> to its promise. That is something we did hear today from several of our guests: the promise of machine learning to automate this data cleanup process. And as Mark Ramsey said, kicking off the conference, all of these efforts to standardize data have failed in the past. He then showed how GSK had used some of the tools that were represented here, using machine learning, to actually clean up the data at GSK. And I heard today a lot of optimism from the people we talked to about that capability — Chris, for example, talking about the capability of machine learning to bring some order, to solve this scale problem. Because really, organizing data, creating enterprise data models, is a scale problem, and the only way you can solve that is with automation; Mike Stonebraker is right on top of that. So there was optimism at this event. There was kind of a dismay at seeing all the data problems they have to clean up, but also promise that tools are on the way that can do that. >> Yeah, and the reason I'm an optimist about this role is because data is such a hard problem, and while there is a feeling of, wow, this is really a challenge, there are a lot of smart people here who are up for the challenge and have the DNA for it. So the role — that whole 360 thing we talked about, the traditional methods kind of failing, and the third piece that I touched on, which is really bringing machine intelligence to the table. We haven't heard that as much at this event in the past; it's now front and center. It's just another example of AI injecting itself into virtually every aspect, every corner, of the industry. And again, I often joke: same wine, new bottle. Our industry has a habit of doing that. But it's cyclical, and we seem to be making consistent progress.
Several very guest spoke to machine learning being applied to the plumbing projects right now to cleaning up data. Those are really self contained projects. You can manage those you can. You can determine out test outcomes. You can vet the quality of the of the algorithms. It's not like you're putting machine learning out there in front of the customer where it could potentially do some real damage. There. They're vetting their burning in machine, learning in a environment that they control. >> Right, So So, Amy, Two solid days here. I think that this this conference has really grown when we first started here is about 130 people, I think. And now it was 500 registrants. This'd year. I think 600 is the sort of the goal for next year. Moving venues. The Cube has been covering this all but one year since 2013. Hope to continue to do that. Paul was great working with you. Um, always great work. I hope we can, uh we could do more together. We heard the verdict is bringing back its conference. You put that together. So we had column. Mahoney, um, had the vertical rock stars on which was fun. Com Mahoney, Mike Stone breaker uh, Andy Palmer and Chris Lynch all kind of weighed in, which was great to get their perspectives kind of the days of MPP and how that's evolved improving on traditional relational database. And and now you're Stone breaker. Applying all these m i. Same thing with that scale with Chris Lynch. So it's fun to tow. Watch those guys all Boston based East Coast folks some news. We just saw the news hit President Trump holding up jet icon contractors is we've talked about. We've been following that story very closely and I've got some concerns over that. It's I think it's largely because he doesn't like Bezos in The Washington Post Post. Exactly. You know, here's this you know, America first. The Pentagon says they need this to be competitive with China >> and a I. >> There's maybe some you know, where there's smoke. There's fire there, so >> it's more important to stick in >> the eye. That's what it seems like. So we're watching that story very closely. I think it's I think it's a bad move for the executive branch to be involved in those type of decisions. But you know what I know? Well, anyway, Paul awesome working with you guys. Thanks. And to appreciate you flying out, Sal. Good job, Alex Mike. Great. Already wrapping up. So thank you for watching. Go to silicon angle dot com for all the news. Youtube dot com slash silicon angles where we house our playlist. But the cube dot net is the main site where we have all the events. It will show you what's coming up next. We've got a bunch of stuff going on straight through the summer. And then, of course, VM World is the big kickoff for the fall season. Goto wicked bond dot com for all the research. We're out. Thanks for watching Dave. A lot day for Paul Gillon will see you next time.

Published Date : Aug 1 2019


Mark Ramsey, Ramsey International LLC | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts, it's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts, everybody. We're here at MIT, sweltering Cambridge, Massachusetts. You're watching theCUBE, the leader in live tech coverage. My name is Dave Vellante, and I'm here with my co-host, Paul Gillin, with special coverage of MITCDOIQ, the Chief Data Officer event. This is the 13th year of the event; we started covering it seven years ago. Mark Ramsey is here. He's the Chief Data and Analytics Officer Advisor at Ramsey International, LLC and former Chief Data Officer of GlaxoSmithKline. Big pharma. Mark, thanks for coming onto theCUBE. >> Thanks for having me. >> You're very welcome, fresh off the keynote — fascinating keynote this morning. Lots of interest here, tons of questions, and we have some as well, but let's start with your history in data. I sat down after 10 years, but I could have stretched it to 20 — I'll sit down with the young guns. But there were some folks in there with 30-plus-year careers. How about you? What does your data journey look like? >> Well, my data journey — of course, I was able to stand up for the whole time because I was in the front — actually started a little over 32 years ago. What I always tell folks is that data and analytics has been a long journey, and the name has changed over the years, but we've really been trying to tackle the same problems of using data as a strategic asset. So when I started, I was with an insurance and financial services company, building one of the first data warehouse environments in the insurance industry — that was in the '87, '88 range — and then, once I was able to deliver that, I transitioned into consulting for IBM, and basically spent 18 years with IBM in consulting and services. When I joined, the name had evolved from data warehousing to business intelligence, and then over the years it was master data management, customer 360, analytics and optimization, big data. And then in 2013 I joined Samsung Mobile as their first Chief Data Officer. So, moving out of consulting, I really wanted to own the end-to-end delivery of advanced solutions in the data analytics space, and that made the transition to Samsung quite interesting — very much into consumer electronics, mobile phones, tablets, and things of that nature. And then in 2015 I joined GSK as their first Chief Data Officer to deliver a data analytics solution. >> So you have a long data history, and Paul, Mark took us through it. And you're right, Mark-o, it's a lot of the same narrative — same wine, new bottle — but the technology's obviously changed, and the opportunities are greater today. You took us through the enterprise data warehouse, which was ETL and then mapping, and then master data management, which is kind of this mapping and abstraction layer, then an enterprise data model, top-down. And then that all failed, so we turned to governance, which has been very, very difficult. And then you came up with another solution that we're going to dig into — but is it the same wine, new bottle from the industry? >> I think it has been over the last 20, 30 years, which is why I kind of did the experiment at the beginning of how long folks have been in the industry.
I think that, certainly, the technology has advanced — moving to a reduction in the amount of schema that's required to move data, so you can kind of move away from the map-and-move type of approach of a data warehouse — but it is tackling the same type of problems. And like I said in the session, it's a little bit like Einstein's phrase: doing the same thing over and over again and expecting a different answer is certainly the definition of insanity. What I really proposed in the session was, let's come at this from a very different perspective: let's actually use data analytics on the data to make it available for these purposes. And I do think it's a different wine now, so I think it's just a matter of whether folks can really take off and head in that direction. >> What struck me, as you were ticking off some of the things that have failed, like data warehouses — I was surprised to hear you say data governance really hasn't worked, because there's a lot of talk around that right now. But all of those are top-down initiatives, and what you did at GSK was really invert that model and go from the bottom up. What were some of the barriers that you had to face organizationally to get the cooperation of all these people in this different approach? >> Yeah, and I think that's still key. It's not a complete bottom-up, because then you end up really just doing data for the sake of data, which is also something that's been tried and does not work. I think it has to be a balance, and that's really striking the right balance of tackling the data in its full breadth, but also making sure that you have very definitive use cases to deliver value for the organization, and then striking the balance of how you do that. One of the things that becomes a struggle is that you're talking about a very large breadth: any time you're covering multiple functions within a business, it's getting the support of those different business functions. And I think part of that is really around executive support and what that means. I did mention it in the session: executive support, to me, is really stepping up and saying that the data across the organization is the organization's data — it isn't owned by a particular person or a particular scientist. And I think in a lot of organizations, that gatekeeper mentality really does put barriers up to tackling the full breadth of the data. >> So I had a question around digital initiatives. Everywhere you go, every C-level executive is trying to get digital right, and a lot of this is top-down — a lot of it is big ideas, and it's kind of the North Star. Do you think that's the wrong approach? That maybe there should be a more tactical, line-of-business alignment with that threaded leader, as opposed to this big picture of we're-going-to-change-and-transform-our-company? What are your thoughts? >> I think one of the struggles is, I'm just not sure that organizations really have a good appreciation of what they mean when they talk about digital transformation.
I think in most of the industries it's an initiative that's getting a lot of press within organizations, and folks want to go through digital transformation, but in some cases that means having a more interactive experience with consumers — maybe through sensors or different ways to capture data. But if they haven't solved the data problem, it just becomes another source of data that we're going to mismanage. And so I do think there's a risk that we're going to see the same outcome from digital that we have when folks have tried other approaches to integrating information: if you don't solve the basic blocking and tackling, then data that has higher velocity and more granularity — if you're not able to handle that, because you haven't tackled the bigger problem — I'm not sure it's going to have the impact that folks really expect. >> You mentioned that at GSK you collected 15 petabytes of data, of which only one petabyte was structured. So you had to make sense of all that unstructured data. What did you learn about that process — about how to unlock value from unstructured data — as a result of that? >> Yeah, and I think it's extremely important with unstructured data to apply advanced analytics against the data, to go through a process of making sense of that information. A lot of folks have talked historically about text mining — trying to extract an entity out of unstructured data and using that for the value. There are a few steps before you even get to that point. First of all, it's classifying the information, to understand which documents you care about and which documents you don't care about. I always use the story that in this vast collection of documents, somebody has probably uploaded the cafeteria menu from 10 years ago. That has no scientific value, whereas a protocol document for a clinical trial has significant value. You don't want to manually look through a billion documents to separate those, so you have to apply the technology even in that first step of classification, and then there's a number of steps that ultimately lead you to understanding the relationships of the knowledge that's in the documents. >> Side question on that. So you had discussed, okay, if it's a menu, get rid of it — but there are certain restrictions where you've got to keep data for decades. It struck me: what about work in process? Especially in the pharmaceutical industry — I mean, post Federal Rules of Civil Procedure, everybody was looking for a smoking gun. So how are organizations dealing with what to keep and what to get rid of?
So, you put forth a proposal where you basically had this sort of three approaches, well, combined three approaches. The crawlers to go, the spiders to go out and do the discovery and I presume that's where the classification is done? >> That's really the identification of all of the source information >> Okay, so find out what you got, okay. >> so that's kind of the start. Find out what you have. >> Step two is the data repository. Putting that in, I thought it was when I heard you I said okay it must be a logical data repository, but you said you basically told the CIO we're copying all the data and putting it into essentially one place. >> A physical location, yes. >> Okay, and then so I got another question about that and then use bots in the pipeline to move the data and then you sort of drew the diagram of the back end to all the databases. Unstructured, structured, and then all the fun stuff up front, visualization. >> Which people love to focus on the fun stuff, right? Especially, you can't tell how many articles are on you got to apply deep learning and machine learning and that's where the answers are, we have to have the data and that's the piece that people are missing. >> So, my question there is you had this tactical mindset, it seems like you picked a good workload, the clinical trials and you had at least conceptually a good chance of success. Is that a fair statement? >> Well, the clinical trials was one aspect. Again, we tackled the entire data landscape. So it was all of the data across all of R&D. It wasn't limited to just, that's that top down and bottom up, so the bottom up is tackle everything in the landscape. The top down is what's important to the organization for decision making. >> So, that's actually the entire R&D application portfolio. >> Both internal and external. >> So my follow up question there is so that largely was kind of an inside the four walls of GSK, workload or not necessarily. My question was what about, you hear about these emerging Edge applications, and that's got to be a nightmare for what you described. In other words, putting all the data into one physical place, so it must be like a snake swallowing a basketball. Thoughts on that? >> I think some of it really does depend on you're always going to have these, IOT is another example where it's a large amount of streaming information, and so I'm not proposing that all data in every format in every location needs to be centralized and homogenized, I think you have to add some intelligence on top of that but certainly from an edge perspective or an IOT perspective or sensors. The data that you want to then make decisions around, so you're probably going to have a filter level that will impact those things coming in, then you filter it down to where you're going to really want to make decisions on that and then that comes together with the other-- >> So it's a prioritization exercise, and that presumably can be automated. >> Right, but I think we always have these cases where we can say well what about this case, and you know I guess what I'm saying is I've not seen organizations tackle their own data landscape challenges and really do it in an aggressive way to get value out of the data that's within their four walls. It's always like I mentioned in the keynote. It's always let's do a very small proof of concept, let's take a very narrow chunk. 
And what ultimately ends up happening is that becomes the only solution they build, and then they go to another area and build another solution, and that's why we end up with 15 or 25-- (all talk over each other) >> The conventional wisdom is you start small. >> And fail. >> And you go on from there; you fail, and that's how you get big things done. >> Well, that's not how you support analytic algorithms like machine learning and deep learning. You can't feed those just fragmented data about one aspect of your business and expect them to learn intelligent things and then make recommendations — you've got to have a much broader perspective. >> I want to ask you about one statistic you shared. You found 26 thousand relational database schemas for capturing experimental data, and you standardized those into one. How? >> Yeah, I mean, we took advantage of the Tamr technology that Michael Stonebraker created here at MIT a number of years ago, which is really, again, applying advanced analytics to the data, and using the content of the data and the characteristics of the data to go from dispersed schemas into a unified schema. So if you look across 26 thousand schemas using machine learning, you can then understand the consolidated view that gives you one perspective across all of those different schemas. Because, ultimately, when you give people flexibility, they love to take advantage of it, but it doesn't mean that they're actually doing things in an extremely different way — ultimately they're capturing the same kind of data; they're just calling things by different names, and they might be using different formats. But in that particular case we used Tamr very heavily, and that again is back to my example of using advanced analytics on the data to make it available to do the fun stuff: the visualization and the advanced analytics. >> So Mark, the last question is — you well know that the CDO role emerged in these highly regulated industries, and I guess in the case of pharma quasi-regulated industries, but now it seems to be permeating all industries. We have the CDO from McDonald's here, and virtually every industry is at least thinking about this role or has some kind of de facto CDO. So if you were slotted into a CDO role — let's make it generic, I know it depends on the industry — where do you start as a CDO for a large organization that doesn't have one? Or even a mid-sized organization — where do you start? >> Yeah, I mean, my approach is that a true CDO is maximizing the strategic value of data within the organization. It isn't a regulatory requirement. I know a lot of the banks started there, because they needed someone to be responsible for data quality and data privacy, but for me the most critical thing is understanding the strategic objectives of the organization, and how data will be used differently in the future to drive decisions and actions and the effectiveness of the business. In some cases there was a lot of discussion around monetizing the value of data; people immediately took that to, can we sell our data and make money off it as a different revenue stream? I'm not a proponent of that. It's internally monetizing your data.
How do you triple the size of the business by using data as a strategic advantage? And how do you change the executives, so that what is good enough today is not good enough tomorrow, because they are really focused on using data as their decision-making tool? That, to me, is the difference a CDO needs to make: really using data to drive those strategic decision points. >> And that nuance you mentioned I think is really important. Inderpal Bhandari, who is the Chief Data Officer of IBM, often says, how can you monetize the data? And you're right, I don't think he means selling data. It's how data contributes, if I could rephrase what you said, to the value of the organization. That can be cutting costs, that can be driving new revenue streams, that could be saving lives if you're a hospital, improving productivity. >> Yeah, and what I've typically shared with executives when I've been in the CDO role is that they need to change their behavior, right? If a CDO comes into an organization and a year later the executives are still making decisions on the same PowerPoints with spinning logos, ooh, we've got to have 'em, if they're still making decisions that way, then the CDO has not been successful. The executives have to change their level of expectation for what it takes to make a decision. >> Change agents, top down, bottom up. Last question. >> Going back to GSK: now that they've completed this massive data consolidation project, how are things different for that business? >> Yeah, look at how Barron joined as the President of R&D about a year and a half ago, and his primary focus is using data and analytics and machine learning to drive the decision making in the discovery of new medicines. The environment that has been created is a key component of that strategic initiative, and so they are actually completely changing the way they select new targets for new medicines, based on data and analytics. >> Mark, thanks so much for coming on theCUBE. >> Thanks for having me. >> Great keynote this morning. You're welcome. All right, keep it right there everybody. We'll be back with our next guest. This is theCUBE, Dave Vellante with Paul Gillin. Be right back from MIT. (upbeat music)
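The 26,000-schemas story above is, at bottom, a schema-matching problem: use the names, content, and characteristics of columns to decide which dispersed fields are really the same attribute. The toy sketch below shows the idea in miniature, clustering columns by name similarity alone. It is an illustration under stated assumptions, not Tamr's algorithm: real systems also learn from the data's content and from expert feedback, and every schema and column name here is hypothetical.

```python
from difflib import SequenceMatcher

# Toy sketch of ML-assisted schema unification in the spirit of the Tamr
# example above. Three source schemas capture the same experimental data
# under different column names; we cluster the columns and pick one
# canonical label per cluster. All names are hypothetical.
schemas = {
    "assay_db_01": ["subj_id", "dose_mg", "response"],
    "trial_sys_legacy": ["subject_identifier", "dosage_milligrams", "resp_value"],
    "lab_capture": ["subjid", "dose", "response_measure"],
}

def similarity(a, b):
    """Crude stand-in for a learned matching model: string similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Greedily cluster columns whose similarity to an existing cluster member
# clears a threshold; otherwise start a new cluster.
clusters = []
for schema, cols in schemas.items():
    for col in cols:
        for cluster in clusters:
            if any(similarity(col, c) > 0.5 for _, c in cluster):
                cluster.append((schema, col))
                break
        else:
            clusters.append([(schema, col)])

# Each cluster becomes one attribute of the unified schema; use the
# shortest member name as the canonical label.
for cluster in clusters:
    canonical = min((c for _, c in cluster), key=len)
    print(canonical, "<-", cluster)
```

At real scale the matching model, not the clustering loop, is where the machine learning earns its keep: content profiling and human feedback replace the string-similarity stand-in.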

Published Date : Jul 31 2019


Mark Clare, AstraZeneca & Glenn Finch, IBM | IBM CDO Summit 2019


 

>> Narrator: Live from San Francisco, California, it's theCUBE, covering the IBM Chief Data Officer Summit. Brought to you by IBM. >> We're back at the IBM CDO conference at Fisherman's Wharf in San Francisco. You're watching theCUBE, the leader in live tech coverage. My name is Dave Vellante. Glenn Finch is here, the global leader of Big Data and Analytics at IBM, and we're pleased to have Mark Clare. He's the head of data enablement at AstraZeneca. Gentlemen, welcome to theCUBE. Thanks for coming on. Mark, I'm going to start with this title, head of data enablement. That's a title that I've never heard before, and I've heard many thousands of titles on theCUBE. What is that all about? >> Well, I think the credit goes to some of the executives at AstraZeneca when they recruited me. I've been a chief data officer at several of the major financial institutions, both in the U.S. and in Europe. AstraZeneca wanted to focus on how we actually enable our businesses, our science areas and our businesses, so it's not unlike a traditional CDO role, but we focus a lot more on what the enabling functions or processes would be. >> So it sounds like driving business value is really the mission there. Sorry. >> I've always looked at this role in three functions: value, risk, and cost. So I think in any CDO role you have to look at all three; you'd shortchange it if you didn't. But with this title, obviously, we're looking quite a bit at the value we will drive across the firm and how to leverage our data in a different way. >> I love that, because you can quantify all three. All right, Glenn. So you're the host of this event, and I loved that little presentation that you gave. For those who didn't see it, you gave us pay stubs, and then you gave us a website and said, take a picture of the pay stub, upload it, and then you showed how you're working with your clients to actually digitize that and compress all kinds of things: time to mortgage origination, time to decision. So explain that a little bit. What's the tech behind that? And how are people using it? >> You know, for three decades we've had this OCR technology where you take a piece of paper, you tell the machine what's on the paper and where the coordinates are, and you feed it in and hope and pray to God that it isn't read wrong, that the form didn't change, anything like that. That's the way we've lived for three decades. With cognitive and AI, it reads things like the human eye reads things. And so you put the page in, and the machine comes back and says, hey, is this an invoice number? Hey, is this a Social Security number? That's how you train it, as compared to saying, here's where it sits on the page. So we use this cognitive digitization capability to grab data that's locked in documents, and then you bring it back to the process so that you can digitally reimagine the process. Now, there's been a lot of use of robotics and things like that: kind of taking existing processes and making them incrementally better, right? This says, look, you now have the data of the process. You can reimagine it however you want. In fact, the CEO of our client ADP said, look, I want you to make me a Netflix, not a Blockbuster, right? So it's a mind shift, right, to say we'll use this data, we'll read it with AI, we'll digitally reimagine the process. And it usually cuts like 70 or 80% of the cycle time, 50 to 75% of the cost. I mean, it's pretty groundbreaking when you see it.
>> So Mark, as head of data enablement, you hear something like that and you're not myopically focused on one little use case. You're taking a big-picture view, doing strategy and trying to develop broader business cases for the organization. But when you see an example like that, and there are many examples out there, I'm sure the light bulbs go off. >> I wrote probably 10 use cases down while Glenn was talking. >> So you do get tactical. Okay, but where do you start when you're trying to solve these problems? >> Well, I look at Glenn's example. About five and a half years ago, and Glenn was one of the people I went to, I had gone to a global financial services firm, obviously having scale across dozens of countries, and I had one simple request to Glenn's team as well as a number of other technology companies: I want cognitive intelligence for our data, because the processes we'd used for 20 years just wouldn't scale, not at speed, across many different languages and cultures. And I look now, five and a half years later, and we have the beginnings of, I would say, technology opportunities. When I asked Glenn that question, he was probably the only one that didn't think I had horns coming out of my head, that I was crazy. I mean, some of the leading technology firms thought I was crazy asking for cognitive data management capabilities, and here we are five and a half years later, and we're seeing AI applied not just on the front end of analytics but back in the back end of the data management processes themselves, starting to automate. So, you know, there's a concept now coming out, DataOps. Think of what DevOps is: within our data management processes, it's bringing cognitive capabilities to every process step, and asking what level of automation we can do. Because, you know, for a typical data science experiment, 80 to 90% of the work is data engineering. If I can automate that through a DataOps process, then I can get to insight much faster, and I can scale it, and scale a lot more opportunities, without having to do it manually. So I look at presentations like that and I think, you know, in every aspect of our business, where could we apply it? >> When you talk about data engineering, you're talking about data scientists spending their time just cleaning and wrangling data, all the not-fun stuff, the equivalent of plugging in cables back in the infrastructure days. >> You're seeing horror stories right now. I heard from a major academic institution: a client came to them, and their data scientists, a team they had spent several years building, were spending 99% of their time trying to cleanse and prep data. They were spending 90% cleansing and prepping, and of the remaining 10%, 90% of that was fixing what they'd fixed wrong the first time, so they had 1% of their job for doing their job. So this is a huge opportunity. You can start automating more of that and actually refocusing data science on data science. >> So you've been a chief data officer at a number of financial institutions. You've got this kind of cool title now, which touches on some of the things a CDO might do, and you're technical, you've got a technical background. So when you look at a lot of what Ginni Rometty calls incumbents, she called them incumbent disruptors two years ago at IBM Think: they've got data that has been hardened, you know, in all these projects and use cases, and it's locked, and people talk about the silos. Part of your role is to figure out, okay, how do we get that data out?
Leverage it, put it at the core. Is that fair? >> Well, I'm going to stay away from the word core, because core can be code for the kind of legacy process of building a single repository, a single warehouse, which is very time consuming. I think I can leave the data where it is, but find a way to unify it. >> Not physically, that's exactly what I'm saying. Not the core physically, but logically. >> That's what we need to think about: how to do this logically, and create more of a unification approach that has speed and agility to it, versus the old physical approaches, which took time and resources. >> So that's a computer science problem that people have been trying to solve for years: decentralized, distributed data architectures, right? Why is it that we're now able to tap it? >> I think it's a perfect storm of AI, of cloud and cloud native, of IoT. Because when you think of IoT, for IoT to be successful you need a fabric that can connect millions of devices or millions of sensors. Then you pair those three with the investment big data brought in the last seven or eight years. And big data, to me: initially, when I started talking to companies in the Valley 10 years ago, in the early days of Hadoop, what I saw at almost any of the digital companies in the Valley was that they were using the technology to be more agile. They were doing agile data science before we even called it data science. MapReduce and Hadoop were almost an afterthought, just a mechanism to facilitate agility and speed. And so if you look at how we've built out all the way up to today, and the convergence of all these new technologies, it's a perfect storm to actually innovate differently. >> Well, what was profound about MapReduce and Hadoop was, leave the data where it is and ship five megabytes of code to a petabyte of data. And you bring up a good point: we've now spent 10 years leveraging that at a much lower cost. And you've got the cloud now for scale. And now machine intelligence comes in that you can apply to the data. Because, as Bob Picciano once told me, data's plentiful, insights aren't. Amen to that. Okay, so this is a really interesting discussion. You guys have known each other for a couple of decades. How do you work together to solve problems? What is that conversation like? Do you want to start that? >> So, um, first of all, we've never worked together on solving small problems, commodity problems. We would usually tackle something that someone would say would not be possible. Mark is a change agent wherever he goes, and he usually goes to a place that wants to fix something or change something in an abnormally short amount of time, for an abnormally small amount of money, right? So what's strange is that we always find that space together. Mark is very judicious about using us as a services firm to help accelerate those things, but then also we build in a plan to transition us away and transition him into full ownership. We usually work together to jump-start one of these wicked-hard, wicked-cool things that nobody else has done. >> People hate you at first, then they love you. >> I went into one
And then we're gonna train our talent as we find them, and and Glenn's team will knowledge transfer, and by face for where, Rayna. And you know, that's a model I've done successfully in several organizations. People can. I hated it first because they're not doing it themselves, but they may not have the experience and the skills, and I think as soon as you show your staff you're willing to invest in them and give them the time and exposure. The conversation changes, but it's always a little awkward. At first, I've run heavy attrition, and some organizations at first build the organizations. But the one instance that Glen was referring to, we came in there and they had a 4 1 1 2 1 12 to 15 year plan and the C I O. Looked at me, he says. I'll give you two years. I'm a bad negotiator. I got three years out of it and I got a business case approved by the CEO a week later. It was a significant size business case in five minutes. I didn't have to go back a second or third time, but we said We're gonna do it in three years. Here's how we're gonna scale an organization. We scaled more than 1000 person organization in three years of talent, but we did it in a planned way and in that particular organization, probably a year and 1/2 in, I had a global map of every data and analytics role I need and I could tell you were in the US they set and with what competitors earning what industry and where in India they set and in what industry And when we needed them. We went out and recruited, but it's time to build that. But you know, in any really period, I've worked because I've done this 20 plus years. The talent changes. The location changes someone, but it's always been a challenge to find him. >> I guess it's good to have a deadline. I guess you did not take the chief data officer role in your current position. Explain that. What's what. What's your point of view on on that role and how it's evolved and how it's maybe being used in ways that don't I >> mean, I think that a CDO, um on during the early days, there wasn't a definition of a matter of fact. Every time I get a recruiter, call me all. We have a great CDO row for first time I first thing I asked him, How would you define what you mean by CDO? Because I've never seen it defined the same way into cos it's just that way But I think that the CDO, regardless of institutions, responsibility end in to make sure there's an Indian framework from strategy execution, including all of the governance and compliance components, and that you have ownership of each piece in the organization. CDO most companies doesn't own all of that, but I think they have a responsibility and too many organizations that hasn't occurred. So you always find gaps and each organization somewhere between risk costs and value, in terms of how how they're, how the how the organization's driving data and in my current role. Like I said, I wanted to focus. We want the focus to really be on how we're enabling, and I may be enabling from a risk and compliance standpoint, Justus greatly as I'm enabling a gross perspective on the business or or cost management and cost reductions. We have been successful in several programs for self funding data programs for multi gears. By finding and costs, I've gone in tow several organizations that it had a decade of merger after merger and Data's afterthought in almost any merger. I mean, there's a Data Silas section session tomorrow. It'd be interesting to sit through that because I've found that data data is the afterthought in a lot of mergers. 
And yet I knew of one large healthcare company that made data core to all of their acquisitions; it was one of the first places they consolidated, and they grew faster by acquisition than any of their competitors. So I think there's a way to do it correctly. But in most companies you go into, you'll find all kinds of legacy silos and duplication, and those are opportunities to, uh, really reduce costs and self-fund all the improvements, all the strategic programs you wanted. >> And I'm inferring from the end-to-end nature of the data role that overlaps are maybe better than gaps, and data is that thread between cost, risk, and value. >> It is. And I've been lucky in my career: I've reported to CEOs, I've reported to CIOs, and I've reported to CROs, so I've kind of reported in three different ways, and each of those executives really looked at it a little bit differently. Value obviously is in the CEO's office, compliance in the CRO's office, and cost was more in the CIO domain, but you know, we had to build a program looking at all three. >> You know, I think this topic we were just talking about, how these roles are evolving, I think it's natural, because we're about five to seven years into the evolution of the CDO. It might be time for a CDO 2.0, and you see more CDOs moving away from pure policy and compliance to more value enablement. It's a really hard change, and that's why you're starting to see more turnover among CDOs, because people who are really good CDOs at policy and risk and things like that might not be the best enablers, right? So I think it's a pretty natural evolution. >> Great discussion, guys. We've got to leave it there. They say data is the new oil. Data is more valuable than oil: you can use the same data to reduce costs, to reduce risk, and to drive revenue, but you can't put the same gallon of oil in your car and a quart of oil in your house. So we think it's even more valuable. Gentlemen, thank you so much for coming on theCUBE. >> Thanks so much. Lots of fun. >> Thanks. >> All right, keep it right there, everybody. We'll be back with our next guest. You're watching theCUBE from the IBM CDO Summit 2019. Be right back.
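Glenn's pay-stub demo turns on one idea: train extraction by example instead of by page coordinates. The toy below mimics that contrast for plain text, learning the context that precedes a labeled field value and reusing it on a new document. It is only a sketch of the training-by-example idea, not IBM's cognitive digitization service, and the sample documents and field names are invented.

```python
import re

# Toy "train by example" extractor, in contrast to classic OCR templates
# that hard-code where a field lives on the page. We show the system
# labeled examples and it learns the textual context around the field.
# All sample documents and field names are invented for illustration.
training_docs = [
    ("Invoice No: 10042  Total Due: $512.00", {"invoice_number": "10042"}),
    ("Invoice No: 88310  Total Due: $99.50", {"invoice_number": "88310"}),
]

def learn_context(docs, field):
    """Collect the label text that precedes the field's value in examples."""
    prefixes = set()
    for text, labels in docs:
        idx = text.find(labels[field])
        if idx >= 0:
            # Crude context: everything before the value, up to the colon.
            prefixes.add(text[:idx].split(":")[0].strip())
    return prefixes

def extract(text, prefixes):
    """Apply the learned context: the value is the token after a prefix."""
    for prefix in prefixes:
        m = re.search(re.escape(prefix) + r"\s*:\s*(\S+)", text)
        if m:
            return m.group(1)
    return None

ctx = learn_context(training_docs, "invoice_number")
print(extract("Invoice No: 55731  Total Due: $10.25", ctx))  # -> 55731
```

When the form changes, you add another labeled example rather than re-coding coordinates, which is the mind shift Glenn describes.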

Published Date : Jun 24 2019


Mick Hollison, Cloudera | theCUBE NYC 2018


 

(lively peaceful music) >> Live, from New York, it's The Cube. Covering "The Cube New York City 2018." Brought to you by SiliconANGLE Media and its ecosystem partners. >> Well, everyone, welcome back to The Cube special conversation here in New York City. We're live for Cube NYC. This is our ninth year covering the big data ecosystem, now evolved into AI, machine learning, cloud, all things data, in conjunction with Strata Conference, which is going on right around the corner. This is the Cube studio. I'm John Furrier, with Dave Vellante. Our next guest is Mick Hollison, who is the CMO, Chief Marketing Officer, of Cloudera. Welcome to The Cube, thanks for joining us. >> Thanks for having me. >> So Cloudera, obviously we love Cloudera. The Cube started in Cloudera's office, (laughing) everyone in our community knows that. I keep saying it all the time. We're so proud to have the honor of working with Cloudera over the years. And, uh, the thing that's interesting is that the new building in Palo Alto is right in front of the old building where the first Palo Alto office was. So, a lot of success. You have a billboard in the airport. Amr Awadallah is saying, hey, it's a milestone, you're in the airport. But your business is changing. You're reaching new audiences. You're public. You guys are growing up fast. All the data is out there. Tom's doing a great job. But the business side is changing. Data is everywhere; it's a big, hardcore enterprise conversation. Give us the update. What's new with Cloudera? >> Yeah. Thanks very much for having me again. It's a delight. I've been with the company for about two years now, so I'm officially part of the problem now. (chuckling) It's been a great journey thus far. And really the first order of business when I arrived at the company was, like, welcome aboard, we're going public. Time to dig into the S-1 and reimagine who Cloudera is going to be five, ten years out from now. And we spent a good deal of time, about three or four months, actually crafting what turned out to be just 38 total words in a kind of vision and mission statement. The most central of those was what we were trying to build, and it was a modern platform for machine learning and analytics in the cloud. And each of those words, when you unpack them a little bit, is very, very important. And this week at Strata, on the modern platform side, we're really happy: we just released Cloudera Enterprise 6. It's the biggest release in the history of the company. There are now over 30 open-source projects embedded into it, something that Amr and Mike could never have imagined back in the day when it was just a couple of projects. So, a very large and meaningful update to the platform. The next piece is machine learning. Hilary Mason will be giving the kickoff tomorrow, and she's probably forgotten more about ML and AI than somebody like me will ever know. She's going to give the audience an update on what we're doing in that space. But the foundation of having that data management platform is absolutely fundamental and necessary to do good machine learning. Without good data, without good data management, you can't do good ML or AI. It sounds sort of simple, but it's very true. And then the last thing that we'll be announcing this week is in the analytics space. On the analytics side, we announced Cloudera Data Warehouse and Altus Data Warehouse, which is a PaaS flavor of our new data warehouse offering. And last, but certainly not least, is just the "optimized for the cloud" bit. Everything that we're doing is optimized not just around a single cloud but around multi-cloud and hybrid cloud, really trying to bridge that gap for enterprises and what they're doing today. So it's a new Cloudera, to say the very least, but it's all still based on that core foundation and platform that you got to know very early on.
And last, but certainly not least, is just the "optimize for the cloud" bit. So, everything that we're doing is optimized not just around a single cloud but around multi-cloud, hybrid-cloud, and really trying to bridge that gap for enterprises and what they're doing today. So, it's a new Cloudera to say the very least, but it's all still based on that core foundation and platform that, you got to know it, with very early on. >> And you guys have operating history too, so it's not like it's a pivot for Cloudera. I know for a fact that you guys had very large-scale customers, both with three letter, letters in them, the government, as well as just commercial. So, that's cool. Question I want to ask you is, as the conversation changes from, how many clusters do I have, how am I storing the data, to what problems am I solving because of the enterprises. There's a lot of hard things that enterprises want. They want compliance, all these, you know things that have either legacy. You guys work on those technical products. But, at the end of the day, they want the outcomes, they want to solve some problems. And data is clearly an opportunity and a challenge for large enterprises. What problems are you guys going after, these large enterprises in this modern platform? What are the core problems that you guys knock down? >> Yeah, absolutely. It's a great question. And we sort of categorize the way we think about addressing business problems into three broad categories. We use the terms grow, connect, and protect. So, in the "grow" sense, we help companies build or find new revenue streams. And, this is an amazing part of our business. You see it in everything from doing analytics on clickstreams and helping people understand what's happening with their web visitors and the like, all the way through to people standing up entirely new businesses based simply on their data. One large insurance provider that is a customer of ours, as an example, has taken on the challenge and asked us to engage with them on building really, effectively, insurance as a service. So, think of it as data-driven insurance rates that are gauged based on your driving behaviors in real time. So no longer simply just using demographics as the way that you determine, you know, all 18-year old young men are poor drivers. As it turns out, with actual data you can find out there's some excellent 18 year olds. >> Telematic, not demographics! >> Yeah, yeah, yeah, exactly! >> That Tesla don't connect to the >> Exactly! And Parents will love this, love this as well, I think. So they can find out exactly how their kids are really behaving by the way. >> They're going to know I rolled through the stop signs in Palo Alto. (laughing) My rates just went up. >> Exactly, exactly. So, so helping people grow new businesses based on their data. The second piece is "Connect". This is not just simply connecting devices, but that's a big part of it, so the IOT world is a big engine for us there. One of our favorite customer stories is a company called Komatsu. It's a mining manufacturer. Think of it as the ones that make those, just massive mines that are, that are all over the world. They're particularly big in Australia. And, this is equipment that, when you leave it sit somewhere, because it doesn't work, it actually starts to sink into the earth. So, being able to do predictive maintenance on that level and type and expense of equipment is very valuable to a company like Komatsu. We're helping them do that. So that's the "Connect" piece. 
And last is "Protect". Since data is in fact the new oil, the most valuable resource on earth, you really need to be able to protect it, whether that's from a cybersecurity threat or just meeting compliance and regulations that are put in place by governments. Certainly GDPR has got a lot of people thinking very differently about their data management strategies. So we're helping a number of companies in that space as well. That's how we categorize what we're doing. >> So Mick, I wonder if you could address how that's all affected the ecosystem. I mean, one of the misconceptions early on was that Hadoop and big data were going to kill the enterprise data warehouse, that NoSQL was going to knock out Oracle. And Mike has always said, "No, we are incremental." And people were like, "Yeah, right." But that's really what's happened here. >> Yes. >> The EDW was a fundamental component of big data strategies. As Amr used to say, you know, SQL is the killer app for big data. (chuckling) So all those data sources have been integrated. So you fast-forward to today: you talked about IoT and the edge, you guys have announced, you know, your own data warehouse and platform as a service, so you see this embrace of a hybrid world emerging. How has that affected the evolution of your ecosystem? >> Yeah, it's definitely evolved considerably. I think I'd give you a couple of specific areas. Clearly we've been quite successful in large enterprises, so the big SI types of vendors want a piece of that action these days, and they're much more engaged than they were in the early days, when they weren't so sure all of this was real.
Question on the name. Cloudera. >> Mmhmm. >> It's spelled, it's actually cloud with ERA in the letters, so "the cloud era." People use that term all the time. We're living in the cloud era. >> Yes. >> Cloud-native is the hottest market right now in the Linux foundation. The CNCF has over two hundred and forty members and growing. Cloud-native clearly has indicated that the new, modern developers here in the renaissance of software development, in general, enterprises want more developers. (laughs) Not that you want to be against developers, because, clearly, they're going to hire developers. >> Absolutely. >> And you're going to enable that. And then you've got the, obviously, cloud-native on-premise dynamic. Hybrid cloud and multi-cloud. So is there plans to think about that cloud era, is it a cloud positioning? You see cloud certainly important in what you guys do, because the cloud creates more compute, more capabilities to move data around. >> Sure. >> And (laughs) process it. And make it, make machine learning go faster, which gives more data, more AI capabilities, >> It's the flywheel you and I were discussing. >> It's the flywheel of, what's the innovation sandwich, Dave? You know? (laughs) >> A little bit of data, a little bit of machine itelligence, in the cloud. >> So, the innovation's in play. >> Yeah, Absolutely. >> Positioning around Cloud. How are you looking at that? >> Yeah. So, it's a fascinating story. You were with us in the earliest days, so you know that the original architecture of everything that we built was intended to be run in the public cloud. It turns out, in 2008, there were exactly zero customers that wanted all of their data in a public cloud environment. So the company actually pivoted and re-architected the original design of the offerings to work on-prim. And, no sooner did we do that, then it was time to re-architect it yet again. And we are right in the midst of doing that. So, we really have offerings that span the whole gamut. If you want to just pick up you whole current Cloudera environment in an infrastructure as a service model, we offer something called Altus Director that allows you to do that. Just pick up the entire environment, step it up onto AWUS, or Microsoft Azure, and off you go. If you want the convenience and the elasticity and the ease of use of a true platform as a service, just this past week we announced Altus Data Warehouse, which is a platform as a service kind of a model. For data warehousing, we have the data engineering module for Altus as well. Last, but not least, is everybody's not going to sign up for just one cloud vendor. So we're big believers in multi-cloud. And that's why we support the major cloud vendors that are out there. And, in addition to that, it's going to be a hybrid world for as far out as we can see it. People are going to have certain workloads that, either for economics or for security reasons, they're going to continue to want to run in-house. And they're going to have other workloads, certainly more transient workloads, and I think ML and data science will fall into this camp, that the public cloud's going to make a great deal of sense. And, allowing companies to bridge that gap while maintaining one security compliance and management model, something we call a Shared Data Experience, is really our core differentiator as a business. That's at the very core of what we do. >> Classic cloud workload experience that you're bringing, whether it's on-prim or whatever cloud. >> That's right. 
>> Cloud is an operating environment for you guys. You look at it just as >> The delivery mechanism. In effect. Awesome. All right, future for Cloudera. What can you share with us. I know you're a public company. Can't say any forward-looking statements. Got to do all those disclaimers. But for customers, what's the, what's the North Star for Cloudera? You mentioned going after a much more hardcore enterprise. >> Yes. >> That's clear. What's the North Star for you guys when you talk to customers? What's the big pitch? >> Yeah. I think there's a, there's a couple of really interesting things that we learned about our business over the course of the past six, nine months or so here. One, was that the greatest need for our offerings is in very, very large and complex enterprises. They have the most data, not surprisingly. And they have the most business gain to be had from leveraging that data. So we narrowed our focus. We have now identified approximately five thousand global customers, so think of it as kind of Fortune or Forbes 5000. That is our sole focus. So, we are entirely focused on that end of the market. Within that market, there are certain industries that we play particularly well in. We're incredibly well-positioned in financial services. Very well-positioned in healthcare and telecommunications. Any regulated industry, that really cares about how they govern and maintain their data, is really the great target audience for us. And so, that continues to be the focus for the business. And we're really excited about that narrowing of focus and what opportunities that's going to build for us. To not just land new customers, but more to expand our existing ones into a broader and broader set of use cases. >> And data is coming down faster. There's more data growth than ever seen before. It's never stopping.. It's only going to get worse. >> We love it. >> Bring it on. >> Any way you look at it, it's getting worse or better. Mick, thanks for spending the time. I know you're super busy with the event going on. Congratulations on the success, and the focus, and the positioning. Appreciate it. Thanks for coming on The Cube. >> Absolutely. Thank you gentlemen. It was a pleasure. >> We are Cube NYC. This is our ninth year doing all action. Everything that's going on in the data world now is horizontally scaling across all aspects of the company, the society, as we know. It's super important, and this is what we're talking about here in New York. This is The Cube, and John Furrier. Dave Vellante. Be back with more after this short break. Stay with us for more coverage from New York City. (upbeat music)

Published Date : Sep 13 2018


Ronen Schwartz, Informatica & John Macintyre, Microsoft | Informatica World 2018


 

>> Narrator: Live from Las Vegas, it's The Cube! Covering Informatica World 2018. Brought to you by Informatica. >> Welcome back, everyone. We're live here in Las Vegas at the Venetian. This is Informatica World 2018, and this is The Cube's exclusive coverage. I'm John Furrier, cohost of The Cube, with Peter Burris, my cohost for the past two days, wall-to-wall coverage. Our next guests are Ronen Schwartz, Senior Vice President and General Manager, Big Data, Cloud, and Data Integration at Informatica, and John MacIntyre, who's in product management for Azure SQL Data Warehouse at Microsoft. Part of the big news this morning in the keynote is the relationship between Microsoft Azure Cloud and Informatica. Welcome back, welcome to The Cube! Thanks for coming! >> Phil: Thank you. >> So great to have you guys on; we were looking forward to this interview all morning, all day. We heard about the rumor of the news. Let's jump into it. But I want you to highlight the relationship, how you guys got here, because it's not just news, it's not just an announcement. There's actually code shipping, product integration, push button, console. It's cloud, it's real cloud, hybrid cloud. >> John: Yeah, yeah, absolutely. >> It's a real product. >> John M.: Absolutely. >> Yeah, definitely, this is correct, and I do want to encourage the audience to go directly to the Azure environment, try SQL Data Warehouse, and try to load as much data as possible, leveraging the Informatica Intelligent Cloud Services. It is, as you said, available today. >> Okay, so explain the product. You've got the Informatica Intelligent Cloud Services on Azure. What is the specific product? Take us through specifically what's happening and what the impact to customers is. >> So if you are a customer and you're looking to get agility, you want to get scale, you want to enjoy the benefits of a cloud data warehouse, one of the first barriers that you have is, how do I get my data into these new, amazing capabilities that I can achieve in the cloud? And I think with this announcement we're simplifying that process and making it really streamlined. From within the same place that you start your new data warehouse, in one click you're actually coming to the strongest iPaaS that exists in the market, and you are able to choose your data source and actually decide what data you want to move, and then, in a very simple process, move that data into Azure SQL Data Warehouse. >> John, talk about the ease of use, because one of the things that pops into my head when I think about data is, man, it's a pain in the butt. I've got to do all this stuff, I've got to get it off a storage drive, upload it, set it on a drive, FedEx the drive, whatever. Cloud has to be console-based. Talk about that aspect of this deal.
Well, I think, John, you know, one of the things that you'll hear from Microsoft is that we want to build the most productive cloud available for customers. And when we look at it, as Ronen was saying, excuse me, we move data, we get data connected into the Azure cloud, and how do we do that in a push-button way? And so what you'll see through the integration that we've done, all the way through single sign-on, is that you can just push a button, build that pipeline, get that data flowing from your on-premises environment, and get that into the Azure SQL Data Warehouse with just pushing a few buttons. And so what we see is customers are able to really accelerate their migration and movement to the cloud through that productivity. >> And how long has this been in the works? You guys just didn't meet yesterday and do product integration. Talk about the relationship with Informatica. >> Yeah, we've been working with Informatica for years. Informatica's been a great partner, and so we started working on this integration, I think, probably over a year ago, really envisioning what we could do for customers: how do we take all of the really great capabilities that Informatica brings to customers and connect those to the Azure cloud? One of the things that we believe is that customers will live in a hybrid world, at least for some foreseeable time, and so how do we enable customers to live in that world, to have their data spread across that world, and get all the lineage, governance, and data management capabilities that you need as an enterprise in this world? And that's one of the great things that Informatica brings to the table here. >> And Microsoft, your ethos, too, seems to be, and you can confirm whether this is true or not, to be open for data portability. >> John M.: Yeah. >> Certainly, GDPR has sent a huge signal to the market that, look, no one's going to fool around with this. Data's at the center of the value proposition, and it has to move around. >> That's right. And so when we think about data, data interoperability, data portability: recently we introduced Azure Databricks as a GA service on Azure, and so we've already done data interoperability across our relational data warehouse products as well as the Databricks products, so Spark and Spark runtimes can interoperate and have data access with the relational warehouse, and the relational warehouse can load into Spark clusters. So we see giving customers the freedom to move their data, and to have their data in the places they need it, as critical for them to be successful. >> Ronen, let me just get specific on the news here for a second. Is the product GA, or in preview, or? >> The product is in preview, and it will be fully GA in the Q3 time frame, hopefully toward the middle or end of Q3. Customers can start experimenting with the product today, and they will actually see us adding more and more capabilities to this experience even before the GA. >> What are some of the things the customers have been asking for? I know you guys do a lot of work on the product side with the customers, so I want to ask about the requirements you put together in defining this product. What were some of their pain points that you're solving? Was it the ease of use? Was it part of the plan of enterprise cataloging? Where did you guys come down when you did your PRD, your requirements, and all this stuff?
So we've been working with customers and with partners for the last few years on their journey to adopt cloud, and I think what we've seen is that part of the challenge of adopting cloud was, where do I start? How do I figure out what data I should move to the cloud first? What is actually going to be impacted by me doing this? One impact you touched on is security and privacy: am I putting something at risk? Am I following the company policies? But the other things are like, what other systems depend on this data existing here? And so when I move to the cloud, am I actually changing my overall enterprise data architecture? Where Informatica has been focusing, especially with the new catalog capabilities, is in really giving the enterprise the full picture of the data. If data is the most important asset that you have, we're actually trying to map it for you, including impact analysis, including relationship dependencies. What we're trying to simplify is actually choosing the right data to move to the cloud, and actually dealing with the rest of the impact that happens when you're adopting cloud fast. I think cloud brings an amazing promise; we want to make it really, really easy. This latest announcement actually touches the experience itself: how a customer can go from starting a new data warehouse to bringing the data to the data warehouse. I think we are now making it simpler than ever before. >> So one of the challenges that enterprises have overall is that there are so few people who really understand how to build these pipelines, how to administer these pipelines. The number of data scientists is not growing fast. Microsoft also has an enormously powerful ecosystem itself. Do you anticipate that by doing IICS in this relationship, your developers can actually start incorporating more complex, higher-value data services in a simple way, so that they can start putting them into their applications and reduce the need for those really smart people at large and small companies? >> I mean, I think what we want to get to is this notion of self-service data. And to Ronen's point, that data has to be governed, that data has to be protected. You need to know that you can trust that data, you can trust the source of that data, (coughs) excuse me, and you know that you can make decisions from that data. But what we hear from customers is they really want IT and these specialists to get out of the way of the business. They want to enable their workforce to actually do data production, to say, I can create a data set that I can actually make decisions around. I know the lineage of that data set, I know the quality of that data set, and I know where it's appropriate to go use that data set. It could be for data science, it could be for a data engineer to go pick up and use for another pipeline, or it could be for a business analyst. But I think with this partnership, what we're really focusing on is how we accelerate that productivity for those people who are discovering the data, managing the data, and then those that can build these data streams and build these data sets that can be consumed inside an organization. Now, I think to your point, once we do that, we believe we will see a proliferation of analysis and higher-level advanced analytics on top of that data.
What we're hearing from customers is that the challenge isn't necessarily getting machine learning services up and running, or doing advanced analytics, or building models and training models. Yes, there is a narrow set of people that go and do that, but invariably what we hear is that customers are spending the bulk of their time shaping, managing, and wrangling that data, getting that data in a form where it can actually be consumed, and I think this partnership-- >> A lot of prep work. >> Yeah, a ton of prep work. >> Talk about the dynamic. We've been hearing on The Cube here, certainly, and also out in the industry, that 80% of the time is spent managing all this stuff. You guys have a value proposition of cataloging all the metadata so you can get a clear view, and customers, we had Toyota on earlier, said, we had all the data, we just actually made all these mistakes because we didn't connect it all. With what you guys are doing, coming from Ronen, you're going to bring all of the Microsoft tools to the table now, so if I'm a customer, the benefit to me is I get to leverage the Power BI stuff or whatever is coming down the pipe, whatever tools you have in your ecosystem, on-prem and also in the cloud. Is that right? >> Absolutely, and so things like PowerApps are going to be able, with no-code, low-code experiences, to actually go build intelligent applications, build things like sales-oriented applications, recruiting-oriented applications, and leverage that data. That is really what we want to unlock for enterprises and for data professionals. >> What do you think the time savings will be, just ballpark, order of magnitude, on the setup? If 80% is the industry benchmark people are throwing around, say 80% wrangling and setup, 20% analysis, what do you guys see the impact being with something like the Intelligent Cloud Services with Azure? >> Ronen, you can speak to what you're seeing already from some of the customers, but I think even from what we saw this morning in the keynote, we're cutting down the time dramatically: from identifying what data has value to actually getting it moving into Azure, what you saw in less than 10 minutes today would take days if not weeks to actually get done without these tools-- >> So a significant number, big number? >> John M.: Yeah, absolutely. >> And I think there are actually two parts to people going through the adoption. One is the technology of moving the data, but the other one that is even, I think, a bigger barrier, and sometimes even more important, is, can I actually just discover and identify the data, and can I actually get all the metadata needed so that I can get the approval, or get personally comfortable with the data that I'm choosing? I think this cost is now actually being eliminated, and that is going to allow more people to consume more data even faster. But I do agree that the demo speaks better than anything else; it got a lot of good-- >> John F.: A few clicks and you're there; got some great props on Twitter, saw some great tweets. The question that begs next is, now that I've got a pipeline and automation, all this stuff's going on, console-based, cataloging, all this great stuff, AI, machine learning involved, did you guys put the secret sauce in some of the tech? I mean, can you share what's under the hood at all? (laughs) Or is that the secret sauce? >> So, I cannot steal some of the demos of tomorrow, but I think you will--
(laughs) >> Come on, tell us. >> But I think you will see an interesting AI-driven interface-- >> That's a yes. >> From Microsoft, working very interestingly with the catalog to drive intelligence to the users, so we will definitely demo it tomorrow on stage. >> John F.: So that's a yes. >> Yes, the answer is yes. >> But I want to build on this because I asked a question about whether or not developers are going to get access to this. If I have a platform that allows me to build very, very complex, but very rich, pipelines to data in a simple way, I have a catalog that allows me to discover data and sustain knowledge about that data as the data changes over time, and I have a very simple way of setting that up and running it through an Azure cloud experience, can I anticipate that over time certain conventions for how data gets established, set up, organized, formatted, all that other stuff, start to emerge as a combination of this partnership? So that developers can go into an account and say, okay, we're going to do this for you: oh, you have customer data, you have this data, I want to be able to grab that and make it part of my application. Isn't that where this goes over time? >> Yes, yes, in a very substantive way. I think we're also looking at it from, you'll have to stay tuned on the Microsoft side, but we're working towards looking at data entities, business entities, and how do we enrich those entities. And to your point, where do they get enriched in that data pipeline, and then how do they get consumed, and how do they get consumed in a way where we're expressing the data model, the schema, the lineage, and all of these things in a way that's very discoverable for those consuming that data, so they understand where it's coming from. So we look at this partnership in terms of getting that data, getting that data more enriched, and getting that data more consumable in a standard way for application developers. Again, it could be those building intelligent applications, it could be those building business applications, and there's a whole set of tools-- >> Or some as-yet-undefined class of applications that are made possible because it's easier to find the data, acknowledge the data, use the data. >> John M.: Yeah, absolutely. >> If we had more time, I'd love to drill down on the future with microservices, containers, Kubernetes, all the cool stuff that's going on around cloud native. I'm sure there's a lot of headroom there from a developer standpoint. Final question is, extending the partnership. Is there a go-to-market together? Are you guys taking it to the field? What's the relationship with Microsoft, your ecosystem, your developers, your customers, and Informatica? >> Yeah, we're doing a lot of joint go-to-market. We've already been doing a lot all the way up to this announcement, and I think you'll see that increase based on this announcement. I don't know if Ronen you want to talk about specific things we're doing. >> Yeah, I think the success with the customers is already there, and there is actually a really nice list of customers here that are mutual customers of ours doing exactly these scenarios. We'll make it easier for them to do it from now on. >> Yep. >> From a go-to-market perspective, we have a really nice go-to-market motion where the sales teams are actually getting aligned. The new visible integration will make it even easier for them.
>> Yeah, this really hits a lot of the sweet spot, multi-cloud, hybrid cloud, truly data-driven, ease of use, getting up and running. Congratulations, Ronen, great job. John, great to see you. Here inside The Cube, putting all the data, packing it, sharing it out over the airwaves and over the Internet. Just The Cube, I'm John Furrier, Peter Burris, thanks for watching. Back with more live coverage. Stay with us for more coverage here at Informatica World 2018, live in Las Vegas. We'll be right back. (soft electronic music)

Published Date : May 22 2018


Data Science for All: It's a Whole New Game


 

>> There's a movement that's sweeping across businesses everywhere, here in this country and around the world. And it's all about data. Today businesses are being inundated with data, to the tune of over two and a half million gigabytes generated in the next 60 seconds alone. What do you do with all that data? To extract insights, you typically turn to a data scientist. But not necessarily anymore. At least not exclusively. Today the ability to extract value from data is becoming a shared mission, a team effort that spans the organization, extending far more widely than ever before. Today, data science is being democratized. >> Data Science for All: It's a Whole New Game. >> Welcome everyone, I'm Katie Linendoll. I'm a technology expert and writer, and I love reporting on all things tech. My fascination with tech started very young. I began coding when I was 12, received my networking certs by 18, and earned a degree in IT and new media from Rochester Institute of Technology. So as you can tell, technology has always been a true passion of mine. Having grown up in the digital age, I love having a career that keeps me at the forefront of science and technology innovations. I spend equal time in the field being hands-on as I do on my laptop conducting in-depth research. Whether I'm diving underwater with NASA astronauts, witnessing the new ways in which mobile technology can help rebuild the Philippines' economy in the wake of super typhoons, or sharing a first look at the newest iPhones on The Today Show, yesterday, I'm always on the hunt for the latest and greatest tech stories. And that's what brought me here. I'll be your host for the next hour as we explore the new phenomenon that is taking businesses around the world by storm: data science continues to become democratized, extending beyond the domain of the data scientist, and there's now a mandate for all of us to become data literate as data science for all drives our AI culture. We're going to take to the streets and go behind the scenes as we uncover the factors that are fueling this phenomenon and giving rise to a movement that is reshaping how businesses leverage data, and putting organizations on the road to AI. So coming up, I'll be doing interviews with data scientists. We'll see real-world demos and take a look at how IBM is changing the game with an open data science platform. We'll also be joined by legendary statistician Nate Silver, founder and editor-in-chief of FiveThirtyEight, who will shed light on how a data-driven mindset is changing everything from business to our culture. We also have a few people who are joining us in our studio, so thank you guys for joining us. Come on, I can do better than that, right? Live studio audience, the fun stuff. And for all of you during the program, I want to remind you to join the conversation on social media using the hashtag DSforAll, it's data science for all. Share your thoughts on what data science and AI mean to you and your business. And, let's dive into a whole new game of data science. Now I'd like to welcome my co-host, General Manager of IBM Analytics, Rob Thomas. >> Hello, Katie. >> Come on guys. >> Yeah, seriously. >> No one's allowed to be quiet during this show, okay? >> Right. >> Or, I'll start calling people out. So Rob, thank you so much. You know this conversation, we're calling it a data explosion happening right now. And it's nothing new. And when you and I chatted about it, you've been talking about this for years.
You have to ask, is this old news at this point? >> Yeah, I mean, well first of all, the data explosion is not coming, it's here. And everybody's in the middle of it right now. What is different is the economics have changed. And the scale and complexity of the data that organizations are having to deal with has changed. And to this day, 80% of the data in the world still sits behind corporate firewalls. So, that's becoming a problem. It's becoming unmanageable. IT struggles to manage it. The business can't get everything they need. Consumers can't consume it when they want. So we have a challenge here. >> It's challenging in the world of unmanageable, crazy complexity. If I'm sitting here as an IT manager of my business, I'm probably thinking to myself, this is incredibly frustrating. How in the world am I going to get control of all this data? And probably not just me thinking it. Many individuals here as well. >> Yeah, indeed. Everybody's thinking about how am I going to put data to work in my organization in a way I haven't done before. Look, you've got to have the right expertise, the right tools. The other thing that's happening in the market right now is clients are dealing with multi-cloud environments. So data behind the firewall in private cloud, multiple public clouds. And they have to find a way. How am I going to pull meaning out of this data? And that brings us to data science and AI. That's how you get there. >> I understand the data science part, but I think we're all starting to hear more about AI. And it's incredible that this buzzword is happening. How do businesses adapt to this AI growth and boom and trend that's happening in this world right now? >> Well, let me define it this way. Data science is a discipline. And machine learning is one technique. And then AI puts machine learning into practice and applies it to the business. So this is really about getting your business where it needs to go. And to get to an AI future, you have to lay a data foundation today. I love the phrase, "there's no AI without IA." That means you're not going to get to AI unless you have the right information architecture to start with. >> Can you elaborate, though, in terms of how businesses can really adopt AI and get started? >> Look, I think there's four things you have to do if you're serious about AI. One is you need a strategy for data acquisition. Two is you need a modern data architecture. Three is you need pervasive automation. And four is you've got to expand job roles in the organization. >> Data acquisition. The first pillar you just discussed. Can we start there and explain why it's so critical in this process? >> Yeah, so let's think about how data acquisition has evolved through the years. 15 years ago, data acquisition was about how do I get data in and out of my ERP system? And that was pretty much solved. Then the mobile revolution happens. And suddenly you've got structured and non-structured data. More than you've ever dealt with. And now you get to where we are today. You're talking terabytes, petabytes of data. >> [Katie] Yottabytes, I heard that word the other day. >> I heard that too. >> Didn't even know what it meant. >> You know how many zeros that is? >> I thought we were in Star Wars. >> Yeah, I think it's a lot of zeroes. >> Yodabytes, it's new. >> So, it's becoming more and more complex in terms of how you acquire data. So that's the new data landscape that every client is dealing with.
And if you don't have a strategy for how you acquire that and manage it, you're not going to get to that AI future. >> So a natural segue: if you are one of these businesses, how do you build for the data landscape? >> Yeah, so the question I always hear from customers is we need to evolve our data architecture to be ready for AI. And the way I think about that is it's really about moving from static data repositories to more of a fluid data layer. >> And we continue with the architecture. New data architecture is an interesting buzzword to hear. But it's also one of the four pillars. So if you could dive in there. >> Yeah, I mean it's a new twist on what I would call some core data science concepts. For example, you have to leverage tools with a modern, centralized data warehouse. But your data warehouse can't be stagnant, limited to just what's right there. So you need a way to federate data across different environments. You need to be able to bring your analytics to the data because it's most efficient that way. And ultimately, it's about building an optimized data platform that is designed for data science and AI. Which means it has to be a lot more flexible than what clients have had in the past. >> All right. So we've laid out what you need for driving automation. But where does the machine learning kick in? >> Machine learning is what gives you the ability to automate tasks. And I think about machine learning as being about predicting and automating. And this will really change the roles of data professionals and IT professionals. For example, a data scientist cannot possibly know every algorithm or every model that they could use. So we can automate the process of algorithm selection. Another example is things like automated data matching. Or metadata creation. Some of these things may not be exciting but they're hugely practical. And so when you think about the real use cases that are driving return on investment today, it's things like that. It's automating the mundane tasks. >> Let's go ahead and come back to something that you mentioned earlier, because it's fascinating to be talking about this AI journey, but also significant is the new job roles. What are those other participants in the analytics pipeline? >> Yeah, I think we're just at the start of this idea of new job roles. We have data scientists. We have data engineers. Now you see machine learning engineers. Application developers. What's really happening is that data scientists are no longer allowed to work in their own silo. And so the new job roles are about how does everybody have data first in their mind? And then they're using tools to automate data science, to automate building machine learning into applications. So roles are going to change dramatically in organizations. >> I think that's confusing, though, because we have several organizations asking: is that just for highly specialized roles, just for data science? Or is it applicable to everybody across the board? >> Yeah, and that's the big question, right? Cause everybody's thinking how will this apply? Do I want this to be just a small set of people in the organization that will do this? But our view is data science has to be for everybody. It's about bringing data science to everybody as a shared mission across the organization. Everybody in the company has to be data literate. And participate in this journey. >> So overall, group effort, has to be a common goal, and we all need to be data literate across the board. >> Absolutely. >> Done deal.
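Rob's point about automating algorithm selection has a simple core: score several candidate models the same way and keep the winner. A minimal sketch of that loop, assuming scikit-learn and a synthetic dataset; this illustrates the technique, not IBM's actual implementation.

```python
# Try a few candidate algorithms with identical cross-validation and pick the best.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Mean ROC AUC across five folds decides the winner.
scores = {name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```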
But at the end of the day, it's kind of not an easy task. >> It's not. It's not easy but it's maybe not as big of a shift as you would think. Because you have to put data in the hands of people that can do something with it. So, it's very basic. Give access to data. Data's often locked up in a lot of organizations today. Give people the right tools. Embrace the idea of choice or diversity in terms of those tools. That gets you started on this path. >> It's interesting to hear you say essentially you need to train everyone, though, across the board when it comes to data literacy. And I think people that are coming into the workforce don't necessarily have a background or a degree in data science. So how do you manage? >> Yeah, so in many cases that's true. I will tell you some universities are doing amazing work here. One example, University of California Berkeley. They offer a course for all majors. So no matter what you're majoring in, you have a course on foundations of data science. How do you bring data science to every role? So it's starting to happen. We at IBM provide data science courses through CognitiveClass.ai. It's for everybody. It's free. And look, if you want to get your hands on code and just dive right in, you go to datascience.ibm.com. The key point is this though. It's more about attitude than it is aptitude. I think anybody can figure this out. But it's about the attitude to say we're putting data first and we're going to figure out how to make this real in our organization. >> I also have to give a shout out to my alma mater because I have heard that there is an MS offering in data analytics. And they are always on the forefront of new technologies and new majors and on trend. And I've heard that the job placement rate for people graduating with that MS is high. >> I'm sure it's very high. >> So go Tigers. All right, tangential. Let me get back to something else you touched on earlier because you mentioned that a number of customers ask you how in the world do I get started with AI? It's an overwhelming question. Where do you even begin? What do you tell them? >> Yeah, well things are moving really fast. But the good thing is most organizations I see, they're already on the path, even if they don't know it. They might have a BI practice in place. They've got data warehouses. They've got data lakes. Let me give you an example. AMC Networks. They produce a lot of the shows that I'm sure you watch Katie. >> [Katie] Yes, Breaking Bad, Walking Dead, any fans? >> [Rob] Yeah, we've got a few. >> [Katie] Well you taught me something I didn't even know. Because it's amazing how we have all these different industries, but yet media in itself is impacted too. And this is a good example. >> Absolutely. So, AMC Networks, think about it. They've got ads to place. They want to track viewer behavior. What do people like? What do they dislike? So they have to optimize every aspect of their business from marketing campaigns to promotions to scheduling to ads. And their goal was to transform data into business insights and really take the burden off of their IT team that was heavily burdened by obviously a huge increase in data. So their VP of BI took the approach of using machine learning to process large volumes of data. They used a platform that was designed for AI and data processing. It's the IBM Integrated Analytics System: a data warehouse with data science tools built in and in-memory data processing. And just like that, they were ready for AI.
And they're already seeing that impact in their business. >> Do you think a movement of that nature kind of presses other media conglomerates and organizations to say we need to be doing this too? >> I think it's inevitable for everybody: you're either going to be leading, or you'll be playing catch-up. And so, as we talk to clients we think about how do you start down this path now, even if you have to iterate over time? Because otherwise you're going to wake up and you're going to be behind. >> One thing worth noting is we've talked about analytics to the data. It's analytics first to the data, not the other way around. >> Right. So, look. We as a practice, we say you want to bring analytics to where the data sits. Because it's a lot more efficient that way. It gets you better outcomes in terms of how you train models and it's more efficient. And we think that leads to better outcomes. Other organizations will say, "Hey move the data around." And everything becomes a big data movement exercise. But once an organization has started down this path, they're starting to get predictions, they want to do it where it's really easy. And that means analytics applied right where the data sits. >> And worth talking about the role of the data scientist in all of this. It's been called the hot job of the decade. And Harvard Business Review even dubbed it the sexiest job of the 21st century. >> Yes. >> I want to see this on the cover of Vogue. Like I want to see the first data scientist. Female preferred, on the cover of Vogue. That would be amazing. >> Perhaps you can. >> People agree. So what changes for them? Is this challenging in terms of, we talk about data science for all, is it data science for everyone? And how does it change everything? >> Well, I think of it this way. AI gives software super powers. It really does. It changes the nature of software. And at the center of that is data scientists. So, a data scientist has a set of powers that they've never had before in any organization. And that's why it's a hot profession. Now, on one hand, this has been around for a while. We've had actuaries. We've had statisticians that have really transformed industries. But there are a few things that are new now. We have new tools. New languages. Broader recognition of this need. And while it's important to recognize this critical skill set, you can't just limit it to a few people. This is about scaling it across the organization. And truly making it accessible to all. >> So then do we need more data scientists? Or is this something you train, like you said, across the board? >> Well, I think you want to do a little bit of both. We want more. But we can also train more and make the ones we have more productive. The way I think about it is there's kind of two markets here. And we call it clickers and coders. >> [Katie] I like that. That's good. >> So, let's talk about what that means. Clickers are basically somebody that wants to use tools. Create models visually. It's drag and drop. Something that's very intuitive. Those are the clickers. Nothing wrong with that. It's been valuable for years. There's a new crop of data scientists. They want to code. They want to build with the latest open source tools. They want to write in Python or R. These are the coders. And both approaches are viable. Both approaches are critical. Organizations have to have a way to meet the needs of both of those types.
And there's not a lot of things available today that do that. >> Well let's keep going on that. Because I hear you talking about the data scientists role and how it's critical to success, but with the new tools, data science and analytics skills can extend beyond the domain of just the data scientist. >> That's right. So look, we're unifying coders and clickers into a single platform, which we call IBM Data Science Experience. And as the demand for data science expertise grows, so does the need for these kind of tools. To bring them into the same environment. And my view is if you have the right platform, it enables the organization to collaborate. And suddenly you've changed the nature of data science from an individual sport to a team sport. >> So as somebody that, my background is in IT, the question is really is this an additional piece of what IT needs to do in 2017 and beyond? Or is it just another line item to the budget? >> So I'm afraid that some people might view it that way. As just another line item. But, I would challenge that and say data science is going to reinvent IT. It's going to change the nature of IT. And every organization needs to think about what are the skills that are critical? How do we engage a broader team to do this? Because once they get there, this is the chance to reinvent how they're performing IT. >> [Katie] Challenging or not? >> Look it's all a big challenge. Think about everything IT organizations have been through. Some of them were late to things like mobile, but then they caught up. Some were late to cloud, but then they caught up. I would just urge people, don't be late to data science. Use this as your chance to reinvent IT. Start with this notion of clickers and coders. This is a seminal moment. Much like mobile and cloud was. So don't be late. >> And I think it's critical because it could be so costly to wait. And Rob and I were even chatting earlier how data analytics is just moving into all different kinds of industries. And I can tell you even personally being effected by how important the analysis is in working in pediatric cancer for the last seven years. I personally implement virtual reality headsets to pediatric cancer hospitals across the country. And it's great. And it's working phenomenally. And the kids are amazed. And the staff is amazed. But the phase two of this project is putting in little metrics in the hardware that gather the breathing, the heart rate to show that we have data. Proof that we can hand over to the hospitals to continue making this program a success. So just in-- >> That's a great example. >> An interesting example. >> Saving lives? >> Yes. >> That's also applying a lot of what we talked about. >> Exciting stuff in the world of data science. >> Yes. Look, I just add this is an existential moment for every organization. Because what you do in this area is probably going to define how competitive you are going forward. And think about if you don't do something. What if one of your competitors goes and creates an application that's more engaging with clients? So my recommendation is start small. Experiment. Learn. Iterate on projects. Define the business outcomes. Then scale up. It's very doable. But you've got to take the first step. >> First step always critical. And now we're going to get to the fun hands on part of our story. Because in just a moment we're going to take a closer look at what data science can deliver. And where organizations are trying to get to. All right. 
Thank you Rob, and now we've been joined by Siva Anne, who is going to help us navigate this demo. First, welcome Siva. Give him a big round of applause. Yeah. All right, Rob, break down what we're going to be looking at. You take over this demo. >> All right. So this is going to be pretty interesting. Siva is going to take us through. He's going to play the role of a financial adviser who wants to help better serve clients through recommendations. And I'm going to really illustrate three things. One is how do you federate data from multiple data sources? Inside the firewall, outside the firewall. How do you apply machine learning to predict and to automate? And then how do you move analytics closer to your data? So, what you're seeing here is a custom application for an investment firm. So, Siva, our financial adviser, welcome. So you can see at the top, we've got market data. We pulled that from an external source. And then we've got Siva's calendar in the middle. He's got clients on the right side. So page down, what else do you see down there Siva? >> [Siva] I can see the recent market news. And in here I can see that JP Morgan is calling for a US dollar rebound in the second half of the year. And, I have an upcoming meeting with Leo Rakes. I can get-- >> [Rob] So let's go in there. Why don't you click on Leo Rakes. So, you're sitting at your desk, you're deciding how you're going to spend the day. You know you have a meeting with Leo. So you click on it. You immediately see, all right, so what do we know about him? We've got data governance implemented. So we know his age, we know his degree. We can see he's not that aggressive of a trader. Only six trades in the last few years. But then where it gets interesting is you go to the bottom. You start to see predicted industry affinity. Where did that come from? How do we have that? >> [Siva] So these green lines and red arrows here indicate the trending affinity of Leo Rakes for particular industry stocks. What we've done here is we've built machine learning models using customer demographic data, his stock portfolios, and browsing behavior to build a model which can predict his affinity for a particular industry. >> [Rob] Interesting. So, I like to think of this, we call it celebrity experiences. So how do you treat every customer like they're a celebrity? So to some extent, we're reading his mind. Because without asking him, we know that he's going to have an affinity for auto stocks. So we go down. Now we look at his portfolio. You can see okay, he's got some different holdings. He's got Amazon, Google, Apple, and then he's got RACE, which is the ticker for Ferrari. You can see that's done incredibly well. And so, as a financial adviser, you look at this and you say, all right, we know he loves auto stocks. Ferrari's done very well. Let's create a hedge. Like what kind of security would interest him as a hedge against his position in Ferrari? Could we go figure that out? >> [Siva] Yes. Given I know that he's got an affinity for auto stocks, and I also see that Ferrari has got some tremendous gains, I want to lock in these gains by hedging. And I want to do that by picking an auto stock which has got negative correlation with Ferrari. >> [Rob] So this is where we get to the idea of in-database analytics. Cause you start clicking that and immediately we're getting instant answers of what's happening. So what did we find here? We're going to compare Ferrari and Honda. >> [Siva] I'm going to compare Ferrari with Honda.
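(The comparison Siva is about to run boils down to correlating the two stocks' daily returns: a strongly negative correlation is what makes one stock a hedge for the other. A minimal sketch of that check, assuming pandas; the prices are made up for illustration, whereas the demo computes this in-database against live market data.)

```python
import pandas as pd

# Hypothetical closing prices over six days.
prices = pd.DataFrame({
    "RACE": [100.0, 102.0, 101.0, 105.0, 107.0, 106.0],  # Ferrari
    "HMC":  [30.0, 29.5, 29.8, 29.0, 28.6, 28.9],        # Honda
})

# Correlate daily percentage returns, not raw prices.
returns = prices.pct_change().dropna()
corr = returns["RACE"].corr(returns["HMC"])
print(f"correlation of daily returns: {corr:.2f}")  # negative -> hedge candidate
```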
And what I see here instantly is that Honda has got a negative correlation with Ferrari, which makes it a perfect mix for his stock portfolio. Given he has an affinity for auto stocks and it correlates negatively with Ferrari. >> [Rob] These are very powerful tools at the hand of a financial adviser. You think about it. As a financial adviser, you wouldn't think about federating data, machine learning. Pretty powerful. >> [Siva] Yes. So what we have seen here is that using the common SQL engine, we've been able to federate queries across multiple data sources: Db2 Warehouse in the cloud, IBM's Integrated Analytics System, and a Hortonworks-powered Hadoop platform for the news feeds. We've been able to use machine learning to derive innovative insights about his stock affinities. And drive the machine learning into the appliance, closer to where the data resides, to deliver high performance analytics. >> [Rob] At scale? >> [Siva] We're able to run millions of these correlations across stocks, currency, other factors. And even score hundreds of customers for their affinities on a daily basis. >> That's great. Siva, thank you for playing the role of financial adviser. So I just want to recap briefly. Cause this is really powerful technology that's really simple. So we federated, we aggregated multiple data sources from all over the web and internal systems. And public cloud systems. Machine learning models were built that predicted Leo's affinity for a certain industry. In this case, automotive. And then you see when you deploy analytics next to your data, even a financial adviser, just with the click of a button, is getting instant answers so they can go be more productive in their next meeting. This whole idea of celebrity experiences for your customer, that's available for everybody, if you take advantage of these types of capabilities. Katie, I'll hand it back to you. >> Good stuff. Thank you Rob. Thank you Siva. Powerful demonstration of what we've been talking about all afternoon. And thank you again to Siva for helping us navigate. Should we give him one more round of applause? We're going to be back in just a moment to look at how we operationalize all of this data. But first, here's a message from me. If you're a part of a line of business, your main fear is disruption. You know data is the new gold that can create huge amounts of value. So does your competition. And they may be beating you to it. You're convinced there are new business models and revenue sources hidden in all the data. You just need to figure out how to leverage it. But with the scarcity of data scientists, you really can't rely solely on them. You may need more people throughout the organization that have the ability to extract value from data. And as a data science leader or data scientist, you have a lot of the same concerns. You spend way too much time looking for, prepping, and interpreting data and waiting for models to train. You know you need to operationalize the work you do to provide business value faster. What you want is an easier way to do data prep. And rapidly build models that can be easily deployed, monitored and automatically updated. So whether you're a data scientist, data science leader, or in a line of business, what's the solution? What'll it take to transform the way you work? That's what we're going to explore next. All right, now it's time to delve deeper into the nuts and bolts. The nitty gritty of operationalizing data science and creating a data driven culture. How do you actually do that?
Well that's what these experts are here to share with us. I'm joined by Nir Kaldero, who's head of data science at Galvanize, which is an education and training organization. Tricia Wang, who is co-founder of Sudden Compass, a consultancy that helps companies understand people with data. And last, but certainly not least, Michael Li, founder and CEO of Data Incubator, which is a data science training company. All right guys. Shall we get right to it? >> All right. >> So data explosion happening right now. And we are seeing it across the board. I just shared an example of how it's impacting my philanthropic work in pediatric cancer. But you guys each have so many unique roles in your business life. How are you seeing it just blow up in your fields? Nir, your take? >> Yeah, for example, at Galvanize we train many Fortune 500 companies. And just looking at the demand from companies that want us to help them go through this digital transformation is mind-blowing. A data point by itself. >> Okay. Well, what we're seeing is that data science, as a theme, is actually for everyone now. But what's happening is that it's reaching non-technical people. And what we're seeing is that when non-technical people are implementing these tools, or coming at these tools without a baseline of data literacy, they're oftentimes using them in ways that distance themselves from the customer. Because they're implementing data science tools without a clear purpose, without a clear problem. And so what we do at Sudden Compass is we work with companies to help them embrace and understand the complexity of their customers. Because oftentimes they are misusing data science to try and flatten their understanding of the customer. As if you can just do more traditional marketing, where you're putting people into boxes. And I think the whole ROI of data is that you can now understand people's relationships at a much more complex level, at a greater scale than before. But we have to do this with basic data literacy. And this has to involve technical and non-technical people. >> Well, you can have all the data in the world, and I think it speaks to, if you're not doing the proper movement with it, forget it. It means nothing at the same time. >> No, absolutely. I mean, I think that when you look at the huge explosion in data, that comes with a huge explosion in data experts. Right, we call them data scientists, data analysts. And sometimes they're people who are very, very talented, like the people here. But sometimes you have people who are maybe re-branding themselves, right? Trying to move up their title one notch to try to attract that higher salary. And I think that that's one of the things that customers are coming to us for, right? They're saying, hey look, there are a lot of people that call themselves data scientists, but we can't really distinguish. So, we sort of run a fellowship where we help companies hire from a really talented group of folks, who are also truly data scientists and who know all those kind of really important data science tools. And we also help companies internally. Fortune 500 companies who are looking to grow that data science practice that they have. And we help clients like McKinsey, BCG, Bain, train up their customers, also their clients, also their workers to be more data talented. And to build up that data science capability. >> And Nir, this is something you work with a lot. A lot of Fortune 500 companies.
And when we were speaking earlier, you were saying many of these companies can be in a panic. >> Yeah. >> Explain that. >> Yeah, so you know, not all Fortune 500 companies are fully data driven. And we know that the winners in this fourth industrial revolution, which I like to call the machine intelligence revolution, will be companies who navigate and transform their organization to unlock the power of data science and machine learning. And the companies that are not like that, that don't utilize data science and predictive power well, will pretty much get shredded. So they are in a panic. >> Tricia, companies have to deal with data behind the firewall and in the new multi-cloud world. How do organizations start to become data driven right to the core? >> I think the most urgent question companies should be asking to become data driven is how do I bring the complex reality that our customers are experiencing on the ground into a corporate office? Into the data models. So that question is critical because that's how you actually prevent any big data disasters. And that's how you leverage big data. Because when your data models are really far from your human models, that's when you're going to do things that just don't feel right. That's when Tesco had their terrible big data disaster that they're still recovering from. And so that's why I think it's really important to understand that when you implement big data, you have to further embrace thick data. The qualitative, the emotional stuff, that is difficult to quantify. But then comes the difficult art and science that I think is the next level of data science. Which is getting non-technical and technical people together to ask how do we find those unknown nuggets of insights that are difficult to quantify? Then, how do we do the next step of figuring out how do you mathematically scale those insights into a data model, so that it actually is reflective of human understanding? And then we can start making decisions at scale. But you have to have that first. >> That's absolutely right. And I think that when we think about what it means to be a data scientist, right? I always think about it in these sort of three pillars. You have the math side. You have to have that kind of stats, hardcore machine learning background. You have the programming side. You don't work with small amounts of data. You work with large amounts of data. You've got to be able to type the code to make those computers run. But then the last part is that human element. You have to understand the domain expertise. You have to understand what it is that I'm actually analyzing. What's the business proposition? And how are the clients, how are the users actually interacting with the system? That human element that you were talking about. And I think having somebody who understands all of those, and not just in isolation, but is able to marry that understanding across those different topics, that's what makes a data scientist. >> But I find that we don't have people with those skill sets. And right now the way I see teams being set up inside companies is that they're creating these isolated data unicorns. These data scientists that have graduated from your programs, which are great. But they don't involve the people who are the domain experts. They don't involve the designers, the consumer insight people, the salespeople. The people who spend time with the customers day in and day out. Somehow they're left out of the room.
They're consulted, but they're not a stakeholder. >> Can I actually-- >> Yeah, yeah please. >> Can I actually give a quick example? So for example, we at Galvanize train the executives and the managers, and then the technical people, the data scientists and the analysts. But in order to actually see all of the ROI behind the data, you also have to have a creative, fluid conversation between non-technical and technical people. And this is a major trend now. And there's a major gap. And we need to increase awareness and kind of create a new environment where technical people also talk seamlessly with non-technical ones. >> [Tricia] We call-- >> That's one of the things that we see a lot. It's one of the trends in-- >> A major trend. >> data science training: it's not just for the data science technical experts. It's not just for one type of person. So a lot of the training we do is for data engineers, people who are more on the software engineering side, learning more about the stats and math. And then people who are sort of traditionally on the stats side learning more about the engineering. And then managers and people who are data analysts learning about both. >> Michael, I think you said something that was of interest too, because I think we can look at IBM Watson as an example. And working in healthcare. The human component. Because oftentimes we talk about machine learning and AI and data, and you get worried that you still need that human component. Especially in the world of healthcare. And I think that's a very strong point when it comes to the data analysis side. Is there any particular example you can speak to of that? >> So I think that there was this really excellent paper a while ago talking about all the neural net stuff, trained on textual data. So looking at sort of different corpuses. And they found that these models were highly, highly sexist. They would read these corpuses, and it's not because neural nets themselves are sexist. It's because they're reading the things that we write. And it turns out that we write kind of sexist things. And they would sort of find all these patterns in there that were sort of latent, that had a lot of sort of things that maybe we would cringe at if we saw them. And I think that's one of the really important aspects of the human element, right? It's being able to come in and sort of say like, okay, I know what the biases of the system are, I know what the biases of the tools are. I need to figure out how to use that to make the tools, make the world a better place. And another area where this comes up all the time is lending, right? So the federal government has said, and we have a lot of clients in the financial services space, so they're constantly under these kinds of rules, that they can't make discriminatory lending practices based on a whole set of protected categories. Race, sex, gender, things like that. But it's very easy when you train a model on credit scores to pick that up. And then to have a model that's inadvertently sexist or racist. And that's where you need the human element to come back in and say okay, look, the classic example would be zip code: you're using zip code as a variable. But when you look at it, zip code is actually highly correlated with race. And you can't do that.
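Michael's zip code example can be checked mechanically: audit every candidate feature for correlation with a protected attribute, even when that attribute is excluded from the model itself. A minimal sketch assuming pandas; the column names, threshold, and data are hypothetical, and real fairness audits go well beyond simple correlation.

```python
import pandas as pd

# Hypothetical training features plus a protected attribute kept only for auditing.
df = pd.DataFrame({
    "zip_code_income_rank": [1, 2, 2, 8, 9, 9, 3, 8],
    "credit_utilization":   [0.9, 0.7, 0.8, 0.2, 0.1, 0.2, 0.6, 0.3],
    "protected_attr":       [1, 1, 1, 0, 0, 0, 1, 0],
})

THRESHOLD = 0.5  # flag anything this strongly correlated as a possible proxy
for col in ["zip_code_income_rank", "credit_utilization"]:
    r = df[col].corr(df["protected_attr"])
    if abs(r) > THRESHOLD:
        print(f"WARNING: {col} may be a proxy for the protected attribute (corr={r:.2f})")
```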
So you may inadvertently, by sort of following the math and being a little naive about the problem, introduce something really horrible into a model, and that's where you need a human element to sort of step in and say, okay hold on. Slow things down. This isn't the right way to go. >> And the people who have-- >> I feel like, I can feel her ready to respond. >> Yes, I'm ready. >> She's like let me have at it. >> And the people, here it is. The people who are really great at providing that human intelligence are social scientists. We are trained to look for bias and to understand bias in data. Whether it's quantitative or qualitative. And I really think that we're going to have fewer of these kinds of problems if we had more integrated teams. If it was a mandate from leadership to say no data science team should be without a social scientist, ethnographer, or qualitative researcher of some kind, to be able to help see these biases. >> The talent piece is actually the most crucial-- >> Yeah. >> one here. If you look at how to enable machine intelligence in an organization, there are three pillars that I have in my head: the culture, the talent, and the technology infrastructure. And I believe, and I saw working very closely with Fortune 100 and 200 companies, that the talent piece is actually the most crucial and the hardest to get. >> [Tricia] I totally agree. >> It's absolutely true. Yeah, no I mean I think that's sort of like how we came up with our business model. Companies were basically saying hey, I can't hire data scientists. And so we have a fellowship where we get 2,000 applicants each quarter. We take the top 2% and then we sort of train them up. And we work with hiring companies who then want to hire from that population. And so we're sort of helping them solve that problem. And the other half of it is really around training. Cause with a lot of industries, especially if you're sort of in a more regulated industry, there's a lot of nuances to what you're doing. And the fastest way to develop that data science or AI talent may not necessarily be to hire folks who are coming out of a PhD program. It may be to take folks internally who have a lot of that domain knowledge that you have and get them trained up on those data science techniques. So we've had large insurance companies come to us and say hey look, we hire three or four folks from you a quarter. That doesn't move the needle for us. What we really need is take the thousand actuaries and statisticians that we have and get all of them trained up to become data scientists and become data literate in this new open source world. >> [Katie] Go ahead. >> All right, ladies first. >> Go ahead. >> Are you sure? >> No please, fight first. >> Go ahead. >> Go ahead Nir. >> So this is actually a trend that we have been seeing in the past year or so, that companies kind of start to look at how to upskill and look for talent within the organization. So they can actually move them to become more literate and navigate them from analyst to data scientist. And from data scientist to machine learner. So this is actually a trend that has been happening already for a year or so. >> Yeah, but I also find that after they've gone through that training in getting people skilled up in data science, the next problem that I get is executives coming to say we've invested in all of this. We're still not moving the needle. We've already invested in the right tools. We've gotten the right skills.
We have enough scale of people who have these skills. Why are we not moving the needle? And what I explain to them is look, you're still making decisions in the same way. And you're still not involving enough of the non technical people. Especially from marketing, which is now, the CMO's are much more responsible for driving growth in their companies now. But often times it's so hard to change the old way of marketing, which is still like very segmentation. You know, demographic variable based, and we're trying to move people to say no, you have to understand the complexity of customers and not put them in boxes. >> And I think underlying a lot of this discussion is this question of culture, right? >> Yes. >> Absolutely. >> How do you build a data driven culture? And I think that that culture question, one of the ways that comes up quite often in especially in large, Fortune 500 enterprises, is that they are very, they're not very comfortable with sort of example, open source architecture. Open source tools. And there is some sort of residual bias that that's somehow dangerous. So security vulnerability. And I think that that's part of the cultural challenge that they often have in terms of how do I build a more data driven organization? Well a lot of the talent really wants to use these kind of tools. And I mean, just to give you an example, we are partnering with one of the major cloud providers to sort of help make open source tools more user friendly on their platform. So trying to help them attract the best technologists to use their platform because they want and they understand the value of having that kind of open source technology work seamlessly on their platforms. So I think that just sort of goes to show you how important open source is in this movement. And how much large companies and Fortune 500 companies and a lot of the ones we work with have to embrace that. >> Yeah, and I'm seeing it in our work. Even when we're working with Fortune 500 companies, is that they've already gone through the first phase of data science work. Where I explain it was all about the tools and getting the right tools and architecture in place. And then companies started moving into getting the right skill set in place. Getting the right talent. And what you're talking about with culture is really where I think we're talking about the third phase of data science, which is looking at communication of these technical frameworks so that we can get non technical people really comfortable in the same room with data scientists. That is going to be the phase, that's really where I see the pain point. And that's why at Sudden Compass, we're really dedicated to working with each other to figure out how do we solve this problem now? >> And I think that communication between the technical stakeholders and management and leadership. That's a very critical piece of this. You can't have a successful data science organization without that. >> Absolutely. >> And I think that actually some of the most popular trainings we've had recently are from managers and executives who are looking to say, how do I become more data savvy? How do I figure out what is this data science thing and how do I communicate with my data scientists? >> You guys made this way too easy. I was just going to get some popcorn and watch it play out. >> Nir, last 30 seconds. I want to leave you with an opportunity to, anything you want to add to this conversation? 
>> I think one thing to conclude with is that for companies that are not data driven, it's about time to hit refresh and figure out how to transition the organization to become data driven. To become agile and nimble, so they can actually seize the opportunities from this important industrial revolution. Otherwise, unfortunately, they will have a hard time surviving. >> [Katie] All agreed? >> [Tricia] Absolutely, you're right. >> Michael, Trish, Nir, thank you so much. Fascinating discussion. And thank you guys again for joining us. We will be right back with another great demo. Right after this. >> Thank you Katie. >> Once again, thank you for an excellent discussion. Weren't they great, guys? And thank you to everyone who's tuning in on the live webcast. As you can hear, we have an amazing studio audience here. And we're going to keep things moving. I'm now joined by Daniel Hernandez and Siva Anne. And we're going to turn our attention to how you can deliver on what they're talking about, using Data Science Experience to do data science faster. >> Thank you Katie. Siva and I are going to spend the next 10 minutes showing you how you can deliver on what they were saying, using the IBM Data Science Experience to do data science faster. We'll demonstrate, through new features we introduced this week, how teams can work together more effectively across the entire analytics life cycle. How you can take advantage of any and all data, no matter where it is and what it is. How you can use your favorite tools from open source. And finally, how you can build models anywhere and deploy them close to where your data is. Remember the financial adviser app Rob showed you? To build an app like that, we needed a team of data scientists, developers, data engineers, and IT staff to collaborate. We do this in the Data Science Experience through a concept we call projects. When I create a new project, I can now use the new GitHub integration feature. We're doing for data science what we've been doing for developers for years. Distributed teams can work together on analytics projects and take advantage of GitHub's version management and change management features. This is a huge deal. Let's explore the project we created for the financial adviser app. As you can see, our data engineer Joane, our developer Rob, and others are collaborating on this project. Joane got things started by bringing together the trusted data sources we need to build the app. Taking a closer look at the data, we see that our customer and profile data is stored on our recently announced IBM Integrated Analytics System, which runs safely behind our firewall. We also needed macroeconomic data, which she was able to find at the Federal Reserve. And she stored it in our Db2 Warehouse on Cloud. And finally, she selected stock news data from NASDAQ.com and landed that in a Hadoop cluster, which happens to be powered by Hortonworks. We added a new feature to the Data Science Experience so that when it's installed with Hortonworks, it automatically uses the native security and governance controls within the cluster, so your data is always secure and safe. Now we want to show you the news data we stored in the Hortonworks cluster. This is the main administrative console. It's powered by an open source project called Ambari. And here's the news data. It's in parquet files stored in HDFS, which happens to be a distributed file system.
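Parquet files in HDFS like these can be queried with ordinary SQL from almost any engine. The demo uses IBM Big SQL; here is a stand-in sketch assuming PySpark instead, with a hypothetical path and column names.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("news-exploration").getOrCreate()

# Hypothetical location of the parquet news data in HDFS.
news = spark.read.parquet("hdfs:///data/nasdaq/news")
news.createOrReplaceTempView("stock_news")

# Plain SQL over distributed parquet files, much like any relational table.
spark.sql("""
    SELECT ticker, COUNT(*) AS mentions
    FROM stock_news
    WHERE publish_date >= '2018-01-01'
    GROUP BY ticker
    ORDER BY mentions DESC
    LIMIT 10
""").show()
```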
To get the data from NASDAQ into our cluster, we used IBM's BigIntegrate and BigQuality to create automatic data pipelines that acquire, cleanse, and ingest that news data. Once the data's available, we use IBM's Big SQL to query that data using SQL statements that are much like the ones we would use for any relational data, including the data that we have in the Integrated Analytics System and Db2 Warehouse on Cloud. This, and the federation capabilities that Big SQL offers, dramatically simplifies data acquisition. Now we want to show you how we support a brand new tool that we're excited about. Since we launched last summer, the Data Science Experience has supported Jupyter and R for data analysis and visualization. In this week's update, we deeply integrated another great open source project called Apache Zeppelin. It's known for having great visualization support, advanced collaboration features, and is growing in popularity amongst the data science community. This is an example of Apache Zeppelin and the notebook we created through it to explore some of our data. Notice how wonderful and easy the data visualizations are. Now we want to walk you through the Jupyter notebook we created to explore our customers' preference for stocks. We use notebooks to understand and explore data, to identify the features that have some predictive power. Ultimately, we're trying to assess what is driving customer stock preference. Here we did the analysis to identify the attributes of customers that are likely to purchase auto stocks. We used this understanding to build our machine learning model. For building machine learning models, we've always had tools integrated into the Data Science Experience. But sometimes you need to use tools you've already invested in, like our very own SPSS as well as SAS. Through a new import feature, you can easily import models created with those tools. This helps you avoid vendor lock-in and simplifies the development, training, deployment, and management of all your models. To build the models we used in the app, we could have coded, but we prefer a visual experience. We used our customer profile data in the Integrated Analytics System, used the Auto Data Preparation to cleanse our data, chose the binary classification algorithms, and let the Data Science Experience evaluate between logistic regression and a gradient boosted tree. It's doing the heavy work for us. As you can see here, the Data Science Experience generated performance metrics that show us that the gradient boosted tree is the best performing algorithm for the data we gave it. Once we save this model, it's automatically deployed and available for developers to use. Any application developer can take this endpoint and consume it like they would any other API inside of the apps they build. We've made training and creating machine learning models super simple. But what about the operations? A lot of companies are struggling to ensure their model performance remains high over time. In our financial adviser app, we know that customer data changes constantly, so we need to always monitor model performance and ensure that our models are retrained as necessary. This is a dashboard that shows the performance of our models and lets our teams monitor and retrain those models so that they're always performing to our standards. So far we've been showing you the Data Science Experience available behind the firewall that we're using to build and train models.
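Once a model sits behind a scoring endpoint like the one Daniel describes, consuming it from an application is plain HTTP. A minimal sketch using Python's requests library; the URL, auth header, and payload shape here are hypothetical, not the actual Watson Machine Learning API.

```python
import requests

SCORING_URL = "https://example.com/v3/models/stock_affinity/score"  # hypothetical
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

# One customer record, in the field order the model was trained on (illustrative).
payload = {
    "fields": ["age", "trades_per_year", "portfolio_value"],
    "values": [[42, 6, 250000]],
}

resp = requests.post(SCORING_URL, json=payload, headers=HEADERS, timeout=10)
resp.raise_for_status()
print(resp.json())  # e.g., predicted affinity label and probability
```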
Through a new publish feature, you can build models and deploy them anywhere. In another environment, private, public, or anywhere else with just a few clicks. So here we're publishing our model to the Watson machine learning service. It happens to be in the IBM cloud. And also deeply integrated with our Data Science Experience. After publishing and switching to the Watson machine learning service, you can see that our stock affinity model that we just published is there and ready for use. So this is incredibly important. I just want to say it again. The Data Science Experience allows you to train models behind your own firewall, take advantage of your proprietary and sensitive data, and then deploy those models wherever you want with ease. So to summarize what we just showed you. First, IBM's Data Science Experience supports all teams. You saw how our data engineer populated our project with trusted data sets. Our data scientists developed, trained, and tested a machine learning model. Our developers used APIs to integrate machine learning into their apps. And how IT can use our Integrated Model Management dashboard to monitor and manage model performance. Second, we support all data. On premises, in the cloud, structured, unstructured, inside of your firewall, and outside of it. We help you bring analytics and governance to where your data is. Third, we support all tools. The data science tools that you depend on are readily available and deeply integrated. This includes capabilities from great partners like Hortonworks. And powerful tools like our very own IBM SPSS. And fourth, and finally, we support all deployments. You can build your models anywhere, and deploy them right next to where your data is. Whether that's in the public cloud, private cloud, or even on the world's most reliable transaction platform, IBM z. So see for yourself. Go to the Data Science Experience website, take us for a spin. And if you happen to be ready right now, our recently created Data Science Elite Team can help you get started and run experiments alongside you at no charge. Thank you very much. >> Thank you very much Daniel. It seems like a great time to get started. And thanks to Siva for taking us through it. Rob and I will be back in just a moment to add some perspective right after this. All right, once again joined by Rob Thomas. And Rob, obviously we got a lot of information here. >> Yes, we've covered a lot of ground. >> This is intense. You got to break it down for me cause I think we need to zoom out and see the big picture. What can better data science deliver to a business? Why is this so important? I mean we've heard it through and through. >> Yeah, well, I heard it a couple times. But it starts with businesses have to embrace a data driven culture. And it is a change. And we need to make data accessible with the right tools in a collaborative culture because we've got diverse skill sets in every organization. But data driven companies succeed when data science tools are in the hands of everyone. And I think that's a new thought. I think most companies think just get your data scientist some tools, you'll be fine. This is about tools in the hands of everyone. I think the panel did a great job of describing about how we get to data science for all. 
Building a data culture, making it a part of your everyday operations, and the highlights of what Daniel just showed us, that's some pretty cool features for how organizations can get to this, which is you can see IBM's Data Science Experience, how that supports all teams. You saw data analysts, data scientists, application developers, IT staff, all working together. Second, you saw how we support all tools. And your choice of tools. So the most popular data science libraries integrated into one platform. And we saw some new capabilities that help companies avoid lock-in, where you can import existing models created from specialist tools like SPSS or others. And then deploy them and manage them inside of Data Science Experience. That's pretty interesting. And lastly, you see we continue to build on this best of open tools. Partnering with companies like H2O, Hortonworks, and others. Third, you can see how you use all data no matter where it lives. That's a key challenge every organization's going to face. Private, public, federating all data sources. We announced new integration with the Hortonworks data platform where we deploy machine learning models where your data resides. That's been a key theme. Analytics where the data is. And lastly, supporting all types of deployments. Deploy them in your Hadoop cluster. Deploy them in your Integrated Analytics System. Or deploy them in z, just to name a few. A lot of different options here. But look, don't believe anything I say. Go try it for yourself. Data Science Experience, anybody can use it. Go to datascience.ibm.com and look, if you want to start right now, we just created a team that we call Data Science Elite. These are the best data scientists in the world that will come sit down with you and co-create solutions, models, and prove out a proof of concept. >> Good stuff. Thank you Rob. So you might be asking what does an organization look like that embraces data science for all? And how could it transform your role? I'm going to head back to the office and check it out. Let's start with the perspective of the line of business. What's changed? Well, now you're starting to explore new business models. You've uncovered opportunities for new revenue sources in all that hidden data. And being disrupted is no longer keeping you up at night. As a data science leader, you're beginning to collaborate with the line of business to better understand and translate the objectives into the models that are being built. Your data scientists are also starting to collaborate with the less technical team members and analysts who are working closest to the business problem. And as a data scientist, you stop feeling like you're falling behind. Open source tools are keeping you current. You're also starting to operationalize the work that you do. And you get to do more of what you love. Explore data, build models, put your models into production, and create business impact. All in all, it's not a bad scenario. Thanks. All right. We are back and coming up next, oh this is a special time right now. Cause we got a great guest speaker. New York Magazine called him the spreadsheet psychic and number crunching prodigy who went from correctly forecasting baseball games to correctly forecasting presidential elections. He even invented a proprietary algorithm called PECOTA for predicting future performance by baseball players and teams. And his New York Times bestselling book, The Signal and the Noise, was named by Amazon.com as the number one best non-fiction book of 2012. 
He's currently the Editor in Chief of the award-winning website FiveThirtyEight, and appears on ESPN as an on-air commentator. Big round of applause. My pleasure to welcome Nate Silver. >> Thank you. We met backstage. >> Yes. >> It feels weird to re-shake your hand, but you know, for the audience. >> I had to give the intense firm grip. >> Definitely. >> The ninja grip. So you and I have crossed paths kind of digitally in the past, which is really interesting. I started my career at ESPN. And I started as a production assistant, then later was back on air covering sports technology. And I go to you to talk about sports because-- >> Yeah. >> Wow, has ESPN upped their game in terms of understanding the importance of data and analytics. And what it brings. Not just to MLB, but across the board. >> No, it's really infused into the way they present the broadcast. You'll have win probability on the bottom line. And they'll incorporate FiveThirtyEight metrics into how they cover college football for example. So, ESPN ... Sports is maybe the perfect, if you're a data scientist, like the perfect kind of test case. And the reason being that sports consists of problems that have rules. And have structure. And when problems have rules and structure, then it's a lot easier to work with. So it's a great way to kind of improve your skills as a data scientist. Of course, there are also important real world problems that are more open ended, and those present different types of challenges. But it's such a natural fit. The teams. Think about the teams playing the World Series tonight. The Dodgers and the Astros are both like very data driven, especially Houston. Golden State Warriors, the NBA Champions, extremely data driven. New England Patriots, relative to an NFL team, it's shifted a little bit, the NFL bar is lower. But the Patriots are certainly very analytical in how they make decisions. So, you can't talk about sports without talking about analytics. >> And I was going to save the baseball question for later. Cause we are moments away from game seven. >> Yeah. >> Is everyone else watching game seven? It's been an incredible series. Probably one of the best of all time. >> Yeah, I mean-- >> You have a prediction here? >> You can mention that too. So I don't have a prediction. FiveThirtyEight has the Dodgers with a 60% chance of winning. >> [Katie] LA Fans. >> So you have two teams that are about equal. But the Dodgers pitching staff is in better shape at the moment. The end of a seven game series. And they're at home. >> But the statistics behind the two teams is pretty incredible. >> Yeah. It's like the first World Series in I think 56 years or something where you have two 100 win teams facing one another. There has been a lot of parity in baseball for a lot of years. Not that many offensive overall juggernauts. But this year, and last year with the Cubs and the Indians too really. But this year, you have really spectacular teams in the World Series. It kind of is a showcase of modern baseball. Lots of home runs. Lots of strikeouts. >> [Katie] Lots of extra innings. >> Lots of extra innings. Good defense. Lots of pitching changes. So if you love the modern baseball game, it's been about the best example that you've had. If you like a little bit more contact, and fewer strikeouts, maybe not so much. But it's been a spectacular and very exciting World Series. >> It's amazing to talk. MLB is huge with analysis. I mean, hands down. But across the board, if you can provide a few examples. 
Because there's so many teams in front offices putting such a, just a heavy intensity on the analysis side. And where the teams are going. And if you could provide any specific examples of teams that have really blown your mind. Especially over the last year or two. Because every year it gets more exciting if you will. >> I mean, so a big thing in baseball is defensive shifts. So if you watch tonight, you'll probably see a couple of plays where if you're used to watching baseball, a guy makes really solid contact. And there's a fielder there that you don't think should be there. But that's really very data driven where you analyze where this guy hits the ball. That part's not so hard. But also there's game theory involved. Because you have to adjust for the fact that he knows where you're positioning the defenders. He's trying therefore to make adjustments to his own swing and so that's been a major innovation in how baseball is played. You know, how bullpens are used too. Where teams, across all sports pretty much, have realized the importance of rest. And of fatigue. And that you can be the best pitcher in the world, but guess what? After four or five innings, you're probably not as good as a guy who has a fresh arm necessarily. So I mean, it really is like, these are not subtle things anymore. It's not just oh, on base percentage is valuable. It really affects kind of every strategic decision in baseball. The NBA, if you watch an NBA game tonight, see how many three point shots are taken. That's in part because of data. And teams realizing hey, three points is worth more than two, once you're more than about five feet from the basket, the shooting percentage gets really flat. And so it's revolutionary, right? Like teams that will shoot almost half their shots from the three point range nowadays. Larry Bird, who wound up being one of the greatest three point shooters of all time, took only eight three pointers his first year in the NBA. It's quite noticeable if you watch baseball or basketball in particular. >> Not to focus too much on sports. One final question. In terms of Major League Soccer, and now in the NFL, we're having the analysis and having wearables where it can now showcase if they wanted to on screen, heart rate and breathing and how much exertion. How much data is too much data? And when does it ruin the sport? >> So, I don't think, I mean, again, it goes sport by sport a little bit. I think in basketball you actually have a more exciting game. I think the game is more open now. You have more three pointers. You have guys getting higher assist totals. But you know, I don't know. I'm not one of those people who thinks look, if you love baseball or basketball, and you go in to work for the Astros, the Yankees or the Knicks, they probably need some help, right? You really have to be passionate about that sport. Because it's all based on what questions am I asking? As I'm a fan or I guess an employee of the team. Or a player watching the game. And there isn't really any substitute I don't think for the insight and intuition that a curious human has to kind of ask the right questions. So we can talk at great length about what tools do you then apply when you have those questions, but that still comes from people. I don't think machine learning could help with what questions do I want to ask of the data. It might help you get the answers. 
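The three-point argument Nate makes is just expected-value arithmetic. A back-of-the-envelope version, with illustrative shooting percentages rather than real league data:

```python
# Expected points per attempt: illustrative percentages, not league data.
three_pt_pct = 0.36   # a plausible three-point make rate
long_two_pct = 0.40   # a plausible long two-point make rate

ev_three = 3 * three_pt_pct   # 1.08 points per attempt
ev_two = 2 * long_two_pct     # 0.80 points per attempt

print(f"three: {ev_three:.2f} pts/attempt, long two: {ev_two:.2f} pts/attempt")
# With make rates flat beyond a few feet, the extra point dominates.
```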
>> If you have a midfielder in a soccer game though, not exerting, only 80%, and you're seeing that on a screen as a fan, and you're saying could that person get fired at the end of the day? One day, with the data? >> So we found that actually some in soccer in particular, some of the better players are actually more still. So Leo Messi, maybe the best player in the world, doesn't move as much as other soccer players do. And the reason being that A) he kind of knows how to position himself in the first place. B) he realizes that if you make a run, and you're out of position, that's quite fatiguing. And particularly soccer, like basketball, is a sport where it's incredibly fatiguing. And so, sometimes the guys who conserve their energy go against that kind of old school mentality that you have to hustle at every moment. That is not helpful to the team if you're hustling on an irrelevant play. And therefore, on a critical play, can't get back on defense, for example. >> Sports, but also data is moving exponentially as we're just speaking about today. Tech, healthcare, every different industry. Is there any particular one that's a favorite of yours to cover? And I imagine they're all different as well. >> I mean, I do like sports. We cover a lot of politics too. Which is different. I mean in politics I think people aren't intuitively as data driven as they might be in sports for example. It's impressive to follow the breakthroughs in artificial intelligence. It started out just as kind of playing games and playing chess and poker and Go and things like that. But you really have seen a lot of breakthroughs in the last couple of years. But yeah, it's kind of infused into everything really. >> You're known for your work in politics though. Especially presidential campaigns. >> Yeah. >> This year, in particular. Was it insanely challenging? What was the most notable thing that came out of any of your predictions? >> I mean, in some ways, looking at the polling was the easiest lens to look at it. So I think there's kind of a myth that last year's result was a big shock and it wasn't really. If you did the modeling in the right way, then you realized that number one, polls have a margin of error. And so when a candidate has a three point lead, that's not particularly safe. Number two, the outcome between different states is correlated. Meaning that it's not that much of a surprise that Clinton lost Wisconsin and Michigan and Pennsylvania and Ohio. You know I'm from Michigan. Have friends from all those states. Kind of the same types of people in those states. Those outcomes are all correlated. So what people thought was a big upset for the polls I think was an example of how data science done carefully and correctly where you understand probabilities, understand correlations. Our model gave Trump a 30% chance of winning. Other models gave him a 1% chance. And so that was interesting in that it showed that number one, that modeling strategies and skill do matter quite a lot. When you have someone saying 30% versus 1%. I mean, that's a very very big spread. And number two, that these aren't like solved problems necessarily. Although again, the problem with elections is that you only have one election every four years. So I can be very confident that I have a better model. Even one year of data doesn't really prove very much. Even five or 10 years doesn't really prove very much. 
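A minimal simulation of the correlation point Nate is making here, before he continues: if polling errors in demographically similar states swing together, a three-point deficit flips far more often than independent-error models suggest. All of the numbers below are made up for illustration.

```python
# Toy Monte Carlo: correlated vs. independent polling errors across states.
# Margins, error sizes, and state count are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_states = 100_000, 10
margin = 3.0   # candidate A leads by 3 points in every state
poll_sd = 4.0  # per-state polling error

# Correlated case: one shared national swing plus smaller state-level noise
# (3.0^2 + 2.6^2 ~= 4.0^2, so the per-state spread matches the independent case).
shared = rng.normal(0.0, 3.0, size=(n_sims, 1))
local = rng.normal(0.0, 2.6, size=(n_sims, n_states))

def upset_rate(err):
    # Candidate B pulls the upset by flipping a majority of the states.
    return np.mean(((margin + err) < 0).sum(axis=1) >= 6)

print("correlated:  ", upset_rate(shared + local))
print("independent: ", upset_rate(rng.normal(0.0, poll_sd,
                                             size=(n_sims, n_states))))
# The correlated world produces roughly ten times as many upsets.
```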
And so, being aware of the limitations to some extent intrinsically in elections when you only get one kind of new training example every four years, there's not really any way around that. There are ways to be more robust to sparse data environments. But if you're identifying different types of business problems to solve, figuring out what's a solvable problem where I can add value with data science is a really key part of what you're doing. >> You're such a leader in this space. In data and analysis. It would be interesting to kind of peek behind the curtain, understand how you operate but also how large is your team? How you're putting together information. How quickly you're putting it out. Cause I think in this right now world where everybody wants things instantly-- >> Yeah. >> There's also, you want to be first too in the world of journalism. But you don't want to be inaccurate because that's your credibility. >> We talked about this before, right? I think on average, speed is a little bit overrated in journalism. >> [Katie] I think it's a big problem in journalism. >> Yeah. >> Especially in the tech world. You have to be first. You have to be first. And it's just pumping out, pumping out. And there's got to be more time spent on stories if I can speak subjectively. >> Yeah, for sure. But at the same time, we are reacting to the news. And so we have people that come in, we hire most of our people actually from journalism. >> [Katie] How many people do you have on your team? >> About 35. But, if you get someone who comes in from an academic track for example, they might be surprised at how fast journalism is. That even though we might be slower than the average website, the fact that there's a tragic event in New York, are there things we have to say about that? A candidate drops out of the presidential race, are there things we have to say about that? In periods ranging from minutes to days as opposed to kind of weeks to months to years in the academic world. The corporate world moves faster. What is a little different about journalism is that you are expected to have more precision where people notice when you make a mistake. In corporations, you have maybe less transparency. If you make 10 investments and seven of them turn out well, then you'll get a lot of profit from that, right? In journalism, it's a little different. If you make kind of 10 predictions or say 10 things, and seven of them are very accurate and three of them aren't, you'll still get criticized a lot for the three. Just because that's kind of the way that journalism is. And so the kind of combination of needing, not having that much tolerance for mistakes, but also needing to be fast. That is tricky. And I criticize other journalists sometimes including for not being data driven enough, but the best excuse any journalist has is, this is happening really fast and it's my job to kind of figure out in real time what's going on and provide useful information to the readers. And that's really difficult. Especially in a world where literally, I'll probably get off the stage and check my phone and who knows what President Trump will have tweeted or what things will have happened. But it really is a kind of 24/7 job. >> Well because it's 24/7 with FiveThirtyEight, one of the most well known sites for data, are you feeling micromanagey on your people? Because you do have to hit this balance. You can't have something come out four or five days later. >> Yeah, I'm not -- >> Are you overseeing everything? 
>> I'm not by nature a micromanager. And so you try to hire well. You try and let people make mistakes. And the flip side of this is that if a news organization never had any mistakes, never had any corrections, that's rare, right? You have to have some tolerance for error because you are trying to decide things in real time. And figure things out. I think transparency's a big part of that. Say here's what we think, and here's why we think it. If we have a model, it's not just the final number, here's a lot of detail about how that's calculated. In some cases we release the code and the raw data. Sometimes we don't because there's a proprietary advantage. But quite often we're saying we want you to trust us and it's so important that you trust us, here's the model. Go play around with it yourself. Here's the data. And that's also I think an important value. >> That speaks to open source. And your perspective on that in general. >> Yeah, I mean, look, I'm a big fan of open source. I worry that I think sometimes the trends are a little bit away from open source. But by the way, one thing that happens when you share your data or you share your thinking at least in lieu of the data, and you can definitely do both, is that readers will catch embarrassing mistakes that you made. By the way, even having open sourceness within your team, I mean we have editors and copy editors who often save you from really embarrassing mistakes. And by the way, it's not necessarily people who have a training in data science. I would guess that of our 35 people, maybe only five to 10 have a kind of formal background in what you would call data science. >> [Katie] I think that speaks to the theme here. >> Yeah. >> [Katie] That everybody's kind of got to be data literate. >> But yeah, it is like you have a good intuition. You have a good BS detector basically. And you have a good intuition for hey, this looks a little bit out of line to me. And sometimes that can be based on domain knowledge, right? We have one of our copy editors, she's a big college football fan. And we had an algorithm we released that tries to predict what the human selection committee will do, and she was like, why is LSU rated so high? Cause I know that LSU sucks this year. And we looked at it, and she was right. There was a bug where it had forgotten to account for their last game where they lost to Troy or something and so -- >> That also speaks to the human element as well. >> It does. In general as a rule, if you're designing a kind of regression based model, it's different in machine learning where you have more, where you kind of build in the tolerance for error. But if you're trying to do something more precise, then so much of it is just debugging. It's saying that looks wrong to me. And I'm going to investigate that. And sometimes it's not wrong. Sometimes your model actually has an insight that you didn't have yourself. But fairly often, it is. And I think kind of what you learn is like, hey if there's something that bothers me, I want to go investigate that now and debug that now. Because the last thing you want is where all of a sudden, the answer you're putting out there in the world hinges on a mistake that you made. Cause you never know if you have so to speak, 1,000 lines of code and they all perform something differently. You never know when you get in a weird edge case where this one decision you made winds up being the difference between your having a good forecast and a bad one. 
In a defensible position and an indefensible one. So we definitely are quite diligent and careful. But it's also kind of knowing like, hey, where is an approximation good enough and where do I need more precision? Cause you could also drive yourself crazy in the other direction where you know, it doesn't matter if the answer is 91.2 versus 90. And so you can kind of go 91.2, three, four and it's like kind of A) false precision and B) not a good use of your time. So that's where I do still spend a lot of time is thinking about which problems are "solvable" or approachable with data and which ones aren't. And when they're not by the way, you're still allowed to report on them. We are a news organization so we do traditional reporting as well. And then kind of figuring out when do you need precision versus when is being pointed in the right direction good enough? >> I would love to get inside your brain and see how you operate on just like an everyday walking to Walgreens movement. It's like oh, if I cross the street in .2-- >> It's not, I mean-- >> Is it like maddening in there? >> No, not really. I mean, I'm like-- >> This is an honest question. >> If I'm looking for airfares, I'm a little more careful. But no, part of it's like you don't want to waste time on unimportant decisions, right? I will sometimes, if I can't decide what to eat at a restaurant, I'll flip a coin. If the chicken and the pasta both sound really good-- >> That's not high tech Nate. We want better. >> But that's the point, right? It's like both the chicken and the pasta are going to be really darn good, right? So I'm not going to waste my time trying to figure it out. I'm just going to have an arbitrary way to decide. >> Seriously, in business, how organizations in the last three to five years have just evolved with this data boom. How are you seeing it from a consultant point of view? Do you think it's an exciting time? Do you think it's a you-must-act-now time? >> I mean, we do know that you definitely see a lot of talent among the younger generation now. So FiveThirtyEight has been at ESPN for four years now. And man, the quality of the interns we get has improved so much in four years. The quality of the kind of young hires that we make straight out of college has improved so much in four years. So you definitely do see a younger generation for which this is just part of their bloodstream and part of their DNA. And also, particular fields that we're interested in. So we're interested in people who have both a data and a journalism background. We're interested in people who have a visualization and a coding background. A lot of what we do is very much interactive graphics and so forth. And so we do see those skill sets coming into play a lot more. And so the kind of shortage of talent that had I think frankly been a problem for a long time, I'm optimistic based on the young people in our office, it's a little anecdotal but you can tell that there are so many more programs that are kind of teaching students the right set of skills that maybe weren't taught as much a few years ago. >> But when you're seeing these big organizations, ESPN as a perfect example, moving more towards data and analytics than ever before. >> Yeah. >> You would say that's obviously true. >> Oh for sure. >> If you're not moving in that direction, you're going to fall behind quickly. >> Yeah and the thing is, if you read my book or I guess people have a copy of the book. 
In some ways it's saying hey, there are a lot of ways to screw up when you're using data. And we've built bad models. We've had models that were bad and got good results. Good models that got bad results and everything else. But the point is that the reason to be out in front of the problem is so you give yourself more runway to make errors and mistakes. And to learn kind of what works and what doesn't and which people to put on the problem. I sometimes do worry that a company says oh we need data. And everyone kind of agrees on that now. We need data science. Then they have some big test case. And they have a failure. And they maybe have a failure because they didn't know really how to use it well enough. But learning from that and iterating on that. And so by the time that you're on the third generation of kind of a problem that you're trying to solve, and you're watching everyone else make the mistake that you made five years ago, I mean, that's really powerful. But that does mean that getting invested in it now, both in technology and the human capital side, is important. >> Final question for you as we run out of time. 2018 and beyond, what is your biggest project in terms of data gathering that you're working on? >> There's a midterm election coming up. That's a big thing for us. We're also doing a lot of work with NBA data. So for four years now, the NBA has been collecting player tracking data. So they have 3D cameras in every arena. So they can actually kind of quantify for example how fast a fast break is. Or literally where a player is and where the ball is. For every NBA game now for the past four or five years. And there hasn't really been an overall metric of player value that's taken advantage of that. The teams do it. But in the NBA, the teams are a little bit ahead of journalists and analysts. So we're trying to have a really truly next generation stat. It's a lot of data. Sometimes I now more oversee things than I once did myself. And so you're parsing through many, many, many lines of code. But yeah, so we hope to have that out at some point in the next few months. >> Anything you've personally been passionate about that you've wanted to work on and kind of solve? >> I mean, the NBA thing, I am a pretty big basketball fan. >> You can do better than that. Come on, I want something real personal that you're like I got to crunch the numbers. >> You know, we tried to figure out where the best burrito in America was a few years ago. >> I'm going to end it there. >> Okay. >> Nate, thank you so much for joining us. It's been an absolute pleasure. Thank you. >> Cool, thank you. >> I thought we were going to chat World Series, you know. Burritos, important. I want to thank everybody here in our audience. Let's give him a big round of applause. >> [Nate] Thank you everyone. >> Perfect way to end the day. And for a replay of today's program, just head on over to ibm.com/dsforall. I'm Katie Linendoll. And this has been Data Science for All: It's a Whole New Game. Hi guys, I just want to quickly let you know as you're exiting. A few heads up. Downstairs right now there's going to be a meet and greet with Nate. And we're going to be doing that with clients and customers who are interested. So I would recommend before the game starts, and you lose Nate, head on downstairs. And also the gallery is open until eight p.m. with demos and activations. And tomorrow, make sure to come back too. Because we have exciting stuff. 
I'll be joining you as your host. And we're kicking off at nine a.m. So bye everybody, thank you so much. >> [Announcer] Ladies and gentlemen, thank you for attending this evening's webcast. If you are not attending our Cloud and Cognitive Summit tomorrow, we ask that you recycle your name badge at the registration desk. Thank you. Also, please note there are two exits at the back of the room, on either side of the room. Have a good evening. Ladies and gentlemen, the meet and greet will be on stage. Thank you.

Published Date : Nov 1 2017

Seth Dobrin & Jennifer Gibbs | IBM CDO Strategy Summit 2017


 

>> Live from Boston, Massachusetts. It's The Cube! Covering IBM Chief Data Officer's Summit. Brought to you by IBM. (techno music) >> Welcome back to The Cube's live coverage of the IBM CDO Strategy Summit here in Boston, Massachusetts. I'm your host Rebecca Knight along with my Co-host Dave Vellante. We're joined by Jennifer Gibbs, the VP of Enterprise Data Management at TD Bank, and Seth Dobrin who is VP and Chief Data Officer of IBM Analytics. Thanks for joining us Seth and Jennifer. >> Thanks for having us. >> Thank you. >> So Jennifer, I want to start with you, can you tell our viewers a little about TD Bank, America's Most Convenient Bank. Based, of course, in Toronto. (laughs). >> Go figure. (laughs) >> So tell us a little bit about your business. >> So TD is a, um, very old bank, headquartered in Toronto. We do have, ah, a lot of business as well in the U.S. Through acquisition we've built quite a big business on the Eastern seaboard of the United States. We've got about 85 thousand employees and we're servicing 42 lines of business when it comes to our Data Management and our Analytics programs, bank wide. >> So talk about your Data Management and Analytics programs a little bit. Tell our viewers a little bit about those. >> So, we set up our office of the Chief Data Officer about 3 to 4 years ago and so we've been maturing. >> That's relatively new. >> Relatively new, probably, not unlike peers of ours as well. We started off with a strong focus on Data Governance. Setting up roles and responsibilities, a data stewardship organization and councils from which we can drive consensus and discussion. And then we started rolling out some of our Data Management programs with a focus on Data Quality Management and Metadata Management, across the business. So setting standards and policies and supporting business processes and tooling for those programs. >> Seth when we first met, now you're a long timer at IBM. (laughs) When we first met you were a newbie. But we heard today, about, it used to be the Data Warehouse was king but now Process is king. Can you unpack that a little bit? What does that mean? >> So, you know, to make value of data, it's more than just having it in one place, right? It's what you do with the data, how you ingest the data, how you make it available for other uses. And so it's really, you know, data is not for the sake of data. Data is not a digital dropping of applications, right? The whole purpose of having and collecting data is to use it to generate new value for the company. And that new value could be cost savings, it could be cost avoidance, or it could be net new revenue. Um, and so, to do that right, you need processes. And the processes are everything from business processes, to technical processes, to implementation processes. And so it's the whole, you need all of it. >> And so Jennifer, I don't know if you've seen kind of a similar evolution from data warehouse to data everywhere, I'm sure you have. >> Yeah. >> But the data quality problem was hard enough when you had this sort of central master data management approach. How are you dealing with it? Is there less of a single version of the truth now than there ever was, and how do you deal with the data quality challenge? >> I think it's important to scope out the work effort in a way that you can get the business moving in the right direction without overwhelming it, and focusing on the areas that are most important to the bank. So, we've identified and scoped out what we call critical data. 
So each line of business has to identify what's critical to them. It relates very strongly to what Seth said around what are your core business processes and what data are you leveraging to provide value to the bank. So, um, data quality for us is about a consistent approach, to ensure the most critical elements of data used for business processes are where they need to be from a quality perspective. >> You can go down a huge rabbit hole with data quality too, right? >> Yeah. >> Data quality is about what's good enough, and defining, you know. >> Right. >> Mm-hmm (affirmative) >> It's not, I liked your, someone, I think you said, it's not about data quality, it's about, you know, you got to understand what good enough is, and it's really about, you know, what is the state of the data and under, it's really about understanding the data, right? Than it is perfection. There are some cases, especially in banking, where you need perfection, but there's tons of cases where you don't. And you shouldn't spend a lot of resources on something that's not value added. And I think it's important to do, even things like, data quality, around a specific use case so that you do it right. >> And what you were saying too, is that it's good enough but then that, that standard is changing too, all the time. >> Yeah and that changes over time and it's, you know, if you drive it by use case and not just have this boil the ocean kind of approach where all data needs to be perfect. And all data will never be perfect. And back to your question about processes, usually, a data quality issue, is not a data issue, it's a process issue. You get bad data quality because a process is broken or it's not working for a business or it's changed and no one's documented it so there's a work around, right? And so that's really where your data quality issues come from. Um, and I think that's important to remember. >> Yeah, and I think also coming out of the data quality efforts that we're making, to your point, is it central wise or is it cross business? It's really driving important conversations around who's the producer of this data, who's the consumer of this data? What does data quality mean to you? So it's really generating a lot of conversation across lines of business so that we can start talking about data in more of a shared way versus more of a business by business point of view. So those conversations are important by-products I would say of the individual data quality efforts that we're doing across the bank. 
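To make the "good enough" idea concrete, here is a small hypothetical sketch of how a critical-data-element quality check might look, with per-element thresholds set by use case rather than a blanket demand for perfection. The field names, rules, and thresholds are invented for illustration.

```python
# Hypothetical data-quality check scoped to critical data elements.
import pandas as pd

CRITICAL_ELEMENTS = {
    # element: (required pass rate, validity rule)
    "customer_id": (1.000, lambda s: s.notna()),
    "postal_code": (0.980, lambda s: s.str.match(
        r"^[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d$", na=False)),  # Canadian format
    "balance":     (0.995, lambda s: s.notna() & (s >= 0)),
}

def quality_report(df: pd.DataFrame) -> dict:
    """Share of rows passing each rule, flagged against its threshold."""
    report = {}
    for col, (threshold, rule) in CRITICAL_ELEMENTS.items():
        pass_rate = rule(df[col]).mean()
        report[col] = {"pass_rate": round(pass_rate, 4),
                       "ok": pass_rate >= threshold}
    return report

accounts = pd.read_csv("accounts_extract.csv")  # hypothetical extract
print(quality_report(accounts))
```

The point of per-element thresholds is exactly what the panel describes: some elements must be perfect, others only good enough for the use case consuming them.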
But I think, um, you should, even doing it because of regulation is not the right answer. I mean it's a great stick to hold up. It's great to be able to go to your board and say, "Hey if we don't do this, we need to spend this money 'cause it's going to cost us, in the case of GDPR, four percent of our revenue per instance.". Yikes, right? But really it's about what's the value and how do you use that information to drive value. A lot of these regulations are about lineage, right? Understanding where your data came from, how it's being processed, who's doing what with it. A lot of it is around quality, right? >> Yep. >> And so these are all good things, even if you're not in a regulated industry. And they help you build a better connection with your customer, right? I think lots of people are scared of GDPR. I think it's a really good thing because it forces companies to build a personal relationship with each of their clients. Because you need to get consent to do things with their data, very explicitly. No more of these 30 pages, two point font, you know ... >> Click a box. >> Click a box. >> Yeah. >> It's, I am going to use your data for X. Are you okay with that? Yes or no. >> So I'm interested to hear from both of you, what are you hearing from customers on this? Because this is such a sensitive topic and, in particular, financial data, which is so private. What are you, what are you hearing from customers on this? >> Um, I think customers are, um, are, especially us in our industry, and us as a bank. Our relationship with our customer is top priority and so maintaining that trust and confidence is always a top priority. So whenever we leverage data or look for use cases to leverage data, making sure that that trust will not be compromised is critically important. So finding that balance between innovating with data while also maintaining that trust and frankly being very transparent with customers around what we're using it for, why we're using it, and what value it brings to them, is something that we're focused on with, with all of our data initiatives. >> So, a big part of your job is understanding how data can affect and contribute to the monetization, you know, of your businesses. Um, at the simplest level, two ways, cut costs, increase revenue. Where do you each see the emphasis? I'm sure both, but is there a greater emphasis on cutting costs 'cause you're both established, you know, businesses, with hundreds of thousands, well in your case, 85 thousand employees. Where do you see the emphasis? Is it greater on cutting costs or not necessarily? >> I think for us, I don't necessarily separate the two. Anything we can do to drive more efficiency within our business processes is going to help us focus our efforts on innovative use of data, innovative ways to interact with our customers, innovative ways to understand more about our customers. So, I see them both as, um, I don't see them as mutually exclusive, I see them as contributing to each other. >> Mm-hmm (affirmative) >> So our business cases tend to have an efficiency slant to them or a productivity slant to them and that helps us redirect effort to other, other things that provide extra value to our clients. So I'd say it's a mix. >> I mean I think, I think you have to do the cost savings and cost avoidance ones first. Um, you learn a lot about your data when you do that. You learn a lot about the gaps. 
You learn about how would I even think about bringing external data in to generate that new revenue if I don't understand my own data? How am I going to tie 'em all together? Um, and there's a whole lot of cultural change that needs to happen before you can even start generating revenue from data. And you kind of cut your teeth on that by doing the really simple cost savings, cost avoidance ones first, right? Inevitably, maybe not in the bank, but inevitably most companies' supply chain. Let's go find money we can take out of your supply chain. Most companies, if you take out one percent of the supply chain budget, you're talking a lot of money for the company, right? And so you can generate a lot of money to free up to spend on some of these other things. >> So it's a proof of concept to bring everyone along. >> Well it's a proof of concept but it's also, it's more of a cultural change, right? >> Mm-hmm (affirmative) It's not even, you don't even frame it up as a proof of concept for data or analytics, you just frame it up, we're going to save the company, you know, one percent of our supply chain, right? We're going to save the company a billion dollars. >> Yes. >> And then there's gain share there 'cause we're going to put that thing there. >> And then there's a gain share and then other people are like, "Well, how do I do that?". And how do I do that, and how do I do that? And it kind of picks up. >> Mm-hmm (affirmative) But I don't think you can jump just to making new revenue. You got to kind of get there iteratively. >> And it becomes a virtuous circle. >> It becomes a virtuous circle and you kind of change the culture as you do it. But you got to start with, I don't, I don't think they're mutually exclusive, but I think you got to start with the cost avoidance and cost savings. >> Mm-hmm (affirmative) >> Great. Well, Seth, Jennifer thanks so much for coming on The Cube. We've had a great conversation. >> Thanks for having us. >> Thanks. >> Thank you guys. >> We will have more from the IBM CDO Summit in Boston, Massachusetts, just after this. (techno music)

Published Date : Oct 25 2017
