
Search Results for Data Warehouse:

Breaking Analysis: Unpacking Oracle’s Autonomous Data Warehouse Announcement


 

(upbeat music) >> On February 19th of this year, Barron's dropped an article declaring Oracle a cloud giant, and the article explained why the stock was a buy. Investors took notice and the stock ran up 18% over the next nine trading days, peaking on March 9th, the day before Oracle announced its latest earnings. The company beat consensus on both the top line and EPS last quarter, but investors did not like Oracle's tepid guidance and the stock pulled back. But it's still, as you can see, well above its price before the Barron's article. What does all this mean? Is Oracle a cloud giant? What are its growth prospects? Now, many parts of Oracle's business are growing, including Fusion ERP, Fusion HCM, and NetSuite; we're talking deep into the double digits, 20-plus percent growth. Its on-prem legacy license business, however, continues to decline, and that moderates the overall company growth because that on-prem business is so large. So overall, Oracle is growing in the low single digits. Now, what stands out about Oracle is its recurring revenue model. That figure, the company says, now represents 73% of its revenue, and it's going to continue to grow. Now, two other things stood out to us on the earnings call. First, Oracle plans on increasing its CapEx by 50% in the coming quarter; that's a lot. It's still far less than what AWS, Google, or Microsoft spend on capital, but it's a meaningful data point. Second, Oracle's consumption revenue for Autonomous Database and Oracle Cloud Infrastructure, or OCI, grew at 64% and 139% respectively, and these two factors, combined with the CapEx spend, suggest that the company has real momentum. I mean, look, it's possible that the CapEx announcement is just optics and they're front-loading some spend to show the Street that Oracle is a player in cloud, but I don't think so. Oracle's Safra Catz is usually pretty disciplined when it comes to its spending. Now, today, on March 17th, Oracle announced updates to its Autonomous Data Warehouse, and with me is David Floyer, who has extensively researched Oracle over the years. Today we're going to unpack the Oracle Autonomous Data Warehouse, ADW, announcement and what it means to customers, but we also want to dig into Oracle's strategy. We want to compare it to some other prominent database vendors, specifically AWS and Snowflake. David Floyer, welcome back to theCUBE, thanks for making some time for me. >> Thank you, Vellante, great pleasure to be here. >> All right, I want to get into the news, but I want to start with this idea of the autonomous database, which Oracle's announcement today is building on. Oracle uses the analogy of a self-driving car. It's obviously a powerful metaphor as they call it the self-driving database, and my takeaway is that this means the system automatically provisions, upgrades, and does all the patching for you, and it tunes itself. Oracle claims that all of this reduces labor or admin costs by 90%. So I ask you, is this the right interpretation of what Oracle means by autonomous database? And is it real? >> Is that the right interpretation? It's a nice analogy. It's a test of that analogy, isn't it? I would put it this way: the first stage of the Autonomous Data Warehouse was to do the things that you talked about, which were the tuning, the provisioning, all of that sort of thing. The second stage is actually, I think, more interesting, in that what they're focusing on is making it easy to use for the end user.
Eliminating the requirement for IT staff to be there to help in the actual use of it: that is a very big step for them, and an absolutely vital one, because all of the competition is focusing on ease of use, ease of use, ease of use, and the cheapness of being able to manage and deploy. So I think that is the really important area that Oracle has focused on, and it seems to have done so very well. >> So in your view, is this unique? I mean, you don't really hear a lot of other companies talking about this analogy of the self-driving database. Is it differentiable for Oracle? If so, why? Maybe you could help us understand that a little bit better. >> Well, the whole strategy is unique in its breadth. It has really brought a whole number of things together and made it, of its type, the best. So it has a single platform covering a whole number of data sources and database types; it's got a very broad range of different ways that you can look at the data. And the second thing that is also excellent is that it's a platform. It is fully self-provisioned, and its functionality is very, very broad indeed. The quality of the original SQL and the query languages, etc., is very, very good indeed, and its ability to do joins, for example, is excellent. So all of the building blocks are there, together with its sharing of the same data with OLTP and inference and in-memory databases as well. Altogether, the breadth of what they have is unique and very, very powerful. >> I want to come back to this, but let's get into the news a little bit and the announcement. It seems like what's new in the autonomous data warehouse piece for Oracle is new tooling around four areas. Andy Mendelsohn, the head of this group and sort of the guy who gets to release his baby, talked about four things. My takeaways: faster, simpler loads; simplified transforms; autonomous machine learning models, which are facilitating, what do you call it, citizen data science; and then faster time to insights. So, tooling to make those four things happen. What's your take and takeaways on the news? >> I think those are all correct. I would add the ease of use in terms of being able to drag and drop; the user interface has been dramatically improved. Again, I think those, strategically, are actually more important. The others are all useful and good components of it, but strategically, I think the ease of use, and the use of APEX, for example, are more important. And... >> Why are they more important strategically? >> Because they focus on the end user's capability. For example, one of the other things that they've started to introduce is Python together with their spatial databases. It is really important that you reach out to developers as they are, with the tools they want to use. So those types of ease-of-use things respect what the end users actually use. For example, they haven't come out with anything like Qlik or Tableau. They've left that marketplace for the end user to use what they like best. >> You mean they're not trying to compete with those tools. They indeed had a laundry list of stuff that they support: Talend, Tableau, Looker, Qlik, Informatica, IBM. So their claim was, hey, we're open. That's smart. They realize that people use these tools. >> They're trying not to exclude other people, to be a platform and an ecosystem for the end users.
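A quick aside picking up on the Python point: as one concrete illustration of reaching developers with the tools they already use, here is a minimal sketch of querying Autonomous Data Warehouse from Python with the python-oracledb driver. The credentials, wallet path, DSN alias, and table are hypothetical placeholders, not details from the announcement.

# Minimal sketch: querying Oracle Autonomous Data Warehouse from Python.
# The DSN, wallet path, credentials, and table are hypothetical placeholders.
import oracledb

conn = oracledb.connect(
    user="analyst",
    password="...",                      # use a secret store in practice
    dsn="adw_high",                      # TNS alias from the downloaded wallet
    config_dir="/opt/oracle/wallet",     # wallet directory for TLS connections
    wallet_location="/opt/oracle/wallet",
)
with conn.cursor() as cur:
    cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    for region, total in cur:
        print(region, total)
conn.close()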
>> Okay, so Mendelsohn, who made the announcement, said that Oracle's the smartphone of databases. I actually think Ellison kind of used that before, or maybe that was us paraphrasing; I thought he used the iPhone analogy way back when he announced Exadata, with the integrated hardware and software. But is that how you see it? Is Oracle the smartphone of databases? >> It is. I mean, they are trying to own the complete stack: the hardware with Exadata, all the way up to the databases, the data warehouses, the OLTP databases, the inference databases. They're trying to own the complete stack from top to bottom, and that's what makes the autonomous approach possible. You can make it autonomous when you control all of that, and take away all of the requirements for IT in the business itself. So it's democratizing the use of data warehouses. It is pushing it out to the lines of business, and it's simplifying it and making it possible to push it out so that they can own their own data. They can manage their own data, and they do not need an IT person from headquarters to help them. >> Let's stay on this a little bit more, and then I want to go into some of the competitive stuff, because Mendelsohn mentioned AWS several times. One of the things that struck me: he said, hey, we're basically one API, because we're doing analytics in the cloud, we're doing data in the cloud, we're doing integration in the cloud, and that's sort of a big part of the value proposition. He made some comparisons to Redshift. Of course, I would say if you can't find a workload where you beat your big competitor, then you shouldn't be in this business, so I take those things with a grain of salt. But one of the other things that caught me is that migrating from on-prem to Oracle Cloud is very simple, and I think he made some comparisons to other platforms there too. And this to me is important, because he also brought in that Gartner data. We looked at that Gartner data when it came out; in the operational database class, Oracle smoked everybody. They were way ahead. The reason I think that's important is because, let's face it, when you look at what's moving into AWS, the mission-critical workloads, the high-performance, high-criticality OLTP stuff, that's not moving in droves. And you've made the point often that with companies that have their own cloud, particularly Oracle, and you've mentioned this about IBM with DB2 for instance, there should be a lower-risk environment in moving from on-prem to their cloud, because, I mean, I don't think you can get Oracle RAC on AWS, for example, and I don't think Exadata is running in AWS data centers. So that compatibility is going to facilitate migration. What's your take on all that spiel? >> I think that's absolutely right. Your crown jewels, the most expensive and the most valuable applications, the mission-critical applications, the ones that have got to take a licking and keep on ticking: those types of applications are where Oracle really shines. They own a very high percentage of those mission-critical workloads, and you have the choice, if you're going to AWS, for example, of either migrating Oracle onto AWS, and that is frankly not a good fit at all. There are a lot of constraints to running large mission-critical systems on AWS.
So that's not an option. And then the option, of course, that AWS will push is: move to Aurora, change your way of writing applications, make them tiny little pieces and stitch them all together with microservices. And that's okay if you're a small organization, but that has a lot of problems of its own, right? Because then you, the user, have to stitch all those pieces together, and you're responsible for testing it, and you're responsible for looking after it. And as you grow, that becomes a bigger and bigger overhead. So AWS, in my opinion, needs to move toward a tier-one database of its own, and it's not in that position at the moment. >> Interesting, okay. So let's talk about the competitive landscape and the choices that customers have. As I said, Mendelsohn mentioned AWS many times, and Larry on the earnings calls often takes shots at competitors. That's a compliment to me: when Larry Ellison calls you out, that means you've made it, you're doing well. We've seen it over the years, whether it's IBM or Workday or Salesforce, even though Salesforce is a big Oracle customer, and AWS, as we know, is an Oracle customer as well, even though AWS tells us they've moved off Oracle. When you peel the onion... >> It took them five years, and some of the workloads... >> Well, as I said, I believe they're still using Oracle in certain workloads. But we digress. So AWS takes a different approach, and I want to push on this a little bit. With database, AWS has more than a dozen, I think, purpose-built databases. They take this right-tool-for-the-right-job approach, whereas Oracle is converging all this function into a single database: SQL, JSON, graph databases, machine learning, blockchain. I'd love to talk more about blockchain if we have time. But it seems to me that the right-tool-for-the-right-job, purpose-built approach, very granular down to the primitives and APIs, is a pretty viable approach versus kind of a Swiss Army knife approach. How do you compare the two? >> Yes, and it does appeal to many individual programmers who are very interested, for example, in graph databases or in time-series databases. They are looking for a cheap database that will do the job for a particular project, and for that programmer, or for that individual piece of work, it is a very sensible way of doing it, and they pay for it with clear cloud economics. The challenge, as you have more and more data and as you're building up your data warehouse and your data lakes, is that you do not want to have to move data from one place to another place. So, for example, if you've got Aurora, you have to move the data, and it's a pretty complicated thing to move it to Redshift: it's five or six steps, and each of those costs money, and each of those takes time. More importantly, they take time. The Oracle approach is a single database. In terms of all the pieces, obviously you have multiple databases, you have different OLTP databases and data warehouse databases, but it's a single architecture and a single design, which means that all of the work of moving stuff from one place to another place is within Oracle itself. It's Oracle that's doing that work for you, and as you grow, that becomes a very, very important cost saving: you avoid the overhead of all those different ones. And the databases themselves originated as open source, and AWS has done very well with them, and there's a large revenue stream behind the... >> The AWS databases, you mean?
>> Yes, the original databases in AWS; and they've done a lot of work in terms of making them scale, etc. But for a larger organization, especially the very large ones, and certainly if they want to combine, for example, the data warehouse with OLTP and inference, which is in my opinion a very good thing that they should be trying to do, then that is incredibly difficult to do with AWS, and in my opinion AWS has to invest enormously to make that whole ecosystem much better. >> Okay, so innovation required there; maybe that's part of the TAM expansion strategy. But just to digress for a second: it seems like, and by the way there are others taking this converged approach, it seems like that is a trend. I mean, you certainly see it with SingleStore, the name sort of implies it, formerly MemSQL. I think Monte Zweben of Splice Machine is probably headed in a similar direction, embedding AI. And Microsoft is kind of interesting: it seems like Microsoft is willing to build an abstraction layer that hides the complexity of the different tooling. AWS thus far has not taken that approach. And then, sort of looking at Snowflake: I think Snowflake is trying to do something completely different. I don't think they're necessarily trying to take Oracle head-on. I guess, let's talk about this. Snowflake simplified the EDW, that's clear: zero to Snowflake in 90 minutes. It's got this data cloud vision. So you sign on to this Snowflake, and, speaking of layers, they're abstracting the complexity of the underlying cloud. That's what the data cloud vision is all about. They talk about this global mesh, but they've not done a good job of explaining what the heck it is. We've been pushing them on that. >> It's aspirational at the moment. >> Well, I guess, yeah, it seems that way. But conceptually, it's, I think, very powerful. In reality, what Snowflake is doing with data sharing is probably mostly read-only, and I say mostly read-only... there you go, it'll get better... but it's mostly read, and so you're able to share the data, and it's governed. It's actually quite genius how they've implemented this, with its simplicity. It is a caching architecture. We've talked about that; we can geek out about that. There's good, there's bad, there's ugly. But generally speaking, my premise here, and I would love your thoughts, is that Snowflake is trying to do something different. It's trying to be not just another data warehouse. It's not just trying to compete with data lakes. It's trying to create this data cloud to facilitate data sharing, to put data in the hands of business owners, in terms of data product builders. That's a different vision than anything I've seen thus far. Your thoughts? >> I agree, and even more, going further: being a place where people can sell data, put it up and make it available to whoever needs it, and making it so simple that it can be shared across the country and across the world. I think it's a very powerful vision indeed. The challenge they have is that the pieces at the moment are very, very easy to use, but the quality in terms of, for example, joins, and I mentioned the joins were very powerful in Oracle, they don't try to do those joins. They say... >> They being Snowflake. Yeah, they don't even try it. They would say use another Postgres... >> Yeah. >> ...database to do that.
>> Yeah, so then they have a long way to go. >> Complex joins anyway; maybe simple joins, yeah. >> Complex joins. So they have a long way to go in terms of the functionality of their product, and also, in my opinion, they surely are going to have to add more types of databases inside it, including OLTP, and they can do that. They obviously have a great market cap, and they can do that by acquisition as well as by building. >> They've started. I think they support JSON, right? And graph; I think there's a graph database that's either coming or already there, I can't keep all that stuff in my head. But there's no reason they can't go in that direction. I mean, in speaking to the founders at Snowflake, they were like, look, we're kind of new, we'll focus on simple. A lot of them came from Oracle, so they know databases and they know how hard it is to do things like facilitate complex joins and do complex workload management, and so they said, let's just simplify, we'll put it in the cloud, and it will spin up a separate virtual data warehouse every time you want one. So that's how they handle those things, a different philosophy. But again, coming back to some of the mission-critical work and some of the larger Oracle customers: they said they have a thousand Autonomous Database customers, I think it was Autonomous Database broadly, not just ADW, but anyway, a few stood out: Aon, Lyft, I think Deloitte stood out, and obviously hundreds more. So, people misunderstand Oracle, I think. They've got a big install base. They invest in R&D. And people talk about lock-in, sure, but the CIOs that I talk to, and that you talk to, David, are looking for business value. I would say that 75 to 80% of them will gravitate toward business value over the fear of lock-in, and I think at the end of the day, they feel like, you know what, if our business is performing, it's a better business decision, it's a better business case. >> I fully agree. They've been very difficult to do business with in the past. Everybody's in dread of the... >> The audit. >> The knock on the door from the auditor. >> Right. >> And from a purchasing point of view, that has been a really bad experience for many, many customers. The users of the database itself are very happy indeed. I mean, you talk to them and they understand what they're paying for. They understand the value, in terms of availability and all of the tools for complex, multi-dimensional types of applications. It's pretty well the only game in town. Only DB2 and SQL Server have any hope of competing there. >> Microsoft SQL Server, right. >> Yes, SQL Server. >> Which, okay, yeah, is definitely competitive for sure. DB2, no. Look, IBM lost its dominant position in database. They kind of ceded that. Oracle had to fight hard to win it. It wasn't obvious in the 80s who was going to be the database king, and they all had to fight. And to me, I always tell people the difference is that the chairman of Oracle is also the CTO. They spend money on R&D, and they throw off a ton of cash. I want to say something about... >> I was just going to make one extra point. The simplicity and the capability of the cloud versions of all of this is incredibly good. They are better in terms of charging for what you need or what you use, much better than AWS, for example, or anybody else. So they have really come full circle in terms of attractiveness in a cloud environment. >> You mean charging you for what you consume. Yeah, Mendelsohn talked about that.
He made a big point about the granularity: you pay for only what you need. If you need 33 CPUs, you pay for 33; with other databases you've got to buy a fixed shape, so if you need 33, you've got to go to 64. I'm not sure that's true for everyone; I'm not sure if it's true for Snowflake. It may be. I've got to dig into that a little bit. >> Yes, Snowflake has got a front end to hide behind. >> Right, but I don't want to push that too hard, because I want to go look at their pricing strategies. I still think they make you buy a term; I may be wrong, but I thought they make you do a one-year or two-year or three-year term, and I don't know if you can just turn it off at any time. I should hold off; I'll do some more research on that. But I wanted to make a point about the audits you mentioned before. A big mistake that a lot of Oracle customers have made many times, and we've written about this, is negotiating with Oracle without bringing your best and your brightest. One of the things that people didn't pay attention to, and I think they've sort of caught on to this, is that Oracle's SOW takes precedence over the MSA. A lot of legal departments and procurement departments ask, oh, do we have an MSA? Yes, you do. Okay, great. And because they have an MSA, they think they can rubber-stamp the deal, but the SOW really dictates the terms, and Oracle's got you there; they're really smart about that. So you've got to bring your best and brightest and really negotiate hard with Oracle, or you'll get in trouble. >> Sure. >> So it is what it is. But coming back to Oracle, let's sort of wrap on this: dominant position in mission critical, and we saw that from the Gartner research, especially for operational workloads; a giant customer base; this cloud-first notion; investing in R&D; open, and we'll put a question mark around that, but hey, they're doing some cool stuff with machine learning. >> Ecosystem; I would add ecosystem. They're promoting their ecosystem. >> Yeah, and look, for a lot of their customers, and we've talked to many, they say, look, there's actually a tailwind here: this saves us money and we don't have to migrate. So, interesting. I'll give you the last word. We started by focusing on the announcement, so what do you want to leave us with? >> My last word is that there are platforms with a certain key application or key parts of the infrastructure which I think can differentiate themselves from the Azures or the AWSes, and Oracle owns one of those. SAP might be another one. There are certain platforms which are big enough and important enough that, in my opinion, they will succeed in that cloud strategy. >> Great, David, thanks so much, appreciate your insights. >> Good to be here. >> Thank you for watching, everybody. This is Dave Vellante for theCUBE. We'll see you next time. (upbeat music)
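One footnote on that granular pricing exchange: the arithmetic behind the 33-versus-64 example is simple enough to sketch. The hourly rate below is an invented placeholder, not any vendor's published price.

# Hypothetical comparison of granular vs. fixed-shape pricing.
# The $0.05 per CPU-hour rate is invented for illustration only.
RATE_PER_CPU_HOUR = 0.05
HOURS_PER_MONTH = 730

needed_cpus = 33                       # what the workload actually requires
granular_cost = needed_cpus * RATE_PER_CPU_HOUR * HOURS_PER_MONTH

shape_cpus = 64                        # the next fixed shape up from 33
shape_cost = shape_cpus * RATE_PER_CPU_HOUR * HOURS_PER_MONTH

print(f"granular (33 CPUs):    ${granular_cost:,.2f}/month")
print(f"fixed shape (64 CPUs): ${shape_cost:,.2f}/month")
print(f"overprovisioning premium: {shape_cost / granular_cost - 1:.0%}")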

Published Date : Mar 17 2021


Breaking Analysis: How Snowflake Plans to Change a Flawed Data Warehouse Model


 

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> Snowflake is not going to grow into its valuation by stealing the croissant from the breakfast table of the on-prem data warehouse vendors. Look, even if Snowflake got 100% of the data warehouse business, it wouldn't come close to justifying its market cap. Rather, Snowflake has to create an entirely new market based on completely changing the way organizations think about monetizing data. Every organization I talk to says it wants to be, or many say they already are, data-driven. Why wouldn't you aspire to that goal? There's probably nothing more strategic than leveraging data to power your digital business and creating competitive advantage. But many businesses are failing, or I predict will fail, to create a true data-driven culture because they're relying on a flawed architectural model formed by decades of building centralized data platforms. Welcome everyone to this week's Wikibon Cube Insights powered by ETR. In this Breaking Analysis, I want to share some new thoughts and fresh ETR data on how organizations can transform their businesses through data by reinventing their data architectures. And I want to share our thoughts on why we think Snowflake is currently in a very strong position to lead this effort. Now, on November 17th, theCUBE is hosting the Snowflake Data Cloud Summit. Snowflake's ascendancy and its blockbuster IPO have been widely covered by us and many others. Now, since Snowflake went public, we've been inundated with outreach from investors, customers, and competitors that wanted to either better understand the opportunities or explain why their approach is better or different. And in this segment, ahead of Snowflake's big event, we want to share some of what we learned and how we see it. Now, theCUBE is getting paid to host this event, so I need you to know that, and you draw your own conclusions from my remarks. But neither Snowflake nor any other sponsor of theCUBE or client of SiliconANGLE Media has editorial influence over Breaking Analysis. The opinions here are mine, and I would encourage you to read my ethics statement in this regard. I want to talk about the failed data model. The problem is complex, I'm not debating that. Organizations have to integrate data and platforms with existing operational systems, many of which were developed decades ago. And there's a culture and a set of processes that have been built around these systems, and they've been hardened over the years. This chart here tries to depict the progression of the monolithic data source, which, for me, began in the 1980s when Decision Support Systems, or DSS, promised to solve our data problems. The data warehouse became very popular and data marts sprung up all over the place. This created more proprietary stovepipes with data locked inside. The Enron collapse led to Sarbanes-Oxley. Now, this tightened up reporting, and the requirements associated with that breathed new life into the data warehouse model. But it remained expensive and cumbersome, I've talked about that a lot, like a snake swallowing a basketball. The 2010s ushered in the big data movement, and data lakes emerged. With Hadoop, we saw the idea of no schema on write, where you put structured and unstructured data into a repository and figure it all out on the read.
What emerged was a fairly complex data pipeline that involved ingesting, cleaning, processing, analyzing, preparing, and ultimately serving data to the lines of business. And this is where we are today, with very hyper-specialized roles around data engineering, data quality, and data science. There's lots of batch processing going on, and Spark has emerged to address the complexity associated with MapReduce, and it definitely helped improve the situation. We're also seeing attempts to blend in real-time stream processing with the emergence of tools like Kafka and others. But I'll argue that in a strange way, these innovations actually compound the problem. And I want to discuss that, because what they do is heighten the need for more specialization, more fragmentation, and more stovepipes within the data life cycle. Now, in reality, and it pains me to say this, the outcome of the big data movement, as we sit here in 2020, is that we've created thousands of complicated science projects that have once again failed to live up to the promise of rapid, cost-effective time to insights. So, what will the 2020s bring? What's the next silver bullet? You hear terms like the lakehouse, which Databricks is trying to popularize, and I'm going to talk today about the data mesh. These are efforts that look to modernize datalakes and sometimes merge the best of the data warehouse and second-generation systems into a new paradigm that might unify batch and stream frameworks. And this definitely addresses some of the gaps, but in our view, it still suffers from some of the underlying problems of previous-generation data architectures. In other words, if the next-gen data architecture is incremental, centralized, rigid, and primarily focused on making the technology to get data in and out of the pipeline work, we predict it's going to fail to live up to expectations again. Rather, what we're envisioning is an architecture based on the principles of distributed data, where domain knowledge is the primary target citizen, and data is not seen as a by-product, i.e., the exhaust of an operational system, but rather as a service that can be delivered in multiple forms and use cases across an ecosystem. This is why we often say the data is not the new oil. We don't like that phrase. A specific gallon of oil can either fuel my home or lubricate my car engine, but it can't do both. Data does not follow the same laws of scarcity as natural resources. Again, what we're envisioning is a rethinking of the data pipeline and the associated cultures to put the data needs of the domain owner at the core and provide automated, governed, and secure access to data as a service at scale. Now, how is this different? Let's take a look and unpack the data pipeline today and look deeper into the situation. You all know this picture that I'm showing. There's nothing really new here. The data comes from inside and outside the enterprise. It gets processed, cleansed, or augmented so that it can be trusted and made useful. Nobody wants to use data that they can't trust. And then we can add machine intelligence and do more analysis, and finally deliver the data so that domain-specific consumers can essentially build data products and services, or reports and dashboards, or content services, for instance an insurance policy, a financial product, a loan; these are packaged and made available for someone to make decisions on or to make a purchase. And all the metadata associated with this data is packaged along with the dataset.
Now, we've broken down these steps into atomic components over time so we can optimize on each and make them as efficient as possible. And down below, you have these happy stick figures. Sometimes they're happy. But they're highly specialized individuals, and they each do their job and they do it well, to make sure that the data gets in, gets processed, and gets delivered in a timely manner. Now, while these individual pieces seemingly are autonomous and can be optimized and scaled, they're all encompassed within the centralized big data platform, and it's generally accepted that this platform is domain agnostic, meaning the platform is the data owner, not the domain-specific experts. Now, there are a number of problems with this model. The first: while it's fine for organizations with a smaller number of domains, organizations with a large number of data sources and complex domain structures struggle to create a common data parlance, for example, in a data culture. Another problem is that as the number of data sources grows, organizing and harmonizing them in a centralized platform becomes increasingly difficult, because the context of the domain and the line of business gets lost. Moreover, as ecosystems grow and you add more data, the processes associated with the centralized platform tend to get further genericized. They again lose that domain-specific context. Wait (chuckling), there are more problems. Now, while in theory organizations are optimizing on the piece parts of the pipeline, the reality is, as the domain requires a change, for example, a new data source or an ecosystem partnership requires a change in access or processes that can benefit a domain consumer, the change is subservient to the dependencies and the need to synchronize across these discrete parts of the pipeline, or is actually orthogonal to each of those parts. In other words, in actuality, the monolithic data platform itself remains the most granular part of the system. Now, when I complain about this faulty structure, some folks tell me this problem has been solved, that there are services that allow new data sources to really easily be added. A good example of this is Databricks Ingest, which is an auto loader. What it does is simplify the ingestion into the company's Delta Lake offering, and rather than centralizing in a data warehouse, which struggles to efficiently allow things like machine learning frameworks to be incorporated, this feature allows you to put all the data into a centralized datalake. So the argument goes. The problem that I see with this is that while the approach definitely minimizes the complexity of adding new data sources, it still relies on this linear end-to-end process that slows down the introduction of data sources from the domain consumer's side of the pipeline. In other words, the domain expert still has to elbow her way into the front of the line, or the pipeline in this case, to get stuff done. And finally, the way we are organizing teams is a point of contention, and I believe is going to continue to cause problems down the road. Specifically, we've again optimized on technology expertise, where, for example, data engineers, who are really good at what they do, are often removed from the operations of the business. Essentially, we created more silos and organized around technical expertise versus domain knowledge.
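For readers who want to see what the auto loader mentioned above looks like in practice, here is a minimal sketch in PySpark, as it is commonly written on Databricks, where a SparkSession named spark is predefined; the paths and target table are placeholders.

# Minimal Auto Loader sketch; runs on a Databricks cluster where `spark`
# already exists. Paths and table name are placeholders.
stream = (
    spark.readStream.format("cloudFiles")            # Auto Loader source
    .option("cloudFiles.format", "json")             # format of incoming files
    .option("cloudFiles.schemaLocation", "/mnt/chk/schema")
    .load("/mnt/raw/events/")                        # landing zone to watch
)

(
    stream.writeStream
    .option("checkpointLocation", "/mnt/chk/events")
    .trigger(availableNow=True)                      # drain what's there, then stop
    .toTable("bronze.events")                        # append into a Delta table
)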
As an example, a data team has to work with data that is delivered with very little domain specificity, and serve a variety of highly specialized consumption use cases. All right, I want to step back for a minute and talk about some of the problems that people bring up with Snowflake, and then I'll relate it back to the basic premise here. As I said earlier, we've been hammered by dozens and dozens of data points, opinions, and criticisms of Snowflake, and I'll share a few here. But I'll post a deeper technical analysis from a software engineer that I found to be fairly balanced. There are five Snowflake criticisms that I'll highlight. There are many more, but here are some that I want to call out. Price transparency: I've had more than a few customers tell me they chose an alternative database because of the unpredictable nature of Snowflake's pricing model. Snowflake, as you probably know, prices based on consumption, just like AWS and other cloud providers. So, just like AWS, for example, the bill at the end of the month is sometimes unpredictable. Is this a problem? Yes. But like AWS, I would say, "Kill me with that problem." Look, if users are creating value by using Snowflake, then that's good for the business. But clearly this is a sore point for some users, especially for procurement and finance, which don't like unpredictability, and Snowflake needs to do a better job communicating and managing this issue with tooling that can predict and help better manage costs. Next, workload management, or lack thereof: look, if you want to isolate higher-performance workloads with Snowflake, you just spin up a separate virtual warehouse. It's kind of a brute-force approach. It works, generally, but it will add expense. I'm kind of reminded of Pure Storage and its approach to storage management. The engineers at Pure always design for simplicity, and this is the approach that Snowflake is taking. The parallel between Pure and Snowflake, as I'll discuss in a moment, is that Pure's ascendancy was based largely on stealing share from legacy EMC systems; Snowflake, in my view, has a much, much larger incremental market opportunity. Next is the caching architecture. You hear this a lot: at the end of the day, Snowflake is based on a caching architecture, and a caching architecture has to be working for some time to optimize performance. Caches work well when the size of the working set is small; caches generally don't work well when the working set is very, very large. In general, transactional databases have pretty small datasets, and in general, analytics datasets are potentially much larger. Is Snowflake in the analytics business? Yes. But the good thing that Snowflake has done is they've enabled data sharing, and its caching architecture serves its customers well because it allows domain experts, you're going to hear this a lot from me today, to isolate and analyze problems or go after opportunities based on tactical needs. That said, very big queries across whole datasets, or badly written queries that scan the entire database, are not the sweet spot for Snowflake. Another good example would be if you're doing a large audit and you need to analyze a huge, huge dataset; Snowflake's probably not the best solution. Complex joins, you hear this a lot: the working sets of complex joins, by definition, are larger, so see my previous explanation. Read only: Snowflake is pretty much optimized for read-only data. Maybe stateless data is a better way of thinking about this.
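On the workload-management point above, the brute-force isolation really is just a couple of statements. Here is a hedged sketch using the snowflake-connector-python package; the account, credentials, table, and sizing are placeholders.

# Sketch: isolating a reporting workload on its own virtual warehouse.
# Account, credentials, and table are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount", user="analyst", password="..."
)
cur = conn.cursor()
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS reporting_wh
      WAREHOUSE_SIZE = 'MEDIUM'
      AUTO_SUSPEND = 60      -- seconds idle before suspending (stops billing)
      AUTO_RESUME = TRUE
""")
cur.execute("USE WAREHOUSE reporting_wh")  # this session's queries now run isolated
cur.execute("SELECT COUNT(*) FROM sales.public.orders")
print(cur.fetchone())
conn.close()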
Heavily write-intensive workloads are not the wheelhouse of Snowflake. So where this is maybe an issue is real-time decision-making and AI inferencing. A number of times I've talked about this: Snowflake might be able to develop products or acquire technology to address this opportunity. Now, I want to explain: these issues would be problematic if Snowflake were just a data warehouse vendor. If that were the case, this company, in my opinion, would hit a wall, just like the MPP vendors that preceded them, who built a better mousetrap for certain use cases, hit a wall. Rather, my premise in this episode is that the future of data architectures will be to move away from large centralized warehouse or datalake models to a highly distributed data-sharing system that puts power in the hands of domain experts in the line of business. Snowflake is less computationally efficient and less optimized for classic data warehouse work, but it's designed to serve the domain user much more effectively, in our view. We believe that Snowflake is optimizing for business effectiveness, essentially. And as I said before, the company can probably do a better job at keeping passionate end users from breaking the bank, but as long as these end users are making money for their companies, I don't think this is going to be a problem. Let's look at the attributes of what we're proposing around this new architecture. We believe we'll see the emergence of a total flip of the centralized and monolithic big data systems that we've known for decades. In this architecture, data is owned by domain-specific business leaders, not technologists. Today, it's not much different in most organizations than it was 20 years ago. If I want to create something of value that requires data, I need to cajole, beg, or bribe the technology and data teams to accommodate. The data consumers are subservient to the data pipeline, whereas in the future, we see the pipeline as a second-class citizen, with the domain expert elevated. In other words, getting the technology and the components of the pipeline to be more efficient is not the key outcome. Rather, the time it takes to envision, create, and monetize a data service is the primary measure. The data teams are cross-functional and live inside the domain, versus today's structure where the data team is largely disconnected from the domain consumer. Data in this model, as I said, is not the exhaust coming out of an operational system or an external source that is treated as generic and stuffed into a big data platform. Rather, it's a key ingredient of a service that is domain-driven and monetizable. And the target system is not a warehouse or a lake; it's a collection of connected, domain-specific datasets that live in a global mesh. What is a distributed global data mesh? A data mesh is a decentralized architecture that is domain-aware. The datasets in the system are purposely designed to support a data service, or data product if you prefer. The ownership of the data resides with the domain experts, because they have the most detailed knowledge of the data requirements and its end use. Data in this global mesh is governed and secured, and every user in the mesh can have access to any dataset as long as it's governed according to the edicts of the organization. Now, in this model, the domain expert has access to a self-service and abstracted infrastructure layer that is supported by a cross-functional technology team.
Again, the primary measure of success is the time it takes to conceive and deliver a data service that can be monetized. Now, by monetize, we mean a data product or data service that either cuts costs, drives revenue, saves lives, whatever the mission is of the organization. The power of this model is that it accelerates the creation of value by putting authority in the hands of those individuals who are closest to the customer and have the most intimate knowledge of how to monetize data. It reduces the diseconomies at scale of having a centralized or monolithic data architecture, and it scales much better than legacy approaches because the atomic unit is a data domain, not a monolithic warehouse or a lake. Zhamak Dehghani is a software engineer who is attempting to popularize the concept of a global mesh. Her work is outstanding, and it has strengthened our belief that practitioners see this the same way that we do. And to paraphrase her view, a domain-centric system must be secure and governed, with standard policies across domains. It has to be trusted; as I said, nobody's going to use data they don't trust. It's got to be discoverable via a data catalog with rich metadata. The datasets have to be self-describing and designed for self-service. Accessibility for all users is crucial, as is interoperability, without which distributed systems, as we know, fail. So what does this all have to do with Snowflake? As I said, Snowflake is not just a data warehouse. In our view, it's always had the potential to be more. Our assessment is that attacking the data warehouse use cases gave Snowflake a straightforward, easy-to-understand narrative that allowed it to get a foothold in the market. Data warehouses are notoriously expensive, cumbersome, and resource intensive, but they're a critical aspect of reporting and analytics. So it was logical for Snowflake to target on-premise legacy data warehouses, and their smaller cousins the datalakes, as early use cases. By putting forth and demonstrating a simple data warehouse alternative that can be spun up quickly, Snowflake was able to gain traction, demonstrate repeatability, and attract the capital necessary to scale to its vision. This chart shows the three layers of Snowflake's architecture that have been well documented: the separation of compute and storage, and the outer layer of cloud services. But I want to call your attention to the bottom part of the chart, the so-called cloud agnostic layer that Snowflake introduced in 2018. This layer is somewhat misunderstood. Not only did Snowflake make its cloud-native database compatible to run on AWS, then Azure, and in 2020 GCP; what Snowflake has done is abstract cloud infrastructure complexity and create what it calls the data cloud. What's the data cloud? We don't believe the data cloud is just a marketing term that doesn't have any substance. Just as SaaS simplified application software and iOS made it possible to eliminate the value drain associated with provisioning infrastructure, a data cloud, in concept, can simplify data access, break down fragmentation, and enable shared data across the globe. Snowflake has a first-mover advantage in this space, and we see a number of fundamental aspects that comprise a data cloud. First, massive scale, with virtually unlimited compute and storage resources that are enabled by the public cloud; we talk about this a lot. Second is a data, or database, architecture that's built to take advantage of native public cloud services.
This is why Frank Slootman says, "We've burned the boats. We're not ever doing on-prem. We're all in on cloud and cloud native." Third is an abstraction layer that hides the complexity of infrastructure, and fourth is a governed and secured shared-access system where any user in the system, if allowed, can get access to any data in the cloud. So a key enabler of the data cloud is this thing called the global data mesh. Now, earlier this year, Snowflake introduced its global data mesh. Over the course of its recent history, Snowflake has been building out its data cloud by creating data regions, strategically tapping key locations of AWS regions and then adding Azure and GCP. The complexity of the underlying cloud infrastructure has been stripped away to enable self-service, and any Snowflake user becomes part of this global mesh, independent of the cloud that they're on. Okay, so now let's go back to what we were talking about earlier. Users in this mesh will be our domain owners. They're building monetizable services and products around data. They're most likely dealing with relatively small, read-only datasets. They can ingest data from any source very easily and quickly set up security and governance to enable data sharing across different parts of an organization or, very importantly, an ecosystem. Access control and governance are automated. The datasets are addressable. The data owners have clearly defined missions, and they own the data through the life cycle, data that is specific and purposely shaped for their missions. Now, you're probably asking, "What happens to the technical team and the underlying infrastructure and the clusters? How do I get the compute close to the data? And what about data sovereignty and the physical storage layer, and the costs?" These are all good questions, and I'm not saying they're trivial. But the answer is these are implementation details that are pushed to a self-service layer managed by a group of engineers that serves the data owners. And as long as the domain expert/data owner is driving monetization, this piece of the puzzle becomes self-funding. As I said before, Snowflake has to help these users optimize their spend with predictive tooling that aligns spend with value and shows ROI. While there may not be a strong motivation for Snowflake to do this, my belief is that they'd better get good at it, or someone else will do it for them and steal their ideas. All right, let me end with some ETR data to show you just how Snowflake is getting a foothold in the market. Followers of this program know that ETR uses a consistent methodology to go to its practitioner base, its buyer base, each quarter and ask them a series of questions. They focus on the areas that the technology buyer is most familiar with, and they ask a series of questions to determine the spending momentum around a company within a specific domain. This chart shows one of my favorite examples. It shows data from the October ETR survey of 1,438 respondents, and it isolates on the data warehouse and database sector. I know I just got through telling you that the world is going to change and Snowflake's not a data warehouse vendor, but there's no construct today in the ETR dataset to cut a data cloud or globally distributed data mesh, so you're going to have to deal with this. What this chart shows is net score on the y-axis. That's a measure of spending velocity, and it's calculated by asking customers, "Are you spending more or less on a particular platform?"
And then subtracting the lesses from the mores. It's more granular than that, but that's the basic concept. Now, on the x-axis is market share, which is ETR's measure of pervasiveness in the survey. You can see superimposed in the upper right-hand corner a table that shows the net score and the shared N for each company. Now, shared N is the number of mentions in the dataset within, in this case, the data warehousing sector. Snowflake, once again, leads all players with a 75% net score. This is a very elevated number and is higher than that of all other players, including the big cloud companies. Now, we've been tracking this for a while, and Snowflake is holding firm on both dimensions. When Snowflake first hit the dataset, it was in the single digits along the horizontal axis, and it continues to creep to the right as it adds more customers. Now, here's another chart. I call it the wheel chart; it breaks down the components of Snowflake's net score, or spending momentum. The lime green is new adoption, the forest green is customers spending more than 5%, the gray is flat spend, the pink is declining by more than 5%, and the bright red is retiring the platform. So you can see the trend. It's all momentum for this company. Now, what Snowflake has done is grabbed hold of the market by simplifying data warehousing, but the strategic aspect of that is that it enables the data cloud, leveraging the global mesh concept, and the company has introduced a data marketplace to facilitate data sharing across ecosystems. This is all about network effects. In the mid to late 1990s, as the internet was being built out, I worked at IDG with Bob Metcalfe, who was the publisher of InfoWorld. During that time, we'd go on speaking tours all over the world, and I would listen very carefully as he applied Metcalfe's law to the internet. Metcalfe's law states that the value of the network is proportional to the square of the number of connected nodes, or users, on that system. Said another way, while the cost of adding new nodes to a network scales linearly, the consequent value scales exponentially. Now, apply that to the data cloud. The marginal cost of adding a user is negligible, practically zero, but the value of being able to access any dataset in the cloud... well, let me just say this: there's no limitation to the magnitude of the market. My prediction is that this idea of a global mesh will completely change the way leading companies structure their businesses and, particularly, their data architectures. It will be the technologists that serve domain specialists, as it should be. Okay, well, what do you think? DM me @dvellante or email me at david.vellante@siliconangle.com, or comment on my LinkedIn. Remember, these episodes are all available as podcasts, so please subscribe wherever you listen. I publish weekly on wikibon.com and siliconangle.com, and don't forget to check out etr.plus for all the survey analysis. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching. Be well, and we'll see you next time. (upbeat music)
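Two quantitative ideas in this analysis, ETR's net score and Metcalfe's law, reduce to a few lines of arithmetic. The survey counts below are invented purely to illustrate the calculation; they are not ETR data.

# Net score: percentage of respondents spending more minus percentage
# spending less. Counts are invented for illustration.
adopting, spending_more, flat, spending_less, replacing = 30, 45, 15, 7, 3
n = adopting + spending_more + flat + spending_less + replacing
net_score = 100 * ((adopting + spending_more) - (spending_less + replacing)) / n
print(f"net score: {net_score:.0f}%")    # 65% on these made-up counts

# Metcalfe's law: network value grows with the square of connected users,
# while the cost of adding nodes grows roughly linearly.
for users in (10, 100, 1_000):
    print(f"{users} users -> relative value {users ** 2}")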

Published Date : Nov 14 2020


The Shortest Path to Vertica – Best Practices for Data Warehouse Migration and ETL


 

>> Hello everybody, and thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled The Shortest Path to Vertica: Best Practices for Data Warehouse Migration and ETL. I'm Jeff Healey, I lead Vertica marketing, and I'll be your host for this breakout session. Joining me today are Marco and Mauricio, Vertica engineers joining us from the EMEA region. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait: just type your question or comment in the question box below the slides and click Submit. As always, there will be a Q&A session at the end of the presentation, and we'll answer as many questions as we're able to during that time. Any questions we don't address, we'll do our best to answer offline. Alternatively, you can visit the Vertica forums to post your questions there after the session; our engineering team is planning to join the forums to keep the conversation going. Also, a reminder that you can maximize your screen by clicking the double-arrow button in the lower right corner of the slides. And yes, this virtual session is being recorded and will be available to view on demand this week; we'll send you a notification as soon as it's ready. Now let's get started. Over to you, Marco. >> Hello everybody, this is Marco speaking, a sales engineer from EMEA, and I'll just get going. This is the agenda: part one will be done by me, part two will be done by Mauricio. The agenda, as you can see, is: big bang or piece by piece; the migration of the DDL, the physical data model; migration of ETL and BI functionality; what to do with stored procedures; what to do with any possible existing user-defined functions; and the migration of the data, which will be covered by Mauricio. Do you want to say a few words, Mauricio? >> Yeah, hello everybody, my name is Mauricio Felicia and I'm a Vertica pre-sales engineer. Like Marco, I'm going to talk about how to optimize data warehouses using some specific Vertica techniques like table flattening and live aggregate projections. So let me start with a quick overview of the data warehouse migration process we are going to talk about today. Normally we suggest starting by migrating the current data warehouse as it is, with limited or minimal changes in the overall architecture. Clearly we will have to port the DDL and redirect the data access tools to the new platform, but we should minimize the amount of changes in the initial phase in order to go live as soon as possible. In the second phase we can start optimizing the data warehouse, again with no or minimal changes in the architecture as such. During this optimization phase we can create, for example, projections for some specific queries, or optimize encodings, or change some of the physical design. This is something that we normally do if and when needed. And finally, again if and when needed, we go through the architectural redesign of these operations using the full set of Vertica techniques, in order to take advantage of all the features we have in Vertica. This is normally an iterative approach, so we go back to some of the specific features before moving back to the architecture and design. We are going through this process in the next few slides. >> OK. In order to encourage everyone to keep using their common sense when migrating to a new database management system, because people are often afraid of it, it's useful to use the analogy of a house move:
In your old home you might have developed solutions for your everyday life that make perfect sense there. For example, if your old Saint Bernard can't walk anymore, you might be using a forklift to heave him in through your window. Well, in the new home, consider the elevator, and don't complain that the window is too small to fit the dog through. It's very much the same with Vertica. But to make the transition gentle, and I'd like to remain in my analogy with the house move, picture your new house as your new holiday home: begin to install everything you miss and everything you like from your old home, and once you have everything you need in your new house, you can shut down the old one. So move bit by bit, and go for quick wins to make your audience happy. You do big bang only if they are going to retire the platform you are sitting on, when you're really on a sinking ship. Otherwise, again: identify quick wins, implement and publish them quickly in Vertica, reap the benefits, enjoy the applause, and use the gained reputation for further funding; and if you find that nobody's using the old platform anymore, you can shut it down. If you really have to migrate, go big bang in one go only if you absolutely have to; otherwise migrate by subject area, and group all similar areas together.

Having said that, you start off by migrating objects, objects in the database. That's one of the very first steps. It consists of migrating first the places where you can put the other objects into, that is, owners and locations, which usually means schemas. Then, what do you have in them? You extract tables and views, convert the object definitions, and deploy them to Vertica. And note that you shouldn't do it manually: never type what you can generate, automate whatever you can. For users and roles, there is usually a system table in the old database that contains all the roles; you can export those to a file, reformat them, and then you have CREATE ROLE and CREATE USER scripts that you can apply to Vertica. If LDAP or Active Directory was used for authentication in the old database, Vertica supports anything within the LDAP standard. Catalogs and schemas should be relatively straightforward, with maybe one difference: Vertica does not restrict you by defining a schema as a collection of all objects owned by a user, but it emulates that for old times' sake. And Vertica does not need the catalog; if you absolutely need a catalog name for the old tools that you use, it is always set to the name of the database in the case of Vertica. Having now the schemas, the catalogs, the users, and the roles in place, move on to the data definition language of the tables. If you are allowed to, it's best to use a tool that translates the data types in the generated DDL. You might have seen the odb tool mentioned several times in this presentation; we are very happy to have it. It can export the old database's table definitions because it works with ODBC: it takes what the old database's ODBC driver translates to ODBC types, and then it has internal translation tables to several target DBMS flavors, the most important of which is obviously Vertica. If they force you to use something else, there are always tools like SQL*Plus in Oracle, the SHOW TABLE command in Teradata, and so on; each DBMS should have a set of tools to extract the object definitions to be deployed in another instance of the same DBMS.
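To make "never type what you can generate" concrete, here is a minimal sketch of the role-and-user export described above, assuming an Oracle source (the catalog views are Oracle's; the excluded accounts are an illustrative assumption). The spooled file is reviewed, then run against Vertica:

-- Sketch: generate Vertica CREATE ROLE / CREATE USER scripts from an
-- assumed Oracle source; run in SQL*Plus, then apply the output to Vertica.
SET HEADING OFF FEEDBACK OFF PAGESIZE 0
SPOOL create_roles_and_users.sql
SELECT 'CREATE ROLE ' || LOWER(role) || ';' FROM dba_roles;
SELECT 'CREATE USER ' || LOWER(username) || ';'
  FROM dba_users
 WHERE username NOT IN ('SYS', 'SYSTEM');  -- skip internal accounts (illustrative list)
SPOOL OFF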
Now if I talk about views: views usually have their definition in the old database catalog as well. One thing that needs a bit of special care is synonyms. Synonyms have to be emulated in different ways, depending on the specific needs: you can set up a view on the table to be referred to, or use something that is really neat but that other databases don't have, the search path. The search path works very much like the PATH environment variable in Windows or Linux: you specify an object name without the schema name, and it is searched for first in the first entry of the search path, then in the second, then in the third, which makes synonyms largely unneeded.

When you generate DDL, to remain in the analogy of moving house, dust and clean your stuff before placing it in the new house. If you see a table like the one here at the bottom, it is usually the corpse of a bad migration in the past: an ID is usually an integer and not an approximate floating-point data type, a first name hardly ever has 256 characters, and if a column is called HIRE_DT, it's not necessarily needed to store the second when somebody was hired. So take good care, while you are moving, to dust off your stuff and use better data types. The same applies especially to strings: how many bytes does a string contain? For a euro sign it's not one byte; in UTF-8, which is the way Vertica encodes strings, an ASCII character takes one byte, but the euro sign takes three. That means that when you have a single-byte character set in the source, you very often have to oversize strings first, because otherwise data gets rejected or truncated, and then you have to very carefully check what the best size is. The most promising approach is to initially dimension strings in multiples of the original length (odb, with the command option you see on the slide, will double the lengths of what would otherwise be single-byte character columns, and multiply accordingly the lengths of columns that hold wide characters in the traditional database), then load a representative sample of your data, profile it using the tools that we use ourselves, find the actual longest values, and then make the columns shorter. Note the impact that too-long and too-big data types have on projection design, and we live and die with our projections. You might remember the rules on how default projections come to exist. What we do initially would be, just like for the profiling, to load a representative sample of the data, collect a representative set of already-known queries, and feed both to the Vertica Database Designer. You don't have to decide immediately, you can always amend things later; otherwise follow the laws of physics: avoid moving data back and forth across nodes, avoid heavy I/O, and if you can, design your projections initially by hand. Encoding matters as well. You should know that the Database Designer is a very tight-fisted thing, it optimizes to use as little space as possible, so you have to consider that if you compress very well, you might end up spending more time reading the data back. Here is a test run using several encoding types: you can see that RLE, run-length encoding, on sorted data is not even visible in the chart, while the others are considerably slower. You can take those slides and look at them in detail later; I won't go into detail here.
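As a small, hypothetical before-and-after of the data type cleanup just described (table and column names are invented for the example):

-- Before (the "corpse of a bad migration"): id FLOAT, first_name VARCHAR(256),
-- hire_dt TIMESTAMP. After: tighter Vertica types, sized from a profiled sample.
CREATE TABLE employee (
    id         INTEGER NOT NULL,  -- an ID is an integer, not a float
    first_name VARCHAR(40),       -- oversized in multiples first, then profiled down
    hire_dt    DATE               -- no need to store the second somebody was hired
);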
Now, about BI migrations: usually you can expect 80% of everything to be able to be lifted and shifted. You don't need most of the pre-aggregated tables, because we have live aggregate projections. Many BI tools have specialized query objects for the dimensions and the facts, and we have the possibility to use flattened tables, which will be discussed later; those you might have to write by hand. You will be able to switch off caching, because Vertica speeds up everything itself, and with live aggregate projections, if you have worked with MOLAP cubes before, you very probably won't need them at all. ETL tools: if you load row by row into the old database, consider changing everything to very big transactions, and if you use insert statements with parameter markers, consider writing to named pipes and using Vertica's COPY command instead of mass inserts. Yes, the COPY command, that's what I have here. As to custom functionality: you can see on this slide that Vertica has by far the biggest number of functions in the database; we compare them regularly against other databases. You might find that many of the functions you have written won't be needed in the new database, so look at the Vertica catalog instead of trying to migrate a function that you don't need. Stored procedures are very often used in the old database to overcome shortcomings that Vertica doesn't have. Very rarely will you have to actually write a procedure that involves a loop; in our experience it's really very, very rare. Usually you can just switch to standard scripting, and this is basically repeating what Mauricio said, so in the interest of time I will skip this slide and look at this one here: most of a data warehouse migration should be automatic. You can automate DDL migration using odb, which is crucial. Data profiling is not crucial, but game-changing, and the encoding is the same thing: you can automate it using our Database Designer. The physical data model optimization in general is game-changing; you have the Database Designer, use it. For the provisioning, use the old platform's tools to generate the SQL. Having no objects without their owners is crucial. And as for functions and procedures, they are only crucial if they embody the company's intellectual property; otherwise you can almost always replace them with something else. That's it from me for now. Thank you.
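To make the named-pipe pattern Marco recommends for ETL tools concrete, here is a minimal sketch, with the FIFO path and table name invented for the example. On the OS side, the ETL process first creates the pipe (for instance with mkfifo) and writes its stream into it; Vertica then bulk-loads it in one big transaction instead of many single-row inserts:

-- Sketch of the named-pipe load pattern (path and table are assumptions).
-- The ETL tool writes to /tmp/sales.fifo; Vertica reads the stream in one batch:
COPY sales
FROM '/tmp/sales.fifo'
DELIMITER '|'
DIRECT;  -- load straight into ROS, the default behavior in recent Vertica versions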
>> Thank you, Marco. We will now continue our presentation by talking about some of the Vertica optimization techniques that we can implement in order to improve the overall efficiency of the data warehouse. Let me start with a few simple messages. The first one is that you are supposed to optimize only if and when it is needed. In most cases, just a simple lift and shift from the old data warehouse to Vertica will provide the performance you were looking for, or even better; in that case there is probably no real need to optimize anything. If you want to optimize, or you need to optimize, then keep in mind some of the Vertica peculiarities: implement deletes and updates in the Vertica way; use live aggregate projections in order to avoid, or better, to limit, GROUP BY execution at query time; use table flattening in order to avoid or limit joins; and you can also take advantage of some Vertica-specific extensions, for example time series analysis or machine learning, on top of your data. We will now start by reviewing the first of these bullets: optimize if and when needed. If, when you migrate from the old data warehouse to Vertica without any optimization, the performance level is okay from day one, then your job is probably done. If this is not the case, one very easy technique you can use is to ask Vertica itself to optimize the physical data model, using the Vertica Database Designer. DBD, the Vertica Database Designer, has several interfaces; here I'm going to use what we call the DBD programmatic API, so basically SQL functions. With other databases you might need to hire experts to look at your data, your data warehouse, and your table definitions, creating indexes or whatever; in Vertica, all you need is to run something as simple as six single SQL statements to get a very well optimized physical data model. You see that we start by creating a new design, then we add the design tables and the queries that we want to optimize, and we set our target: in this case we are tuning the physical data model in order to maximize query performance, which is why we set the design objective to queries. Other possible choices would be to tune in order to reduce storage, or a mix between storage and queries. Finally, we ask Vertica to produce and deploy the optimized design. It's a matter of literally minutes, and in a few minutes what you get is a fully optimized physical data model. This is something very, very easy to implement.
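A hedged sketch of that six-statement sequence, using the DBD programmatic API (the design name, table, query file, and output paths are invented; optional parameters vary by Vertica version):

-- Sketch of the Database Designer programmatic API flow (names invented).
SELECT DESIGNER_CREATE_DESIGN('migration_design');
SELECT DESIGNER_ADD_DESIGN_TABLES('migration_design', 'public.unit_sold', true);
SELECT DESIGNER_ADD_DESIGN_QUERIES('migration_design', '/home/dbadmin/queries.sql', true);
SELECT DESIGNER_SET_OPTIMIZATION_OBJECTIVE('migration_design', 'QUERY');  -- or 'LOAD' / 'BALANCED'
SELECT DESIGNER_RUN_POPULATE_DESIGN_AND_DEPLOY('migration_design',
       '/tmp/design.sql', '/tmp/deploy.sql');  -- produces and deploys the design
SELECT DESIGNER_DROP_DESIGN('migration_design');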
Keep in mind some of the Vertica peculiarities. Vertica is very well tuned for load and query operations, and it writes ROS containers to disk. A ROS container is a group of files, and we will never, ever change the content of these files. The fact that ROS container files are never modified is one of the Vertica peculiarities, and this approach lets Vertica use minimal locking. We can run multiple load operations in parallel against the very same table, assuming we don't have a primary key or unique constraint on the target table, because the parallel loads will end up in different ROS containers. A SELECT in read committed mode requires no locks at all and can run concurrently with an INSERT...SELECT, because the SELECT works on a snapshot of the catalog taken when the transaction starts; this is what we call snapshot isolation. And because we never change the ROS files, recovery is very simple and robust. So we get a huge number of advantages from the fact that we never change the content of the ROS containers. On the other side, deletes and updates require a little attention. What about delete first? When you delete in Vertica, you basically create a new object called a delete vector, stored in ROS or in memory, and this vector points to the data being deleted, so that when a query is executed Vertica will just ignore the rows listed in the delete vector. And it's not just about the delete: an update in Vertica consists of two operations, a delete and an insert, and a merge consists of either an insert or an update, which in turn is made of a delete plus an insert. So if we tune how the delete works, we will also have tuned the update and the merge. What should we do in order to optimize delete? Remember what we said: every time we delete, we actually create a new object, a delete vector. So avoid committing deletes and updates too often; this reduces the work for the mergeout and the other removal activities that run afterwards. Be sure that all the interested projections contain the columns used in the delete predicate: this lets Vertica directly access the projection, without having to go through the super projection in order to create the delete vector, and the delete will be much, much faster. And finally, another very interesting optimization technique is to segregate the update and delete operations from the query workload in order to reduce lock contention, and this can be done using partition operations, which is exactly what I want to talk about now. Here you have a typical data warehouse architecture: we have data arriving in a landing zone, where the data is loaded as-is from the data sources; then we have a transformation layer writing into a staging area, which in turn feeds the partitioned blocks of data in the green data structures we have at the end. Those green data structures are the ones used by the data access tools when they run their queries. Sometimes we might need to change old data, for example because we have late records, or maybe because we want to fix some errors that originated in the feeds. In this case we just copy the partition we want to adjust from the green query area back to the staging area, with a partition copy, which is a very fast operation. Then we run our updates, our adjustment procedures, whatever we need to fix the errors, in the staging area, and at the very same time people continue to query the green data structures at the end, so we never have contention between the two operations. When the update in the staging area is complete, all we have to do is run a swap partition between the tables, to swap the data that we just finished adjusting in the staging zone into the query area, the green one at the end. This swap partition is very fast and is an atomic operation: basically what happens is just that we exchange the pointers to the data. This is a very, very effective technique, and a lot of customers use it.
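A minimal sketch of that copy, adjust, and swap cycle (table names, partition-key values, and the correction itself are invented; the two meta-functions are Vertica's partition functions):

-- 1. Copy the partition to fix from the query table into staging (fast, metadata-level):
SELECT COPY_PARTITIONS_TO_TABLE('fact_sales', '2020-02', '2020-02', 'fact_sales_staging');
-- 2. Adjust the data in staging while users keep querying fact_sales:
UPDATE fact_sales_staging SET amount = amount * 1.1 WHERE region = 'EMEA';  -- illustrative fix
-- 3. Atomically exchange the adjusted partition back (just swaps pointers to the data):
SELECT SWAP_PARTITIONS_BETWEEN_TABLES('fact_sales_staging', '2020-02', '2020-02', 'fact_sales');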
So, why flattened tables and live aggregate projections? Basically, we use flattened tables and live aggregate projections to minimize or avoid joins (that's what flattened tables are used for) and GROUP BYs (that's what live aggregate projections are used for). Compared to traditional data warehouses, Vertica can store, process, aggregate, and join orders of magnitude more data; it is a true columnar database, and joins and GROUP BYs are normally not a problem at all: they run faster than on any traditional data warehouse. But there are still scenarios where the data sets are so big, and we are talking about petabytes of data, and growing so quickly, that we need something to boost GROUP BY and join performance. This is why you can use live aggregate projections to perform aggregations at loading time and limit the need to group by at query time, and flattened tables to combine information from different entities at loading time, again avoiding joins at query time. So, live aggregate projections: at this point in time we can build them on four built-in aggregate functions, which are SUM, MIN, MAX, and COUNT. Let's see how this works. Suppose that you have a normal table, in this case a table unit_sold with three columns, pid, date_time, and quantity, which has been segmented in a given way. On top of this base table, which we call the anchor table, we create a projection, and you see that we create the projection using a SELECT that aggregates the data: we take the pid, the date portion of date_time, and the sum of quantity from the base table, grouping on the first two columns, so pid and the date portion of date_time. What happens when we load data into the base table? All we have to do is load data into the base table. When we do, we will of course fill the regular projections: assuming we are running with K-safety 1, we will have two projections, and we will load into those two projections all the detail data we are loading into the table, so pid, date_time, and quantity. But at the very same time, without having to do anything or run any ETL procedure, we also automatically get, in the live aggregate projection, the data pre-aggregated, with pid, the date portion of date_time, and the sum of quantity in the column total_quantity. This is something we get for free, without having to run any specific procedure, and it is very, very efficient. The key concept is that the load operation, from a DDL point of view, is executed against the base table; we do not explicitly aggregate data, and we don't have any procedure doing it. The aggregation is automatic, and Vertica feeds the live aggregate projection every time we load into the base table. You see the two SELECTs we have on this slide: those two SELECTs produce exactly the same result, so running SELECT pid, date, SUM(quantity) against the base table, or running SELECT * from the live aggregate projection, results in exactly the same data. This is of course very useful, but a much more useful result, which we can observe if we run an EXPLAIN, is this: if we run the SELECT against the base table asking for the grouped data, what happens behind the scenes is that Vertica knows there is a live aggregate projection with the data already aggregated during the loading phase, and it rewrites your query to use the live aggregate projection. This happens automatically. Here is a query that ran a GROUP BY against unit_sold, and Vertica decided to rewrite it as something to be executed against the live aggregate projection, because this saves a huge amount of time and effort during the ETL cycle. And it is not limited to the exact information you chose to aggregate: another query, like a SELECT COUNT DISTINCT, will also take advantage of the live aggregate projection, and again this happens automatically; you don't have to do anything to get this.
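A reconstruction of that example as a sketch (the talk's column names, with the projection name invented):

-- Anchor table, segmented on pid (a sketch of the slide's definition).
CREATE TABLE unit_sold (
    pid       INTEGER,
    date_time TIMESTAMP,
    quantity  INTEGER
) SEGMENTED BY HASH(pid) ALL NODES;

-- Live aggregate projection: Vertica aggregates at load time into total_quantity.
CREATE PROJECTION unit_sold_agg (pid, sale_date, total_quantity) AS
SELECT pid, date_time::DATE, SUM(quantity)
FROM unit_sold
GROUP BY pid, date_time::DATE;

-- These two queries return the same data; the optimizer rewrites the first one
-- to read the projection automatically:
--   SELECT pid, date_time::DATE, SUM(quantity) FROM unit_sold GROUP BY 1, 2;
--   SELECT * FROM unit_sold_agg;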
One thing that we have to keep very clear in mind is that what we store in the live aggregate projection is partially aggregated data. In this example we have two inserts: the first insert enters four rows, and the second insert enters five rows. For each of these inserts we will have a partial aggregation: Vertica will never know, after the first insert, that a second one is coming, so it calculates the aggregation of the data every time we run an insert. This is a key concept, and it also means that you can maximize the effectiveness of this technique by inserting large chunks of data. If you insert data row by row, the live aggregate projection technique is not very useful, because for every row that you insert you will get one aggregation, so the live aggregate projection ends up containing the same number of rows as the base table. But if you insert a large chunk of data every time, the number of aggregates in the live aggregate structure is much smaller than the base data. You can see how this works by counting the number of rows in the live aggregate projection: if you run the SELECT COUNT(*) from the live aggregate projection, the query on the left side, you will get four rows, but if you EXPLAIN this query you will see that it was reading six rows. That is because each of those two inserts actually inserted three rows into the live aggregate projection. So again, the key concept: live aggregate projections keep partially aggregated data, and the final aggregation always happens at runtime. Another technique, very similar to the live aggregate projection, is what we call the top-K projection. We don't actually aggregate anything in a top-K projection; we just keep the last rows, or limit the amount of rows we keep, using a LIMIT ... OVER (PARTITION BY ... ORDER BY ...) clause. In this case we create two top-K projections on top of the base table: one to keep the last quantity that has been sold, and the other one to keep the max quantity. In both cases it is just a matter of ordering the data, in the first case by the date_time column, in the second case by quantity, and in both cases we fill the projection with just the top row. Again, this happens automatically when we insert data into the base table. If, after the insert, we run our SELECT against either the max-quantity or the last-quantity projection, we get just the latest values, and you see that we have far fewer rows in the top-K projections.
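The two top-K projections described might look like this (a sketch; the projection names are invented):

-- Keep the most recent sale per product:
CREATE PROJECTION last_sale (pid, date_time, quantity) AS
SELECT pid, date_time, quantity
FROM unit_sold
LIMIT 1 OVER (PARTITION BY pid ORDER BY date_time DESC);

-- Keep the largest sale per product:
CREATE PROJECTION max_sale (pid, date_time, quantity) AS
SELECT pid, date_time, quantity
FROM unit_sold
LIMIT 1 OVER (PARTITION BY pid ORDER BY quantity DESC);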
We said at the beginning that we can use four built-in functions; you might remember: MIN, MAX, SUM, and COUNT. What if I want to create my own specific aggregation on top of the data? Our customers sometimes have very specific needs in terms of live aggregate projections. In that case you can code your own live aggregate projection with user-defined functions: you can create a user-defined transform function to implement any sort of complex aggregation while loading data. After you have implemented it, you can deploy it using either the pre-pass approach, which basically means the data is aggregated at loading time, during the data ingestion, or the batch approach, which means the aggregation runs when a batch job is executed on top. Things to remember about live aggregate projections: they are limited to the built-in functions, again SUM, MAX, MIN, and COUNT, but you can call your own UDTFs, so you can do whatever you want; they can reference only one table; and in Vertica versions before 9.3 it was impossible to update or delete on the anchor table. This limit has been removed in 9.3, so you can now update and delete data from the anchor table. A live aggregate projection follows the segmentation of the GROUP BY expression, and in some cases the optimizer can decide to pick the live aggregate projection or not, depending on whether using it is convenient. And remember: if we insert and commit every single row to the anchor table, we end up with a live aggregate projection that contains exactly the same number of rows; in that case, using the live aggregate projection or the base table would be the same. So this is one of the two fantastic techniques that we can implement in Vertica: the live aggregate projection, basically to avoid or limit GROUP BYs. The other one, which we are going to talk about now, is the flattened table, used to avoid the need for joins. Remember that Vertica is very fast at running joins, but when we scale up to petabytes of data we need a boost, and this is what we have in order to get this problem fixed, regardless of the amount of data we are dealing with. So, what about flattened tables? Let me start with normalized schemas. Everybody knows what a normalized schema is, so there is no need to explain the related theory on this slide. The main purpose of a normalized schema is to reduce data redundancy, and reducing redundancy is a good thing because we obtain fast and small writes: we only have to write small chunks of data into the right tables. The problem with normalized schemas is that when you run your queries, you have to put together the information that comes from the different tables, and that requires running joins. Again, Vertica is normally very good at running joins, but sometimes the amount of data makes joins not easy to deal with, and joins are sometimes not easy to tune. What happens in the normal, let's say traditional, data warehouse is that we denormalize the schemas, normally either manually or using an ETL tool. So basically we have, on one side of this slide, the normalized schemas, where we get very fast writes, and on the other side the wide tables, where all the joins and pre-aggregations have been run in order to prepare the data for the queries. We have fast writes on one side and fast reads on the other; the problem sits in the middle, because we push all the complexity into the ETL that has to transform the normalized schema into the wide table. The way we normally implement this, either manually with procedures or with an ETL tool, is that we have to code an ETL layer that runs the INSERT...SELECT reading from the normalized schema and writing into the wide table at the end, the one used by the data access tools that we are going to use to run our queries. This approach is costly, because of course someone has to code the ETL; it is slow, because someone has to execute those batches, normally overnight after loading the data, and maybe someone has to check the following morning that everything went okay with the batch; it is resource-intensive, of course, and also human-resource-intensive, because of the people who have to code and check the results; it is error-prone, because it can fail; and it introduces latency, because there is a gap on the time axis between time t0, when you load the data into the normalized schema, and time t1, when you finally get the data ready to be queried. What Vertica does to facilitate this process is let you create a flattened table. With the flattened table, first, you avoid data redundancy, because you don't need both the wide table and the normalized schema on the left side. Second, it is fully automatic: you just insert the data into the wide table, and the ETL that you would have coded is transformed into an INSERT...SELECT by Vertica, automatically.
You don't have to do anything; it's robust, and the latency is zero: as soon as you load the data into the wide table, you get all the joins executed for you. So let's have a look at how it works. In this case we have the table we are going to flatten, and basically we have to focus on two different clauses. You see that there is one column here, dimension_value, which can be defined either with DEFAULT followed by a SELECT, or with SET USING. The difference between DEFAULT and SET USING is when the data is populated: if we use DEFAULT, the data is populated as soon as we load the data into the base table; if we use SET USING, we have to run a refresh. But everything is there: you don't need an ETL, you don't need to code any transformation, because everything is in the table definition itself, and it comes for free, and of course with latency zero; as soon as you load the other columns, you have the dimension value as well. Let's see an example. Suppose we have a dimension table, customer_dimension, on the left side, and a fact table on the right. You see that the fact table uses columns like o_name or o_city, which are basically the result of a SELECT on top of the customer dimension. This is where the join is executed: as soon as we load data into the fact table, directly into the fact table, without of course loading the data that comes from the dimension, all the data from the dimension is populated automatically. So suppose that we are running this insert: as you can see, we are running the insert directly into the fact table, and we are loading o_id, customer_id, and total. We are not loading name or city; name and city will be automatically populated by Vertica for you, because of the definition of the flattened table. You get your wide table, your flattened table, built for you, and this means that at runtime you won't need any join between the base fact table and the customer dimension that we used to calculate name and city, because the data is already there. This was using DEFAULT; the other option is using SET USING. The concept is absolutely the same: you see that in this case, on the right side, we have basically replaced the o_name DEFAULT clause with o_name SET USING, and the same is true for city. The concept, as I said, is the same, but in this case, with SET USING, we have to refresh: you see that we have to run this SELECT REFRESH_COLUMNS with the name of the table, in which case all columns will be refreshed, or you can specify only certain columns, and this brings in the values for name and city, reading from the customer dimension. This technique is extremely useful. The difference between DEFAULT and SET USING, just to summarize the most important points: you just have to remember that DEFAULT populates your target column when you load, SET USING when you refresh. And in some cases you might need to use them both: in this example we define o_name using both DEFAULT and SET USING, and this means that the data is populated either when we load the data into the base table or when we run the refresh. This is the summary of the techniques that we can implement in Vertica in order to make our data warehouses even more efficient.
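Pulling the flattened-table mechanics together in one sketch (names follow the talk's example; the types and the refresh mode are assumptions):

CREATE TABLE customer_dimension (
    customer_id INTEGER PRIMARY KEY,
    name        VARCHAR(40),
    city        VARCHAR(40)
);

CREATE TABLE fact_orders (
    o_id        INTEGER,
    customer_id INTEGER,
    total       NUMERIC(12,2),
    -- DEFAULT: populated automatically at load time (latency zero):
    o_name VARCHAR(40) DEFAULT (SELECT name FROM customer_dimension c
                                WHERE c.customer_id = fact_orders.customer_id),
    -- SET USING: populated when you run a refresh:
    o_city VARCHAR(40) SET USING (SELECT city FROM customer_dimension c
                                  WHERE c.customer_id = fact_orders.customer_id)
);

-- Load only the fact columns; o_name is filled in by Vertica during the insert:
INSERT INTO fact_orders (o_id, customer_id, total) VALUES (1, 42, 99.90);

-- Bring the SET USING column up to date (all columns, or a named subset):
SELECT REFRESH_COLUMNS('fact_orders', 'o_city', 'REBUILD');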
And that is basically the end of our presentation. Thank you for listening, and now we are ready for the Q&A session.

Published Date : Mar 30 2020


Phil Kippen, Snowflake, Dave Whittington, AT&T & Roddy Tranum, AT&T | MWC Barcelona 2023


 

(gentle music) >> Narrator: "TheCUBE's" live coverage is made possible by funding from Dell Technologies, creating technologies that drive human progress. (upbeat music) >> Hello everybody, welcome back to day four of "theCUBE's" coverage of MWC '23. We're here live at the Fira in Barcelona. Wall-to-wall coverage, John Furrier is in our Palo Alto studio, banging out all the news. Really, the whole week we've been talking about the disaggregation of the telco network, the new opportunities in telco. We're really excited to have AT&T and Snowflake here. Dave Whittington is the AVP, at the Chief Data Office at AT&T. Roddy Tranum is the Assistant Vice President, for Channel Performance Data and Tools at AT&T. And Phil Kippen, the Global Head Of Industry-Telecom at Snowflake, Snowflake's new telecom business. Snowflake just announced earnings last night. Typical Scarpelli, they beat earnings, very conservative guidance, stocks down today, but we like Snowflake long term, they're on that path to 10 billion. Guys, welcome to "theCUBE." Thanks so much >> Phil: Thank you. >> for coming on. >> Dave and Roddy: Thanks Dave. >> Dave, let's start with you. The data culture inside of telco, We've had this, we've been talking all week about this monolithic system. Super reliable. You guys did a great job during the pandemic. Everything shifting to landlines. We didn't even notice, you guys didn't miss a beat. Saved us. But the data culture's changing inside telco. Explain that. >> Well, absolutely. So, first of all IoT and edge processing is bringing forth new and exciting opportunities all the time. So, we're bridging the world between a lot of the OSS stuff that we can do with edge processing. But bringing that back, and now we're talking about working, and I would say traditionally, we talk data warehouse. Data warehouse and big data are now becoming a single mesh, all right? And the use cases and the way you can use those, especially I'm taking that edge data and bringing it back over, now I'm running AI and ML models on it, and I'm pushing back to the edge, and I'm combining that with my relational data. So that mesh there is making all the difference. We're getting new use cases that we can do with that. And it's just, and the volume of data is immense. >> Now, I love ChatGPT, but I'm hoping your data models are more accurate than ChatGPT. I never know. Sometimes it's really good, sometimes it's really bad. But enterprise, you got to be clean with your AI, don't you? >> Not only you have to be clean, you have to monitor it for bias and be ethical about it. We're really good about that. First of all with AT&T, our brand is Platinum. We take care of that. So, we may not be as cutting-edge risk takers as others, but when we go to market with an AI or an ML or a product, it's solid. >> Well hey, as telcos go, you guys are leaning into the Cloud. So I mean, that's a good starting point. Roddy, explain your role. You got an interesting title, Channel Performance Data and Tools, what's that all about? >> So literally anything with our consumer, retail, concenters' channels, all of our channels, from a data perspective and metrics perspective, what it takes to run reps, agents, all the way to leadership levels, scorecards, how you rank in the business, how you're driving the business, from sales, service, customer experience, all that data infrastructure with our great partners on the CDO side, as well as Snowflake, that comes from my team. 
>> And that's traditionally been done in a, I don't mean the pejorative, but we're talking about legacy, monolithic, sort of data warehouse technologies. >> Absolutely. >> We have a love-hate relationship with them. It's what we had. It's what we used, right? And now that's evolving. And you guys are leaning into the Cloud. >> Dramatic evolution. And what Snowflake's enabled for us is impeccable. We've talked about having, people have dreamed of one data warehouse for the longest time and everything in one system. Really, this is the only way that becomes a reality. The more you get in Snowflake, we can have golden source data, and instead of duplicating that 50 times across AT&T, it's in one place, we just share it, everybody leverages it, and now it's not duplicated, and the process efficiency is just incredible. >> But it really hinges on that separation of storage and compute. And we talk about the monolithic warehouse, and one of the nightmares I've lived with, is having a monolithic warehouse. And let's just go with some of my primary, traditional customers, sales, marketing and finance. They are leveraging BSS OSS data all the time. For me to coordinate a deployment, I have to make sure that each one of these units can take an outage, if it's going to be a long deployment. With the separation of storage, compute, they own their own compute cluster. So I can move faster for these people. 'Cause if finance, I can implement his code without impacting finance or marketing. This brings in CI/CD to more reality. It brings us faster to market with more features. So if he wants to implement a new comp plan for the field reps, or we're reacting to the marketplace, where one of our competitors has done something, we can do that in days, versus waiting weeks or months. >> And we've reported on this a lot. This is the brilliance of Snowflake's founders, that whole separation >> Yep. >> from compute and data. I like Dave, that you're starting with sort of the business flexibility, 'cause there's a cost element of this too. You can dial down, you can turn off compute, and then of course the whole world said, "Hey, that's a good idea." And a VC started throwing money at Amazon, but Redshift said, "Oh, we can do that too, sort of, can't turn off the compute." But I want to ask you Phil, so, >> Sure. >> it looks from my vantage point, like you're taking your Data Cloud message which was originally separate compute from storage simplification, now data sharing, automated governance, security, ultimately the marketplace. >> Phil: Right. >> Taking that same model, break down the silos into telecom, right? It's that same, >> Mm-hmm. >> sorry to use the term playbook, Frank Slootman tells me he doesn't use playbooks, but he's not a pattern matcher, but he's a situational CEO, he says. But the situation in telco calls for that type of strategy. So explain what you guys are doing in telco. >> I think there's, so, what we're launching, we launched last week, and it really was three components, right? So we had our platform as you mentioned, >> Dave: Mm-hmm. >> and that platform is being utilized by a number of different companies today. We also are adding, for telecom very specifically, we're adding capabilities in marketplace, so that service providers can not only use some of the data and apps that are in marketplace, but as well service providers can go and sell applications or sell data that they had built. And then as well, we're adding our ecosystem, it's telecom-specific. 
So, we're bringing partners in, technology partners, and consulting and services partners, that are very much focused on telecoms and what they do internally, but also helping them monetize new services. >> Okay, so it's not just sort of generic Snowflake into telco? You have specific value there. >> We're purposing the platform specifically for- >> Are you a telco guy? >> I am. You are, okay. >> Total telco guy absolutely. >> So there you go. You see that Snowflake is actually an interesting organizational structure, 'cause you're going after verticals, which is kind of rare for a company of your sort of inventory, I'll say, >> Absolutely. >> I don't mean that as a negative. (Dave laughs) So Dave, take us through the data journey at AT&T. It's a long history. You don't have to go back to the 1800s, but- (Dave laughs) >> Thank you for pointing out, we're a 149-year-old company. So, Jesse James was one of the original customers, (Dave laughs) and we have no longer got his data. So, I'll go back. I've been 17 years singular AT&T, and I've watched it through the whole journey of, where the monolithics were growing, when the consolidation of small, wireless carriers, and we went through that boom. And then we've gone through mergers and acquisitions. But, Hadoop came out, and it was going to solve all world hunger. And we had all the aspects of, we're going to monetize and do AI and ML, and some of the things we learned with Hadoop was, we had this monolithic warehouse, we had this file-based-structured Hadoop, but we really didn't know how to bring this all together. And we were bringing items over to the relational, and we were taking the relational and bringing it over to the warehouse, and trying to, and it was a struggle. Let's just go there. And I don't think we were the only company to struggle with that, but we learned a lot. And so now as tech is finally emerging, with the cloud, companies like Snowflake, and others that can handle that, where we can create, we were discussing earlier, but it becomes more of a conducive mesh that's interoperable. So now we're able to simplify that environment. And the cloud is a big thing on that. 'Cause you could not do this on-prem with on-prem technologies. It would be just too cost prohibitive, and too heavy of lifting, going back and forth, and managing the data. The simplicity the cloud brings with a smaller set of tools, and I'll say in the data space specifically, really allows us, maybe not a single instance of data for all use cases, but a greatly reduced ecosystem. And when you simplify your ecosystem, you simplify speed to market and data management. >> So I'm going to ask you, I know it's kind of internal organizational plumbing, but it'll inform my next question. So, Dave, you're with the Chief Data Office, and Roddy, you're kind of, you all serve in the business, but you're really serving the, you're closer to those guys, they're banging on your door for- >> Absolutely. I try to keep the 130,000 users who may or may not have issues sometimes with our data and metrics, away from Dave. And he just gets a call from me. >> And he only calls when he has a problem. He's never wished me happy birthday. (Dave and Phil laugh) >> So the reason I asked that is because, you describe Dave, some of the Hadoop days, and again love-hate with that, but we had hyper-specialized roles. We still do. You've got data engineers, data scientists, data analysts, and you've got this sort of this pipeline, and it had to be this sequential pipeline. 
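The pattern Dave Whittington described earlier, each business unit owning its own compute cluster over a single shared copy of data, maps onto Snowflake virtual warehouses. A minimal, hypothetical sketch (warehouse and role names are invented, not AT&T's actual configuration):

-- Hypothetical sketch: one virtual warehouse per business unit over shared data.
CREATE WAREHOUSE finance_wh   WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
CREATE WAREHOUSE marketing_wh WITH WAREHOUSE_SIZE = 'SMALL'  AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

-- Each team queries the same single copy of data with its own compute, so a
-- deployment for finance never takes marketing offline:
GRANT USAGE ON WAREHOUSE finance_wh   TO ROLE finance_analyst;
GRANT USAGE ON WAREHOUSE marketing_wh TO ROLE marketing_analyst;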
I know Snowflake and others have come to simplify that. My question to you is, how is that those roles, how are those roles changing? How is data getting closer to the business? Everybody talks about democratizing business. Are you doing that? What's a real use example? >> From our perspective, those roles, a lot of those roles on my team for years, because we're all about efficiency, >> Dave: Mm-hmm. >> we cut across those areas, and always have cut across those areas. So now we're into a space where things have been simplified, data processes and copying, we've gone from 40 data processes down to five steps now. We've gone from five steps to one step. We've gone from days, now take hours, hours to minutes, minutes to seconds. Literally we're seeing that time in and time out with Snowflake. So these resources that have spent all their time on data engineering and moving data around, are now freed up more on what they have skills for and always have, the data analytics area of the business, and driving the business forward, and new metrics and new analysis. That's some of the great operational value that we've seen here. As this simplification happens, it frees up brain power. >> So, you're pumping data from the OSS, the BSS, the OKRs everywhere >> Everywhere. >> into Snowflake? >> Scheduling systems, you name it. If you can think of what drives our retail and centers and online, all that data, scheduling system, chat data, call center data, call detail data, all of that enters into this common infrastructure to manage the business on a day in and day out basis. >> How are the roles and the skill sets changing? 'Cause you're doing a lot less ETL, you're doing a lot less moving of data around. There were guys that were probably really good at that. I used to joke in the, when I was in the storage world, like if your job is bandaging lungs, you need to look for a new job, right? So, and they did and people move on. So, are you able to sort of redeploy those assets, and those people, those human resources? >> These folks are highly skilled. And we were talking about earlier, SQL hasn't gone away. Relational databases are not going away. And that's one thing that's made this migration excellent, they're just transitioning their skills. Experts in legacy systems are now rapidly becoming experts on the Snowflake side. And it has not been that hard a transition. There are certainly nuances, things that don't operate as well in the cloud environment that we have to learn and optimize. But we're making that transition. >> Dave: So just, >> Please. >> within the Chief Data Office we have a couple of missions, and Roddy is a great partner and an example of how it works. We try to bring the data for democratization, so that we have one interface, now hopefully know we just have a logical connection back to these Snowflake instances that we connect. But we're providing that governance and cleansing, and if there's a business rule at the enterprise level, we provide it. But the goal at CDO is to make sure that business units like Roddy or marketing or finance, that they can come to a platform that's reliable, robust, and self-service. I don't want to be in his way. So I feel like I'm providing a sub-level of platform, that he can come to and anybody can come to, and utilize, that they're not having to go back and undo what's in Salesforce, or ServiceNow, or in our billers. So, I'm sort of that layer. And then making sure that that ecosystem is robust enough for him to use. 
>> And that self-service infrastructure is predominantly through the Azure Cloud, correct? >> Dave: Absolutely. >> And you work on other clouds, but it's predominantly through Azure? >> We're predominantly in Azure, yeah. >> Dave: That's the first-party citizen? >> Yeah. >> Okay, I like to think in terms sometimes of data products, and I know you've mentioned upfront, you're Gold standard or Platinum standard, you're very careful about personal information. >> Dave: Yeah. >> So you're not trying to sell, I'm an AT&T customer, you're not trying to sell my data, and make money off of my data. So the value prop and the business case for Snowflake is it's simpler. You do things faster, you're in the cloud, lower cost, et cetera. But I presume you're also in the business, AT&T, of making offers and creating packages for customers. I look at those as data products, 'cause it's not a, I mean, yeah, there's a physical phone, but there's data products behind it. So- >> It ultimately is, but not everybody always sees it that way. Data reporting often can be an afterthought. And we're making it more on the forefront now. >> Yeah, so I like to think in terms of data products, I mean even if the financial services business, it's a data business. So, if we can think about that sort of metaphor, do you see yourselves as data product builders? Do you have that, do you think about building products in that regard? >> Within the Chief Data Office, we have a data product team, >> Mm-hmm. >> and by the way, I wouldn't be disingenuous if I said, oh, we're very mature in this, but no, it's where we're going, and it's somewhat of a journey, but I've got a peer, and their whole job is to go from, especially as we migrate from cloud, if Roddy or some other group was using tables three, four and five and joining them together, it's like, "Well look, this is an offer for data product, so let's combine these and put it up in the cloud, and here's the offer data set product, or here's the opportunity data product," and it's a journey. We're on the way, but we have dedicated staff and time to do this. >> I think one of the hardest parts about that is the organizational aspects of it. Like who owns the data now, right? It used to be owned by the techies, and increasingly the business lines want to have access, you're providing self-service. So there's a discussion about, "Okay, what is a data product? Who's responsible for that data product? Is it in my P&L or your P&L? Somebody's got to sign up for that number." So, it sounds like those discussions are taking place. >> They are. And, we feel like we're more the, and CDO at least, we feel more, we're like the guardians, and the shepherds, but not the owners. I mean, we have a role in it all, but he owns his metrics. >> Yeah, and even from our perspective, we see ourselves as an enabler of making whatever AT&T wants to make happen in terms of the key products and officers' trade-in offers, trade-in programs, all that requires this data infrastructure, and managing reps and agents, and what they do from a channel performance perspective. We still ourselves see ourselves as key enablers of that. And we've got to be flexible, and respond quickly to the business. >> I always had empathy for the data engineer, and he or she had to service all these different lines of business with no business context. >> Yeah. >> Like the business knows good data from bad data, and then they just pound that poor individual, and they're like, "Okay, I'm doing my best. 
It's just ones and zeros to me." So, it sounds like that's, you're on that path. >> Yeah absolutely, and I think, we do have refined, getting more and more refined owners of, since Snowflake enables these golden source data, everybody sees me and my organization, channel performance data, go to Roddy's team, we have a great team, and we go to Dave in terms of making it all happen from a data infrastructure perspective. So we, do have a lot more refined, "This is where you go for the golden source, this is where it is, this is who owns it. If you want to launch this product and services, and you want to manage reps with it, that's the place you-" >> It's a strong story. So Chief Data Office doesn't own the data per se, but it's your responsibility to provide the self-service infrastructure, and make sure it's governed properly, and in as automated way as possible. >> Well, yeah, absolutely. And let me tell you more, everybody talks about single version of the truth, one instance of the data, but there's context to that, that we are taking, trying to take advantage of that as we do data products is, what's the use case here? So we may have an entity of Roddy as a prospective customer, and we may have a entity of Roddy as a customer, high-value customer over here, which may have a different set of mix of data and all, but as a data product, we can then create those for those specific use cases. Still point to the same data, but build it in different constructs. One for marketing, one for sales, one for finance. By the way, that's where your data engineers are struggling. >> Yeah, yeah, of course. So how do I serve all these folks, and really have the context-common story in telco, >> Absolutely. >> or are these guys ahead of the curve a little bit? Or where would you put them? >> I think they're definitely moving a lot faster than the industry is generally. I think the enabling technologies, like for instance, having that single copy of data that everybody sees, a single pane of glass, right, that's definitely something that everybody wants to get to. Not many people are there. I think, what AT&T's doing, is most definitely a little bit further ahead than the industry generally. And I think the successes that are coming out of that, and the learning experiences are starting to generate momentum within AT&T. So I think, it's not just about the product, and having a product now that gives you a single copy of data. It's about the experiences, right? And now, how the teams are getting trained, domains like network engineering for instance. They typically haven't been a part of data discussions, because they've got a lot of data, but they're focused on the infrastructure. >> Mm. >> So, by going ahead and deploying this platform, for platform's purpose, right, and the business value, that's one thing, but also to start bringing, getting that experience, and bringing new experience in to help other groups that traditionally hadn't been data-centric, that's also a huge step ahead, right? So you need to enable those groups. >> A big complaint of course we hear at MWC from carriers is, "The over-the-top guys are killing us. They're riding on our networks, et cetera, et cetera. They have all the data, they have all the client relationships." Do you see your client relationships changing as a result of sort of your data culture evolving? >> Yes, I'm not sure I can- >> It's a loaded question, I know. 
>> Yeah, and then I, so, we want to start embedding as much into our network on the proprietary value that we have, so we can start getting into that OTT play, us as any other carrier, we have distinct advantages of what we can do at the edge, and we just need to start exploiting those. But you know, 'cause whether it's location or whatnot, so we got to eat into that. Historically, the network is where we make our money in, and we stack the services on top of it. It used to be *69. >> Dave: Yeah. >> If anybody remembers that. >> Dave: Yeah, of course. (Dave laughs) >> But you know, it was stacked on top of our network. Then we stack another product on top of it. It'll be in the edge where we start providing distinct values to other partners as we- >> I mean, it's a great business that you're in. I mean, if they're really good at connectivity. >> Dave: Yeah. >> And so, it sounds like it's still to be determined >> Dave: Yeah. >> where you can go with this. You have to be super careful with private and for personal information. >> Dave: Yep. >> Yeah, but the opportunities are enormous. >> There's a lot. >> Yeah, particularly at the edge, looking at, private networks are just an amazing opportunity. Factories and name it, hospital, remote hospitals, remote locations. I mean- >> Dave: Connected cars. >> Connected cars are really interesting, right? I mean, if you start communicating car to car, and actually drive that, (Dave laughs) I mean that's, now we're getting to visit Xen Fault Tolerance people. This is it. >> Dave: That's not, let's hold the traffic. >> Doesn't scare me as much as we actually learn. (all laugh) >> So how's the show been for you guys? >> Dave: Awesome. >> What're your big takeaways from- >> Tremendous experience. I mean, someone who doesn't go outside the United States much, I'm a homebody. The whole experience, the whole trip, city, Mobile World Congress, the technologies that are out here, it's been a blast. >> Anything, top two things you learned, advice you'd give to others, your colleagues out in general? >> In general, we talked a lot about technologies today, and we talked a lot about data, but I'm going to tell you what, the accelerator that you cannot change, is the relationship that we have. So when the tech and the business can work together toward a common goal, and it's a partnership, you get things done. So, I don't know how many CDOs or CIOs or CEOs are out there, but this connection is what accelerates and makes it work. >> And that is our audience Dave. I mean, it's all about that alignment. So guys, I really appreciate you coming in and sharing your story in "theCUBE." Great stuff. >> Thank you. >> Thanks a lot. >> All right, thanks everybody. Thank you for watching. I'll be right back with Dave Nicholson. Day four SiliconANGLE's coverage of MWC '23. You're watching "theCUBE." (gentle music)

Published Date : Mar 2 2023


Karthik Narain and Tanuja Randery | AWS Executive Summit 2022


 

(relaxing intro music) >> Welcome back to theCUBE's coverage here live at re:Invent 2022. We're here at the Executive Summit upstairs at the Accenture set, three sets broadcasting live, four days with theCUBE. I'm John Furrier, your host, with two great guests, Cube alumni, back: Tanuja Randery, managing director of Amazon Web Services for Europe, Middle East and Africa, known as EMEA. Welcome back to the Cube. >> Thank you. >> Great to see you. And Karthik Narain, who's the Accenture Cloud First lead. Great to see you back again. >> Thank you. >> Thanks for coming back on. All right, so business transformation is all about digital transformation taken to its conclusion. When companies transform, they are now a digital business. Technology is powering the value proposition; data, security, all in the keynotes; higher-level services and industry-specific solutions. The dynamics of the industry are changing radically in front of our eyes for the better. Karthik, what's your position on this as Accenture looks at this? We've covered all your successes during the pandemic with AWS. What do you guys see out there now as this next layer of power dynamics in the industry takes place? >> I think cloud is getting interesting and I think there's a general trend towards specialization that's happening in the world of cloud. And cloud is also moving from a general purpose technology backbone to providing specific industry capabilities for every customer within various industries. But the industry cloud is not a new term. It has been used in the past and it's been used in the past in various degrees, whether that's building horizontal solutions, certain specialized SaaS software or providing capabilities that are horizontal for certain industries. But we see the evolution of industry cloud a little differently and a lot more dynamic, which is we see this as a marketplace where ecosystems of capabilities are going to come together to interact with a common data platform, data backbone, data model, with workflows that will come together and integrate all of this stuff and help clients reinvent their industry with newer capabilities, but at the same time use the power of democratized innovation that's already there within that industry. So that's the kind of change we are seeing, where customers in their strategy are going to implement industry cloud as one of the tenets as they go through their strategy. >> Yeah, and I see in my notes, fit for purpose is a buzzword people are talking about, right-sizing in the cloud, and then just building on that. And what's interesting, Tanuja, I want to get your thoughts, because in the US we're one country, so yeah, integrating is kind of within services. You have purview over countries and these regions, it's global impact. This is now a global environment. So it's not just the US, North America; it's Latin America, it's EMEA. This is another variable in the cross-connecting of these fit-for-purpose solutions. What's your view of these industry-specific solutions? >> Yeah, no, and thanks Karthik, 'cause I'm a hundred percent aligned. You know, I mean, you know this better than me, John, but 90% of workloads have not yet moved to the cloud. And the only way that we think that's going to happen is by bringing together business and IT. So what does that mean? It means starting with business use cases, whether that's digital banking or smart connected factories, or frankly if it's predictive maintenance or connected beds.
But how do we take those use cases and leverage them to really drive outcomes with the technology behind them? I think that's the key unlock that we have to get to. And very specifically, and Adam talked about this a lot today, but data, data is the single unifier for all of business and IT coming together to drive value, right? However, the issue is there's a ton of it, (John Furrier chuckling) right? In fact, fun fact: if you put all the data that's going to be created over the next five years, which is more than the last 30 years, on one-terabyte little floppy disk drives, remember those? Well, that's going to be 15 round trips to the moon (John Furrier chuckling) and back. That's how much data it is. So our perspective is you've got to unify, single data lake, you've got to modernize with AI and ML, and then you're going to have to drive innovation on that. Now, I'll give you one tiny example if I may, which I love: Ryanair, big airline, 150 million passengers. They are also the largest supplier of ham and cheese sandwiches in the air. And catering at that scale is really difficult, right? If you have too much food wastage, sustainability issues; too little, customers are really unhappy. So we worked with them, leveraging AWS cloud and AI/ML, to build a panini predictor. And in essence, it's taking the data they've got, data we've got, and actually giving them the opportunity to have just the right number of paninis.
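The panini predictor is described only at a high level above, so the sketch below is purely illustrative: a toy demand forecaster trained on synthetic flight data with scikit-learn. The features, the fake relationship between them and sales, and the model choice are all assumptions, not details of the real Ryanair system.

```python
# Toy demand forecaster in the spirit of the "panini predictor" described above.
# The features, data, and model choice are invented for illustration only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic history: [passengers_on_flight, flight_hours, is_morning]
X = np.column_stack([
    rng.integers(120, 200, 500),   # passengers on board
    rng.uniform(1.0, 4.5, 500),    # flight duration in hours
    rng.integers(0, 2, 500),       # morning-departure flag
])
# Fake ground truth: fuller, longer, morning flights sell more paninis.
y = 0.15 * X[:, 0] + 6.0 * X[:, 1] + 8.0 * X[:, 2] + rng.normal(0, 3, 500)

model = GradientBoostingRegressor().fit(X, y)

# How many paninis to load for a full two-hour morning flight?
print(round(float(model.predict([[189, 2.0, 1]])[0])))
```

The balance being struck is exactly the one Tanuja names: over-predict and you get waste and sustainability issues, under-predict and customers go hungry.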
So these are all the things that differentiates organization, but all of this is underpinned by a unified data model that helps, you know, use all the (indistinct) there. >> Tanuja, you have mentioned earlier that not everyone has their journey of the cloud looks the same and certainly in the US and EMEA you have different countries and different areas. >> Yep. >> Their journeys are different. Some want speed and fees, some will roll their own. I mean data brick CEO, when I interviewed them that last week, they started database on a credit card swiped it and they didn't want any support. Amazon's knocking on their door saying, "you want support?" "No, we got it covered." Obviously they're from Berkeley and they're nerds, and they're cool. They can roll their own, but not everyone can. >> Yeah. >> And so you have a mix of customer profiles. How do you view that and what's your strategy? How do you get them over productive seeing that business value? What's that transformation look like? >> Yeah, John, you're absolutely right. So you've got those who are born in cloud, they're very savvy, they know exactly what they need. However, what I do find increasingly, even with these digital native customers, is they're also starting to talk business use cases. So they're talking about, "okay how do I take my platform and build a whole bunch of new services on top of that platform?" So, we still have to work with them on this business use case dimension for the next curve of growth that they want to drive. Currently with the global macroeconomic factors obviously they're also very concerned about profitability and costs. So that's one model. In the enterprise space, you have differences. >> Yeah. >> Right, You have the sort of very, very, very savvy enterprises, right? Who know exactly what they're looking for. But for them then it's about how do I lean into sustainability? In fact, we did a survey, and 77% of users that we surveyed said that they could accelerate their sustainably goals by using cloud. So in many cases they haven't cracked that and we can help them do that. So it's really about horses for courses there. And then, then with some other companies, they've done a lot of the basic infrastructure modernization. However, what they haven't been able to yet do is figure out how they're going to actually become a tech company. So I keep getting asked, can I become a tech company? How do I do that? Right? And then finally there are companies which don't have the skills. So if I go to the SMB segment, they don't always have the skills or the resources. And there using scalable market platforms like AWS marketplace, >> Yeah. >> Allows them to get access to solutions without having to have all the capabilities. So it really is- >> This is where partner network really kind of comes in. >> Absolutely. >> Huge value. Having that channel of solution providers I use that term specifically 'cause you're providing the solution for those folks. >> Yeah. Exact- >> And then the folks at the enterprise, we had a quote on the analyst segment earlier on our Cube, "spend more, save more." >> Yeah. >> That's the cloud equations, >> Yeah. because you're going to get it on sustainability you're going to save it on, you're going to save on cost recovery for revenue, time to revenue. So the cloud is the answer for a lot of enterprises out of the recession. >> Absolutely, and in fact, we need to lean in now you heard Adam say this, right? 
I mean, the cost savings potential alone from on-prem to cloud is between 40 and 60 percent. Just that. But I don't think that's it, John. >> The belt tightening he mentioned is about reining in spend, right-sizing. Okay, but then also do more. He didn't say that, but analysts are generally saying, if you spend right on the cloud, you'll save more. That's a general thesis. >> Yeah. >> Do you agree with that? >> I absolutely think so. And by the way, usage, people use it differently as they get smarter. We're constantly working with our customers, by the way though, to continuously cost optimize. So you heard about our Graviton3 instances, for example. We're using that to constantly optimize, but at the same time, what are the workloads that you haven't yet brought over to the cloud? (John Furrier chuckling) And so supply chain is a great idea. Our health cloud initiative. So we worked with Accenture on the Accenture Health Insights platform, which runs on AWS, as an example, or the Goldman Sachs one last year, if you remember. >> I do. >> The financial cloud. So those are some of the things that I think make it easier for people to consume cloud and reimagine their businesses. >> It's funny, I was talking with Adam and we had a little debate about what an ISV is, and I talked to the CEO of Mongo. They don't see themselves as an ISV. As they grew up on the cloud, they've become platforms; they have their own ISVs, and Databricks and Snowflake and others are developing that dynamic. But there's still ISVs out there. So there's a dynamic of growth going on and the need for partners, and our belief is that the ecosystem is going to start doubling in size, we believe, because of the demand for purpose-built or out-of-the-box, I hate to use that word "out of the box," but you know, turnkey solutions that you can buy another one of if it breaks. But use the building blocks if you want to build the foundation. That is more durable, more customizable. Do that if you can. >> Well, >> but- >> we've got a phenomenal, >> shall we talk about this? >> Yeah, go get into- >> So, we've built a five-year vision together, Accenture and us, which is called Velocity, and you'll be much better at describing it, but I'll give you the simple version of Velocity, which is taking AWS-powered industry solutions and bringing them to market faster, more repeatably, and at lower cost. And so think about vertical solutions sitting on a horizontal accelerator platform, able to be deployed, making transformation less complex. >> Yeah. >> Karthik, weigh in on this, because I've talked to you about this before. We've said years ago the horizontal scalability of the cloud's a beautiful thing, but verticals are where the ML works great too. Now you've got ML in all aspects of it. Horizontals, verticals, here now. >> Yeah, yeah, absolutely. Again, the power of this kind of platform that we are launching, by the way we're launching tomorrow, we are very excited about it, is, create a platform- >> What are you launching tomorrow? Hold on, I got news out there. What's launching? >> We are going to launch a giant platform, which will help clients accelerate their journey to industry cloud. So that's going to happen tomorrow. So what this platform would provide is, this is going to provide the horizontal capabilities that will help clients bootstrap their launch into cloud. And once they get into cloud, they would be able to build industry solutions on this.
The way I imagine this is, create the chassis that you need for your industry and then add the cartridges, industry cartridges, which are going to be solutions that are built on top of it. And we are going to do this across various industries, starting from, you know, healthcare, life sciences, to energy, to, you know, public services and so on and so forth. >> You're going to create a channel machine. A channel creation machine, you're going to allow people to build their own solutions on top of that platform. And that's launching tomorrow. Make sure we get the news on that. >> Exactly. And- >> Ah, no, >> Sorry, and we genuinely believe in the power of industry cloud. If you think about it, in the past, to create a solution one had to be an ISV. What cloud is providing for industry today, in the concept of industry clouds, is that industry companies are creating industry solutions. The best example is, along with, you know, AWS and Accenture, Ecopetrol, which is a leader in the energy industry, has created a platform called Water Intelligence and Management. And through this platform, they are attacking the audacious goal of water sustainability, which is going to be a huge problem for humanity that everybody needs to solve. As part of this platform, the goal is to reduce, you know, fresh water usage by 66%, and zero impact to groundwater is going to be the goal or ambition of Ecopetrol. So all of this is possible because industry players want to jump on the bandwagon, because they have all the toolkit of the cloud that's available, with which they could build a software platform, with which they can power their entire industry. >> And make money and have a good business. You guys are doing great. Final word, partnership. Where's it go next? You're doing great. Put a plug in for the Accenture AWS partnership. >> Well, I mean, we have a phenomenal relationship and partnership, which is amazing. We really believe in the power of three, which is the GSI, the ISV, and us together. And I have to go back to the thing I keep focused on, 90% of workloads not in cloud. I think together we can enable those companies to come into the cloud, very importantly, start to innovate, launch new products, and refuel the economy. So I think- >> We'll have to check on that >> Very, very optimistic. >> We'll have to check on that number. >> That seems a little- >> You got to check on that number. >> 90 seems a little bit amazing. >> 90% of workloads. >> That sounds, maybe, I'd be surprised. Maybe a little bit lower than that. Maybe. We'll see. >> We got to start turning it. >> It's still a lot. >> (laughs) It's still a lot. >> A lot more. Still first, still early days. Thanks so much for the conversation. Karthik, great to see you again. Tanuja, thanks for your time. >> Thank you, John. >> Congratulations on your success. Okay, this is theCube up here in the Executive Summit. You're watching theCube, the leader in high tech coverage. We'll be right back with more coverage here, and the Accenture set, after the short break. (calm outro music)
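Karthik's chassis-and-cartridges metaphor maps naturally onto a plugin registry: a horizontal platform that industry-specific modules register into. The following is a hypothetical sketch of that shape, not the platform Accenture announced; every name here is invented.

```python
# Hypothetical sketch of the "chassis + cartridges" idea: a horizontal platform
# that industry-specific modules plug into. Not the actual Accenture product.
from typing import Callable, Dict

class Chassis:
    """Horizontal platform: shared services plus a cartridge registry."""
    def __init__(self) -> None:
        self._cartridges: Dict[str, Callable[[dict], dict]] = {}

    def register(self, industry: str, cartridge: Callable[[dict], dict]) -> None:
        self._cartridges[industry] = cartridge

    def run(self, industry: str, workload: dict) -> dict:
        # Shared horizontal concerns (auth, data access, logging) would live here.
        return self._cartridges[industry](workload)

def healthcare_cartridge(workload: dict) -> dict:
    return {"industry": "healthcare", "result": f"scored {workload['patients']} patients"}

def energy_cartridge(workload: dict) -> dict:
    return {"industry": "energy", "result": f"optimized {workload['wells']} wells"}

platform = Chassis()
platform.register("healthcare", healthcare_cartridge)
platform.register("energy", energy_cartridge)
print(platform.run("energy", {"wells": 42}))
```

The design choice worth noting is that the chassis owns the cross-industry plumbing once, so each new vertical only has to supply the cartridge logic, which is what makes the "channel creation machine" framing plausible.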

Published Date : Nov 30 2022


theCUBE Insights with Industry Analysts | Snowflake Summit 2022


 

>> Okay. Okay. We're back at Caesars Forum, Snowflake Summit 2022, theCUBE's continuous coverage, day two, wall-to-wall coverage. We're so excited to have the analyst panel here, some of my colleagues that we've done a number, you've probably seen some power panels that we've done. David Menninger is here. He's the senior vice president and research director at Ventana Research. To his left is Tony Baer, principal at dbInsight, and in the co-host seat, Sanjeev Mohan of SanjMo. Guys, thanks so much for coming on. I'm glad we can. Thank you. You're very welcome. I wasn't able to attend the analyst sessions because I've been doing this all day, every day. But let me start with you, Dave. What have you seen that's kind of interested you? Pluses, minuses, concerns. >> Well, how about if I focus on what I think is valuable to the customers of Snowflake. Our research shows that the majority of organisations, the majority of people, do not have access to analytics. And so a couple of things they've announced, I think, address those issues very directly. So Snowpark and support for Python and other languages is a way for organisations to embed analytics into different business processes. And so I think that will be really beneficial to try and get analytics into more people's hands. And I also think that the native applications as part of the marketplace is another way to get applications into people's hands, rather than just analytical tools. Because most people in the organisation are not analysts; they're doing some line-of-business function. They're HR managers, they're marketing people, they're salespeople, they're finance people, right? They're not sitting there mucking around in the data. They're doing a job and they need analytics in that job. >> So, Tony, thank you. I've heard a lot of data mesh talk this week. It's kind of funny, can't seem to get away from it. >> You can't. >> It seems to be gathering momentum, but what have you seen that's been interesting? >> What I have noticed, unfortunately, you know, because the rooms are too small, you just can't get into the data mesh sessions, so there's a lot of interest in it. Um, it's still very, I don't think there's very much understanding of it, but I think the idea that you can put all the data in one place, which, you know, to me sounds almost like the enterprise data warehouse, you know, cloud-native edition, bring it all in one place again. Um, I think, for these folks, this might be kind of like a linchpin for that. I think there are several other things that actually have made a bigger impression on me at this event. One is basically, um, we watched their move with Unistore. Um, and it's kind of interesting coming, you know, coming from MongoDB last week, and I see these two companies seem to be converging towards the same place at different speeds. I don't think it's going to get there faster than Mongo, for a number of different reasons, but I see a number of common threads here. I mean, one is that Mongo was a company that's always been oriented towards developers. They need to, you know, start cultivating data people. >> And these guys are going the other way. >> Exactly, bingo.
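For readers who haven't seen it, the kind of in-database Python analytics Menninger is pointing to with Snowpark looks roughly like the sketch below. It uses the public snowflake-snowpark-python API; the table, columns, and connection values are placeholders invented for illustration.

```python
# Minimal Snowpark sketch: run Python dataframe logic inside Snowflake.
# Table and column names are hypothetical; config values are placeholders.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import avg, col

session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<wh>", "database": "<db>", "schema": "<schema>",
}).create()

# The filter/aggregate below is pushed down and executed in Snowflake,
# not in the local Python process.
(session.table("CHANNEL_PERFORMANCE")
        .filter(col("REGION") == "EMEA")
        .group_by("CHANNEL")
        .agg(avg(col("REVENUE")).alias("AVG_REVENUE"))
        .show())
```

The point the panel keeps circling is that this lets Python developers work where the data already lives, instead of standing up a separate analytics system.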
And the thing is, I think where they're converging is the idea of operational analytics and trying to serve all constituencies. The other thing, which also, in terms of serving, you know, multiple constituencies, is how Snowflake has laid out Snowpark, and what I'm finding, there's an interesting dichotomy. On one hand, you have this very ingrained integration of Anaconda, which I think is pretty ingenious. On the other hand, you speak to, let's say, the DataRobot folks, and they say, "You know something, our folks, the data scientists, want to work in our environment and use Snowflake in the background." So I see those kind of interesting, sort of cross-cutting trends. >> So, Sanjeev, I mean, Frank Slootman, we'll talk about, there's definitely benefits to going into the walled garden. Yeah, I don't think we dispute that, but we see them making moves and adding more and more open source capabilities, like Apache Iceberg. Is that a move to sort of counteract the narrative that Databricks has put out there? Is that customer driven? What's your take on that? >> Uh, primarily I think it is to counteract this whole notion that once you move data into Snowflake, it's a proprietary format. So I think that's how it started. But it's hugely beneficial to the customers, to the users, because now, if you have large amounts of data in Parquet files, you can leave it on S3. But then, using the Apache Iceberg table format in Snowflake, you get all the benefits of Snowflake's optimizer. So, for example, you get the, you know, the micro-partitioning, you get the metadata. So, uh, in a single query, you can join; you can do a select from a Snowflake table unioned with a select from an Iceberg table, and you can do stored procedures, user-defined functions. So I think what they've done is extremely interesting. Uh, Iceberg by itself still does not have multi-table transactional capabilities. So if I'm running a workload, I might be touching 10 different tables. So if I use Apache Iceberg in a raw format, they don't have it, but Snowflake does. >> Right. Hence the delta, and maybe that closes over time. I want to ask you, as you look around this, I mean, the ecosystem's pretty vibrant. I mean, it reminds me of, like, re:Invent in 2013, you know? But then I'm struck by the complexity of the last big data era, and Hadoop, and all the different tools. And is this different, or is it the sort of same wine, new bottle? You guys have any thoughts on that? >> I think it's different, and I'll tell you why. I think it's different because it's based around SQL. So, back to Tony's point, these vendors are coming at this from different angles, right? You've got data warehouse vendors and you've got data lake vendors, and they're all going to meet in the middle. So in your case, Tony, it's operational, analytical. But the same thing is true with data lake and data warehouse, and Snowflake no longer wants to be known as the data warehouse. They're a data cloud, and our research, again, I like to base everything off of that. >> I love that. >> Our research shows that two thirds of organisations have SQL skills and one third have big data skills, so, you know, they're going to meet in the middle. But it sure is a lot easier to bring along those people who know SQL already to that midpoint than it is to bring the big data people.
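Sanjeev's point about spanning native and Iceberg tables in one statement can be illustrated with a single hedged query. Table and column names here are hypothetical, and the sketch assumes sales_iceberg has already been defined in Snowflake as an Iceberg-format table; it uses the snowflake-connector-python driver with placeholder credentials.

```python
# Hedged sketch of one query spanning a native Snowflake table and an
# Apache Iceberg table. All table and column names are hypothetical.
import snowflake.connector

QUERY = """
    SELECT order_id, amount, 'native'  AS src FROM sales_native
    UNION ALL
    SELECT order_id, amount, 'iceberg' AS src FROM sales_iceberg
"""

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    database="<db>", schema="<schema>")
for row in conn.cursor().execute(QUERY):
    print(row)
conn.close()
```

The benefit he describes is that the Iceberg side stays in an open format on object storage while still going through Snowflake's optimizer alongside native tables.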
>> Remember, Amr Awadallah, one of the founders of Cloudera, said to me one time on theCUBE that, uh, SQL is the killer app for Hadoop. >> Yeah, the difference, you know, with Snowflake, is that you don't have to worry about taming the zoo animals. They really have thought out the ease of use, you know? I mean, from the get-go, they thought of two poles. One is ease of use, and the other is scale. And that's basically, you know, what I think very much differentiates it. I mean, Hadoop had the scale, but it didn't have the ease of use. >> But don't I still need, like, if I have, you know, governance from this vendor or, you know, data prep from that one, don't I still have to have expertise that's sort of distributed in those worlds, right? I mean, go ahead. >> Yeah, so the way I see it is, Snowflake is adding more and more capabilities right into the database. So, for example, they've gone ahead and added security and privacy, so you can now create policies and do even cell-level masking, dynamic masking. But most organisations have more than Snowflake. So what we are starting to see all around here is that there's a whole series of data catalogue companies, a bunch of companies that are doing dynamic data masking, security and governance, data observability, which is not a space Snowflake has gone into. So there's a whole ecosystem of companies that is mushrooming. Although, you know, they're using the native capabilities of Snowflake, they are at a level higher. So if you have a data lake and a cloud data warehouse, and you have other, like, relational databases, you can run these cross-platform capabilities in that layer. So that way, you know, Snowflake's done a great job of enabling that ecosystem.
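The policy-based dynamic masking Sanjeev mentions uses documented Snowflake DDL. A small sketch follows, with hypothetical role, table, and column names, again via the snowflake-connector-python driver.

```python
# Sketch of the dynamic data masking described above, using Snowflake's
# documented masking-policy DDL. Role, table, and column names are hypothetical.
import snowflake.connector

STATEMENTS = [
    # Unmask email only for the ANALYST role; everyone else sees a redaction.
    """CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING)
       RETURNS STRING ->
       CASE WHEN CURRENT_ROLE() = 'ANALYST' THEN val ELSE '*** MASKED ***' END""",
    """ALTER TABLE customers MODIFY COLUMN email
       SET MASKING POLICY email_mask""",
]

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    database="<db>", schema="<schema>")
cur = conn.cursor()
for stmt in STATEMENTS:
    cur.execute(stmt)
conn.close()
```

Because the policy is attached to the column rather than baked into each query, every tool in the surrounding ecosystem sees consistently governed data.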
We had a lot of talk on Wall Street about discretionary spending, and that's not discretionary. If you're monetising it, um, what do you guys think about that? Is this something that's that's real? Is it just a figment of my imagination, or do you see a different way of coming any thoughts on that? >>So, in effect, they're trying to become a data operating system, right? And I think that's wonderful. It's ambitious. I think they'll experience some success with that. As I said, applications are important. That's a great way to deliver information. You can monetise them, so you know there's there's a good economic model around it. I think they will still struggle, however, with bringing everything together onto one platform. That's always the challenge. Can you become the platform that's hard, hard to predict? You know, I think this is This is pretty exciting, right? A lot of energy, a lot of large ecosystem. There is a network effect already. Can they succeed in being the only place where data exists? You know, I think that's going to be a challenge. >>I mean, the fact is, I mean, this is a classic best of breed versus the umbrella play. The thing is, this is nothing new. I mean, this is like the you know, the old days with enterprise applications were basically oracle and ASAP vacuumed up all these. You know, all these applications in their in their ecosystem, whereas with snowflake is. And if you look at the cloud, folks, the hyper scale is still building out their own portfolios as well. Some are, You know, some hyper skills are more partner friendly than others. What? What Snowflake is saying is that we're going to give all of you folks who basically are competing against the hyper skills in various areas like data catalogue and pipelines and all that sort of wonderful stuff will make you basically, you know, all equal citizens. You know the burden is on you to basically we will leave. We will lay out the A P. I s Well, we'll allow you to basically, you know, integrate natively to us so you can provide as good experience. But the but the onus is on your back. >>Should the ecosystem be concerned, as they were back to reinvent 2014 that Amazon was going to nibble away at them or or is it different? >>I find what they're doing is different. Uh, for example, data sharing. They were the first ones out the door were data sharing at a large scale. And then everybody has jumped in and said, Oh, we also do data sharing. All the hyper scholars came in. But now what snowflake has done is they've taken it to the next level. Now they're saying it's not just data sharing. It's up sharing and not only up sharing. You can stream the thing you can build, test deploy, and then monetise it. Make it discoverable through, you know, through your marketplace >>you can monetise it. >>Yes. Yeah, so So I I think what they're doing is they are taking it a step further than what hyper scale as they are doing. And because it's like what they said is becoming like the data operating system You log in and you have all of these different functionalities you can do in machine learning. Now you can do data quality. You can do data preparation and you can do Monetisation. Who do you >>think is snowflakes? Biggest competitor? What do you guys think? It's a hard question, isn't it? Because you're like because we all get the we separate computer from storage. We have a cloud data and you go, Okay, that's nice, >>but there's, like, a crack. I think >>there's uniqueness. I >>mean, put it this way. 
In the old days, it would have been you know, how you know the prime household names. I think today is the hyper scholars and the idea what I mean again, this comes down to the best of breed versus by, you know, get it all from one source. So where is your comfort level? Um, so I think they're kind. They're their co op a Titian the hyper scale. >>Okay, so it's not data bricks, because why they're smaller. >>Well, there is some okay now within the best of breed area. Yes, there is competition. The obvious is data bricks coming in from the data engineering angle. You know, basically the snowflake coming from, you know, from the from the data analyst angle. I think what? Another potential competitor. And I think Snowflake, basically, you know, admitted as such potentially is mongo >>DB. Yeah, >>Exactly. So I mean, yes, there are two different levels of sort >>of a on a longer term collision course. >>Exactly. Exactly. >>Sort of service now and in salesforce >>thing that was that we actually get when I say that a lot of people just laughed. I was like, No, you're kidding. There's no way. I said Excuse me, >>But then you see Mongo last week. We're adding some analytics capabilities and always been developers, as you say, and >>they trashed sequel. But yet they finally have started to write their first real sequel. >>We have M c M Q. Well, now we have a sequel. So what >>were those numbers, >>Dave? Two thirds. One third. >>So the hyper scale is but the hyper scale urz are you going to trust your hyper scale is to do your cross cloud. I mean, maybe Google may be I mean, Microsoft, perhaps aws not there yet. Right? I mean, how important is cross cloud, multi cloud Super cloud Whatever you want to call it What is your data? >>Shows? Cloud is important if I remember correctly. Our research shows that three quarters of organisations are operating in the cloud and 52% are operating across more than one cloud. So, uh, two thirds of the organisations are in the cloud are doing multi cloud, so that's pretty significant. And now they may be operating across clouds for different reasons. Maybe one application runs in one cloud provider. Another application runs another cloud provider. But I do think organisations want that leverage over the hyper scholars right they want they want to be able to tell the hyper scale. I'm gonna move my workloads over here if you don't give us a better rate. Uh, >>I mean, I I think you know, from a database standpoint, I think you're right. I mean, they are competing against some really well funded and you look at big Query barely, you know, solid platform Red shift, for all its faults, has really done an amazing job of moving forward. But to David's point, you know those to me in any way. Those hyper skills aren't going to solve that cross cloud cloud problem, right? >>Right. No, I'm certainly >>not as quickly. No. >>Or with as much zeal, >>right? Yeah, right across cloud. But we're gonna operate better on our >>Exactly. Yes. >>Yes. Even when we talk about multi cloud, the many, many definitions, like, you know, you can mean anything. So the way snowflake does multi cloud and the way mongo db two are very different. So a snowflake says we run on all the hyper scalar, but you have to replicate your data. What Mongo DB is claiming is that one cluster can have notes in multiple different clouds. That is right, you know, quite something. >>Yeah, right. I mean, again, you hit that. We got to go. 
>> But, uh, last question: um, Snowflake, undervalued, overvalued, or just about right? >> In the stock market or with customers? >> Yeah. >> Yeah, well, but, you know, I'm not sure that's the right question. >> That's the question I'm asking, you know. >> I'll say the question is undervalued or overvalued for customers, right? That's really what matters. Um, there's a different audience who cares about the investor side. Some of those are watching, but I believe that, from the customer's perspective, it's probably valued about right. >> The reason I ask it is because it was so hyped. It had a $100 billion valuation, it surpassed ServiceNow's value, which is crazy. Now it's obviously come back, quite a bit below its IPO price. But you guys were at the financial analyst meeting. Scarpelli laid out 2029 projections, signed up for $10 billion in revenue, 25 percent free cash flow, 20% operating profit. I mean, they'd better be worth more than they are today if they do that. >> If I see the momentum here this week, I think they are undervalued. But before this week, I probably would have thought they're at the right valuation. >> I would say they're probably more at the right valuation point, because the IPO valuation was just such a false valuation, so hyped. >> Guys, I could go on for another 45 minutes. Thanks so much, David, Tony, Sanjeev, always great to have you on. We'll have you back for sure. >> Thanks for having us. >> All right, thank you. Keep it right there. We're wrapping up day two on theCUBE, Snowflake Summit 2022. Right back. Mm. Mhm.

Published Date : Jun 16 2022


Juan Loaiza, Oracle | CUBE Conversation 2021


 

(upbeat music) >> The innovation around databases has exploded over the last few years. Not only do organizations continue to rely on database technology to manage their most mission critical business data, but new use cases have emerged that process and analyze unstructured data, that share data at scale, protect data, provide greater heterogeneity. New technologies are being injected into the database equation: not just cloud, which has been a huge force in the space, but also AI to drive better insights and automation, blockchain to protect data and provide better auditability, new file formats to expand the utility of database technology, and more. Debates abound as to who's the best, number one, the fastest, the most cloudy, the least expensive, et cetera. But there is no debate when it comes to leadership in mission critical database technologies. That status goes to Oracle. And with me to talk about the developments of database technology in the market is Cube alum Juan Loaiza, who's executive vice president of Mission Critical Database Technology at Oracle. Juan, always great to see you, thanks for making some time. >> Thanks, great to see you, Dave, always a pleasure to join you. >> Yeah, and I hope you have some time, because I've got a lot of questions for you. (chuckles) I want to start with- >> All right, I love questions. >> Good, I want to start, and we'll go deep if you're up for it. I want to start with the GoldenGate announcement. We're covering that recent announcement, the service on OCI. GoldenGate is part of the super high availability capabilities that Oracle is so well known for. What do we need to know about the new service and what it brings for your customers? >> Yeah, so first of all, GoldenGate is all about creating real-time data throughout an enterprise. So it does replication, data integration, moving data into analytic workloads, streaming analytics of data, migrating of databases, and making databases highly available. All those are use cases for real-time data movement. And GoldenGate is really the leading product in the market, has been for many years. We have about 80% of the global Fortune 500 running GoldenGate today, in addition to thousands and thousands of smaller customers. So it is the premier data integration, replication, high availability, anything involving moving data in real time, GoldenGate is the premier platform. And so we've had that available as a product for many years. And what we've just recently done is we've released it as a cloud service, as a fully managed and automated cloud service. So that's kind of the big new thing that's happening right now. >> So is that what's unique about this, that it's now a service, or are there other attributes that are unique to Oracle? >> Yeah, so the service is kind of the most basic part of it. But the big thing about the service is it makes this product dramatically easier to use. So traditionally the data integration, replication products, although very powerful, are also very complex to use. And one of the big benefits of the service is we've made it dramatically simpler. So not just super experts can use it, but anyone can use it. And also as part of releasing it as a cloud service, we've done a number of unique things, including making it completely elastically scalable, pay per use, and dynamically scalable. So just-in-time, real-time scalability. So as your workload increases, we automatically increase the throughput of GoldenGate.
So previously you had to figure all this stuff out ahead of time. It was very static. All these products have been very static. Now it's completely dynamic, a native cloud product, and that's very unique in the market. >> So, I mean, from an availability standpoint, I guess IBM sort of has this with Db2, but it doesn't offer the heterogeneity that GoldenGate has. But what about, like, AWS, Microsoft, Google, do they provide services like GoldenGate? >> There's really nothing like the GoldenGate service. When you're talking about people like Google and Azure, they really have do-it-yourself third-party products. So there'll be a third-party data integration, replication product, and it's kind of available in their marketplace, and customers have to do everything. So it's basically a put-it-together-yourself kit. And it's very complicated. I mean, these data integration products have always been complicated, and they're even more complicated in the cloud if you have to do everything yourself. Amazon has a product, but it's really focused on basic data migration to their cloud. It doesn't have the same capabilities as Oracle has, it doesn't have the elasticity, it doesn't have pay per use, so it's really not very cloudy at all. >> Well, so I mean, the biggest customers have always glommed onto GoldenGate because they need that super ultra high availability, and they're capable of doing it themselves. So, tell us how this compares to DIY. >> Yeah, so you mentioned the big customers, so you're absolutely right. The big customers have been big users of GoldenGate. Smaller customers are users as well; however, it's been challenging because it's complicated. Data integration has been a complicated area of data management, maybe the most complicated. And so one of the things this does is that it expands the market. It makes it dramatically easier for smaller companies that don't have as many IT resources to use the product. Also, smaller companies obviously don't have as much data as the really large giants, so they don't have as much data throughput. So traditionally the price has been high for a small customer. But now, with pay per use in the cloud, it eliminates the two big blockers for smaller enterprises, which are the high fixed costs and the complexity of the products. Which, by the way, is helpful for everyone also. And for big customers, they've also struggled with elasticity. So sometimes a huge batch job will kick in, the rate of change increases, and suddenly the replication product doesn't keep up, because on-prem products aren't really very elastic. So it helps large customers as well. Everybody loves this: the elasticity, pay-per-use, on-demand nature of it is really helpful for everybody. >> Well, and because it's delivered as a service, I would imagine for the large customers you're giving them more granularity, so they can apply it maybe for a single application, as opposed to trying to have to justify it across a whole suite. And because the cost is higher, but now if you're allowing me to pay by the drink, is that right? I could just sort of apply it at a more granular level. >> Yes, that's exactly right. It's really pay per use. You can use it as much or as little as you want. You just pay for what you use. And as I mentioned, it's not a static payment either. So if you have a lot of data loads going on right now, you pay a little more; at night, when you have less going on, you pay a lot less. So you really just pay for what you use.
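GoldenGate's actual capture and apply machinery is proprietary, so the following is only a toy model of the replication idea Juan is describing: an ordered stream of change records applied to a replica in commit order. No GoldenGate API is used or implied; the record shape and data are invented for illustration.

```python
# Toy model of change-data-capture replication in the GoldenGate style:
# an ordered stream of change records applied to a replica. Conceptual
# sketch only; GoldenGate's real capture/apply is proprietary.
from typing import Any, Dict, List

def apply_changes(replica: Dict[int, Dict[str, Any]],
                  trail: List[Dict[str, Any]]) -> None:
    """Apply insert/update/delete records in commit order."""
    for rec in trail:
        key = rec["key"]
        if rec["op"] == "insert":
            replica[key] = rec["row"]
        elif rec["op"] == "update":
            replica[key].update(rec["row"])
        elif rec["op"] == "delete":
            replica.pop(key, None)

replica: Dict[int, Dict[str, Any]] = {}
trail = [
    {"op": "insert", "key": 1, "row": {"name": "ACME", "balance": 100}},
    {"op": "update", "key": 1, "row": {"balance": 250}},
    {"op": "insert", "key": 2, "row": {"name": "Globex", "balance": 75}},
    {"op": "delete", "key": 2},
]
apply_changes(replica, trail)
print(replica)  # {1: {'name': 'ACME', 'balance': 250}}
```

The elasticity Juan describes amounts to the managed service scaling how fast this kind of apply loop runs as the rate of incoming change records rises and falls.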
It's very easy to set it up for a single application or all your applications. >> How about for things like continuous replication or real-time analytics, is the service designed to support that? >> Yes, so that's the heritage of GoldenGate. GoldenGate has been around for decades, and we've worked with some of the most demanding customers in the world on exactly those things. So real-time data all over the enterprise is really the goal that everyone wants. Real-time data from OLTP to analytics, from one system to another system, and for availability. That is the key benefit of GoldenGate, and that's the key technology that we've been working on for decades. And now we have it very easy to use in the cloud. >> Well, what would be the overheads associated with that? I mean, for instance, you've got it, you need a second copy, you need the other database copies, and where does it make sense to incur that overhead? Obviously the super high availability apps that can exploit real time; think fraud detection, that's the obvious one, but what else can you add there? >> Well, GoldenGate itself doesn't require any extra copies of anything. However, it does enable customers that want to create, for example, an analytics system, a data warehouse, to feed data from all their systems in real time into that data warehouse. And it also enables the real-time capabilities that enable high availability, and you can get high availability within the cloud with it, between on-premises and the cloud, between clouds. Also, you can migrate data, migrate databases, without having to take them down. So all these capabilities are available now, and they're very easy to use. >> Okay. Thanks for that clarification. What about autonomous? Is that on the roadmap, or what are you thinking? >> Yeah, GoldenGate is essentially an autonomous service, and it works with the Oracle Autonomous Database. So you can both use it as a source for data and as a sink for data, as a place you're writing data. So for example, you can have an autonomous OLTP database that's replicating to another autonomous OLTP database in real time, and both of them are replicating changes to the autonomous data warehouse. But it doesn't all have to be autonomous. You can have any mix of autonomous and not autonomous, on-prem and cloud, in anybody's cloud. So that's the beauty of GoldenGate, it's extremely flexible. >> Well, you mentioned the elasticity a couple of times. I mean, why is that so important, that GoldenGate on OCI gives you that elastic billing, the auto-scaling? Talk to me in terms of what that does for the customer. >> Yeah, there's really two big benefits. One benefit is, it's very difficult to predict workloads. So normally, on an on-prem configuration, you have to say, okay, what is the max possible workload that's going to happen here? And then you have to buy the product, configure the product, get hardware, basically size everything for that. And then if you guess wrong, you're either spending too much because you oversized it, or you have a big real-time data problem: the data can't keep up with the real time because you've undersized the configuration. So that's hard to do. So the beauty of elasticity, and the dynamic elasticity, the pay per use, is you don't have to figure all this stuff out. So if you have more workload, we grow it automatically; if you have less workload, we shrink it automatically. And you don't have to guess ahead of time. You don't have to price ahead of time.
So you just use what you use, right? You don't pay for something that you're not using. So it's a very big change in the whole model of how you use these data replication, integration, and high availability technologies. >> Well, I think I'm correct to say GoldenGate primarily has been for big companies. You mentioned that small companies can now take advantage of this service, and we talked about the granularity, so I could definitely see that; can they afford it, I guess, is part one. And then the other part of the question is, I can see GoldenGate really satisfying your on-prem customers and them taking advantage of it, but do you think this will attract new customers beyond your core? So, two-part question there. >> Yeah, absolutely. Small customers have been challenged by the complexity of data integration, and one of the great things about the cloud service is that it's dramatically simpler. Oracle manages everything. Oracle does the patching, the upgrades. Oracle does the monitoring. It takes care of the high availability of the product. So all that management complexity, all the configuration and setup, everything like that, is automated; that's owned by Oracle. Small customers were always challenged by the complexity of the product, along with everything else that they had to do. And then the other benefit, of course, is that small customers were challenged by the large fixed price. Now, with pay per use, they pay only for what they use. It's really easily usable by small customers also. So it really expands the market and makes it more broadly applicable. >> So, kind of the same answer for beyond your existing customer base, beyond the on-prem; that's kind of... You answered >> Right. >> my two-part question with one answer, so that was pretty efficient, (chuckles) pun intended. So the bottom line for me, squinting through this announcement, is you've got the heterogeneity piece with GoldenGate on OCI, and as such it's going to give you the capability to create what I'll call an architecturally coherent, decentralized data mesh. I'm big on this data mesh these days, with decentralized data. With the proviso that I'm going to be able to connect to OCI, which of course you can do with Azure, or I guess you could bring Cloud at Customer on-prem. First of all, is this correct? And can we expect you over time to do this with AWS or other cloud providers? >> It can move data from Amazon or to Amazon. It can actually handle any data, wherever it lives. So, yeah, it's very flexible, and it's really just the automation of all the management that we're running in our public cloud. But the data can be from anywhere to anywhere. >> Cool, all right, let's switch topics here a little bit. Just talk about some of the things that you've been working on, some of the innovation. I sat through your blockchain announcement; it was very cool. Of course I love anything blockchain and crypto; NFTs are exploding, there's the Coinbase IPO. It's just really an exciting time out there. I think a lot of people don't really appreciate the innovation that's occurring. So you've been making a lot of big announcements over the last several months. You've been taking your R&D and bringing it into product, so that's great; we always love to see that, because that's where the rubber really meets the road. Just on the database side of the house, you announced 21c, the next generation of the self-driving data warehouse, ADW, blockchain tables, and now you've got GoldenGate running on OCI.
Take us inside the development organization. What are the underlying drivers, other than your boss? >> When we talk about our autonomous database, it is the mission critical Oracle database, but it's dramatically easier to use. Oracle does all the management, all the automation, but we also use machine learning to tune it, to make it highly available, and to make it highly secure. So that's been one of our biggest products we've been working on for many years. And recently we enhanced our autonomous data warehouse, taking it beyond being a data warehouse to a complete data analytics platform. So it includes things like ETL; we built ETL into the autonomous data warehouse. We're building our GoldenGate replication into autonomous data warehousing. We built machine learning directly, natively, into the database. So now, if someone wants to run some machine learning, they just run machine learning queries. They no longer have to stand up a separate system. So a big move that we've been making is taking it beyond just a database to a full analytics platform. And this goes beyond what anyone else in the industry is doing, because we have a lot more technology. So for example, the machine learning directly in the database, the ETL directly in the database, the data replication directly in the database; all these things are very unique to Oracle, and they dramatically simplify how customers manage data. In addition to that, we've also been working on our database product. We've enhanced it tremendously. Our big goal there is to provide what we call a converged database: everything you need, all the data types, whether it's JSON, relational, spatial, graph, all the different kinds of data types, all the different kinds of workloads, analytics, OLTP, things like blockchain, microservices, events, all built into the Oracle database, making it dramatically easier to both develop and deploy new applications. So those are some of our big, big goals: make it simple, make it integrated, and we'll take on the complexity, so developers and customers find it easy to develop and easy to use. And we've made huge strides in all these areas in the last couple of years. >> That's awesome. I wonder if we could land on blockchain again; it's adjacent to crypto, and though you're not about crypto, you are about applying blockchain. Maybe you can help our audience understand: what are some of the real use cases where blockchain tech can be used with the Oracle database? >> Yeah, so that's a very interesting topic. As you mentioned, blockchain is very topical currently; we see a lot of cryptocurrencies and distributed applications for blockchain. So in general, in the past, we've had two worlds: the enterprise data management world and the blockchain world. And these are very distinct, right? On the blockchain side, the applications have mostly centered around distributed, multi-party applications, where you have multiple parties that all want to reach consensus, and then that consensus is stored in a blockchain. So that's kind of been the focus of blockchain. And what we've done is very innovative; we're the first company to ever do this. We've taken the core architectural ideas, and really a lot of it has to do with the cryptography of blockchain, and we've engineered that natively into the mainstream Oracle database. So now, in the mainstream Oracle database, we have blockchain technology built in.
And it's dramatically simpler to use. And the use cases; you asked about the use cases, and that's what we've done. It's taken us about five years to do this, and now it's been released into the market in our mainstream 19c Oracle database. So the use case is different from the conventional blockchain use case, which, as I mentioned, was really multi-party, consensus-based apps. We're trying to make blockchain useful for mainstream enterprise and government applications, any kind of mainstream government or enterprise application. And that idea of blockchain, the core concept of blockchain, is that it addresses a different kind of security problem. When you look at conventional security, it's really trying to keep people out. So we have things like firewalls, passwords, network encryption, data encryption. It's all about keeping bad people out of the data. And there are really two big problems that it doesn't address well. One problem is that there are always new security exploits being published. You have hackers out there that are working overtime, sometimes nation-states, trying to attack data providers, and every week, every month, there's a new security exploit that's discovered; this happens all the time. So that's one big problem. We're building up these elaborate walls of protection around our core data assets, and in the meantime we have basically barbarians attacking on every side. (chuckles) And every once in a while, they get over the walls; this is just what's happening. So that's one big problem. And the second big problem is illicit changes made by people with credentials. Sometimes you have an insider in your company, whether it's an administrator or a salesperson or a support person, who has valid credentials but then uses those valid credentials in some illicit way. They go out and change somebody's data for their own gain. And even more common than that, because there aren't that many bad guys inside the company, though they exist, is stolen credentials. What's happened in many cases is hackers or nation-states will steal, for example, administrative credentials and then use those administrative credentials to come into a system and steal data. So that's the kind of problem that is not well addressed by security mechanisms. If you have privileges, the security mechanism says, yeah, you're fine; if somebody steals your privileges, again, they get a pass through the gate. And so what we've done with blockchain is we've taken the cryptography elements of blockchain, we call it crypto-secure data management, and we've built those into the Oracle database. So think of it this way: if someone actually makes it over the walls that we built and into the core data, what we've done with that cryptographic technology of blockchain is we've made that data immutable, so you can't change it. Even if you make it over the gate, you can't get into the core data assets and change those assets. And that's now built into the Oracle database and is super easy to adopt. And I think it's going to really enhance and expand the community of people that can actually use that blockchain technology. >> I mean, that's awesome. I could talk all day about blockchain. And I mean, when you think about hackers, it's all about ROI, value over cost. And if you can increase the denominator, they're going to go somewhere else, right? Because the value will decline. And this is really the intersection of software engineering and cryptography.
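Before the conversation moves on, here is a minimal sketch of the blockchain table feature just described, using the python-oracledb driver. The DDL clauses follow Oracle's documented blockchain table syntax (introduced in 21c and backported to recent 19c releases); the table, columns, and connection details are hypothetical.

    # Hedged sketch: creating a blockchain table in Oracle Database.
    # Names and credentials are placeholders, not from the interview.
    import oracledb

    conn = oracledb.connect(user="demo", password="***", dsn="mydb_high")
    cur = conn.cursor()
    cur.execute("""
        CREATE BLOCKCHAIN TABLE bank_ledger (
            account_no   NUMBER,
            deposit_date DATE,
            amount       NUMBER
        )
        NO DROP UNTIL 31 DAYS IDLE
        NO DELETE UNTIL 16 DAYS AFTER INSERT
        HASHING USING "SHA2_512" VERSION "v1"
    """)
    # Rows are inserted normally, but each row is chained to the previous
    # one with a cryptographic hash, so updates are rejected outright and
    # deletes are blocked until the retention window above expires, even
    # for users holding stolen administrative credentials.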
And I guess even when you bring cryptocurrency into it, there's sort of the game theory. That's really not what you're all about, but the first two pieces are really critical in terms of the next generation of raising that security hurdle. Love it. Now, go ahead. >> Yeah, I was just going to say, it's a different approach. Because think about trying to keep people out with things like passwords and firewalls: you can have bugs in that software that allow people to exploit it and get in. When you're talking about cryptography, that's math, and it's very difficult; you really can't get past math. Once the data is cryptographically protected on a blockchain, a hacker can't really do anything with it. Math is math; there's nothing you can do to break it, right? It's very different from trying to get through some algorithm that's trying to keep you out. >> Awesome. As I said, I could talk forever on this topic. But let me go into some competitive dynamics. You recently announced Autonomous Data Warehouse updates, and you've got service capabilities that are really trying to appeal to the line of business. I want to get your take on that announcement and specifically how you think it compares. I'm going to name names; you don't have to. But Snowflake obviously has a lot of momentum in the marketplace, and AWS with Redshift is doing very, very well. Obviously there are others, but those are two prominent ones that we've tracked in our data that have momentum. How do you compare? >> Yeah, so there are a number of different ways to look at the comparison. The simplest and most straightforward is that there's a lot more functionality in Oracle data warehousing. Oracle has been doing this for decades, and we have a lot of built-in functionality. For example, machine learning natively built into the database makes it super easy to use. We have mixed workloads, we have spatial capabilities, we have graph capabilities, we have JSON capabilities, we have microservices capabilities. So there's a lot more capability; that's number one. Number two, our cloud service is dramatically more elastic. With our cloud service, all you really do is move the slider: you say, hey, I want more resources, I want fewer resources. In fact, we'll do that automatically; that's called auto-scaling. In contrast, when you look at people like Snowflake or Redshift, they want you to stand up a new cluster. Hey, you have some more workload on Monday? Stand up another cluster, and then we'll have two sets of clusters, or maybe you want a third cluster, maybe you want a fourth cluster. So you end up with all these different systems, which is how they scale; they say, hey, I can have multiple sets of servers access the same data. With Oracle you don't even have to think about those things. We auto-scale: you get more workload, we just give it more resources. You don't even have to think about that. And then the other thing is that we're looking at the whole data management problem end to end: starting with capturing the data, moving the data in real time, transforming the data, loading the data, running machine learning and analytics on the data, putting all kinds of data in a single place so that you can do analytics on all of it together, and then having very rich visualization capabilities for viewing the data, graphing the data, modeling the data, all those things. So it's all integrated, which makes it super easy to use.
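One of the capabilities listed above, machine learning natively built into the database, can be sketched with Oracle's long-standing DBMS_DATA_MINING package, which builds a model with a SQL call rather than a separate system. The schema objects below are invented for illustration.

    # Hedged sketch: training an in-database classification model.
    # The customers table and its columns are hypothetical; settings can
    # also be supplied to choose a specific algorithm.
    import oracledb

    conn = oracledb.connect(user="demo", password="***", dsn="adw_high")
    cur = conn.cursor()
    cur.execute("""
        BEGIN
            DBMS_DATA_MINING.CREATE_MODEL(
                model_name          => 'churn_model',
                mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
                data_table_name     => 'customers',
                case_id_column_name => 'customer_id',
                target_column_name  => 'churned'
            );
        END;
    """)
    # Scoring is then just SQL, for example:
    #   SELECT customer_id, PREDICTION(churn_model USING *) FROM customers;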
So: much easier, much more functionality, and much more elastic than any of our competitors in the market. >> Interesting; thank you for those comments. I mean, it's a different world, right? You guys have all the market share, they have all the growth; those things over time, and you've been around, you've seen it, they come together and you fight it out, and may the best approach win. So we'll be watching. >> Yeah, also I forgot to mention the obvious thing, which is that Oracle runs everywhere. You can run Oracle on premises, you can run Oracle in the public cloud, you can run what we call Cloud at Customer. Our competitors really are public cloud only, so customers don't get the choice of where they want to run their data warehouse. >> Now Juan, a while ago I sat down with David Floyer and Marc Staimer. We reviewed how Gartner looks at the marketplace, and it wasn't a surprise that when it came to operational workloads, Oracle stood out; I mean, that's kind of an understatement relative to the major competitors. Most of our viewers, I don't think, expected, for instance, Microsoft or AWS to be that far away from you. But at the same time, the database Magic Quadrant maybe didn't reflect that gap as widely, so there's some dissonance there, where the detailed workload drill-downs were dramatic. And I wonder what your take is on the results. Obviously you're happy with them; you came out leading in virtually every category, or you were one and two, even in some of the non-mission-critical operational stuff. But what can you add to my narrative there? >> Yeah, so Gartner, first of all, we're talking about cloud databases. >> Right. >> Right, so this is not on-premises databases, this is pure cloud databases. And what they did is two things. The main thing was a technical rating of the cloud databases. And there are other vendors that have had databases in the cloud for longer than we have. But in the most recent Gartner analyst report, as you mentioned, Oracle came out on top for cloud database technology in almost every single operational use case, including things like Internet of Things, things like JSON data, variable data, analytics, as well as traditional OLTP and mixed workloads. So Oracle was rated the highest technology, which isn't a big surprise. We've been doing this for decades. Over 90% of the global Fortune 500 run Oracle. And there's a reason: this is what we're good at. This is our core strength: our availability, our security, our scalability, our functionality, both for OLTP and analytics, all the capabilities, built-in machine learning, graph analytics, everything. So even when we compare narrowly, things like Internet of Things or variable data, against niche competitors where that's all they do, we came out dramatically ahead. But what surprised a lot of people is how far ahead of some of the other cloud vendors, like Amazon, like Azure, like Google, Oracle came out in the cloud database category. A lot of people think, well, some of these other pure cloud vendors must be ahead of Oracle in cloud database. But actually, no. If you look at the Gartner analyst report, it was very clear: Oracle was dramatically ahead of their cloud database technologies with our cloud database. >> So I'm pretty much out of time, but last question.
I've had some interesting discussions lately, and we've pointed out for years in our research that, of course, you're delivering the entire stack: the database, part of the infrastructure, the applications; you have the whole engineered systems strategy. And for the most part you're unique in this regard. I mean, Dell just announced that it's spinning off VMware, and it could have gone the other direction and become a more integrated hardware and software player for the data center. But look, it's working for Dell, based on the reaction from the street post-announcement. Cisco has a hardware and software model that's sort of integrated, but the company's value peaked back in the dot-com boom and has been very slow to bounce back. My point is that for these companies, the street doesn't value the integrated model. Oracle is kind of the exception; it's trading at all-time highs. I know you're not going to comment on the stock price, but I guess SAP, until it missed and guided conservatively, was kind of on a good trajectory. So I'm wondering, why do you think Oracle's strategy resonates with investors, but not so much for those companies? Is it because you have the applications piece? I mean, maybe that's kind of my premise for SAP, but what's your take? Why is it working for you? >> Well, okay, I think it's pretty simple, which is that some of our competitors, for example, might have a software product and a hardware product, but mostly those are acquired; they're separate products that just happen to be in a portfolio. They are not a single company with a single vision and joint engineering going on. It's really, hey, I've got the software over here, I've got the hardware over there, but they don't really talk to each other, they don't really work together. They're not trying to develop something where the stack is not just integrated but engineered together. And that is really the key. Oracle focuses on data management top to bottom. So we have everything from our ERP and CRM applications talking to our database, talking to our engineered systems, running in our cloud, and it's all completely engineered together. Oracle doesn't just acquire these things and kind of glue them together; we actually engineer them, and that's fundamentally the difference. You can buy two things and have them as two separate divisions in your company, but it doesn't really get you a whole lot. >> Juan, it's always a pleasure. I love these conversations and hope we can do more in the future. Really appreciate your time. Thanks for coming to theCUBE. >> Pleasure, Dave, nice to talk to you. >> All right, keep it right there, everybody. This is Dave Vellante for theCUBE; we'll see you next time. (upbeat music)

Published Date : Apr 21 2021

George Lumpkin & Neil Mendelson, Oracle | CUBE Conversation, April 2021


 

(bright upbeat music) >> Hi, well, this is Dave Vellante. We're digging deeper into the world of database. You know, there are a lot of ways to skin a cat, and different vendors take different approaches, and we're reaching out to the technologists to get their perspective on the major trends that they're seeing in the market, 'cause we want to understand the different ways in which you can solve problems. So look, if you have thoughts and the technical chops on this topic, I'd love to interview you. Just ping me at @DVellante on Twitter; there are a lot of ways to get ahold of me. Anyway, we recently spoke with Andrew Mendelsohn, who is Oracle's EVP responsible for database server technologies. And we talked a lot about Oracle's ADW, Autonomous Data Warehouse. And we looked at the cloud database strategy that Oracle is taking, the company's plans, and how they're maybe different from other solutions in the marketplace, but I wanted to dig deeper. And so today we have two members of Mendelsohn's team on The Cube, and we're going to probe a little bit. George Lumpkin is the Vice President of Autonomous Data Warehouse, and Neil Mendelson is the VP of the Modern Data Warehouse business for Oracle. They're both 20-year veterans of Oracle. When I reached out to Steve Savannah, who's been a colleague of mine for many years, he's always telling me how great Oracle is relative to the competition. So I said, okay, come on The Cube and talk about this, give me your best people. And he said, whatever these two don't know about cloud data warehouse, it isn't worth knowing anyway. So with that said, gentlemen, welcome to The Cube. Thanks so much for coming on. >> Thank you. >> Hey, glad to be here. >> So George, let's start with you. Maybe we could recap, for some of the viewers who might not be familiar with it, the interview that I did with Andy. In your words, what exactly is an Autonomous Data Warehouse? Is this cloud native? Is it an Oracle buzzword? What is it? >> Well, I mean, Autonomous Data Warehouse is Oracle's cloud data warehouse. It's a service that's built to allow business users to get more value from their data. That's what the cloud data warehouse market is. Autonomous Data Warehouse is absolutely cloud native. There's a huge misconception that people might have when they first hear about this service, because they think, this is an Oracle database, right? Oracle makes databases. This is the same old database I knew from 10 years ago. And that's absolutely not true. We built a cloud-native service for data warehousing, built with cloud features. You know, if your understanding of the cloud data warehouse market is based upon how things looked 10 years ago, well, Snowflake wouldn't have even existed, right? You can't base your understanding of Oracle upon that. We have a modern service that's highly elastic, provides cloud capabilities like online patching, and it's fully autonomous. It's really built for business users, so they don't need to worry about administering their database. >> So I want to come back and actually ask you some questions about that, but let me follow up and talk about some of the evolution of ADW. Where did you start? I think it was 2018, maybe; where you came from, where you are today; maybe you can take us through the technological progression and the path you took to get here. >> So 2018 was when we released the service and made it generally available, but of course, you know, we started much earlier than that.
And this was started within my product management team and the broader development organization. We really sat down with a blank sheet of paper and said, what should the data warehouse in the cloud look like? You know, let's put aside everything that Oracle does for its on-prem customers and think about how the cloud should be different. And the first thing that we said was, well, if Oracle writes the database software, and Oracle builds its own hardware, and Oracle has created its own cloud, why do we need customers to manage a database? And that's where the idea of autonomous database came from: Oracle is managing the entire ecosystem. And therefore we built a database that we believe is far and away the simplest-to-use data warehouse in the market. And that's been our focus since we started in 2018, and it continues to be our focus: looking at more ways that we can make Autonomous Data Warehouse simpler and easier for business users to get more value out of their data. >> Awesome, one more question. And actually, Neil, you might want to chime in on this as well. So just from a technical perspective, you know, forget the marketing claims and all the BS: how do you compare ADW to the so-called born-in-the-cloud data warehouses? You mentioned Snowflake; you know, Redshift, is Redshift born in the cloud? Well, it was ParAccel, but Amazon's done some good work around Redshift. I think BigQuery is probably a better example, 'cause it, like Snowflake, started in the cloud. But how do you compare ADW to some of these other so-called born-in-the-cloud data warehouses? >> I think part of this is, as you mentioned, that Redshift wasn't born in the cloud. It was, you know, a code base taken from a prior company that was an on-premises company, and they adapted it to the cloud, right? And, you know, as George said, we have done much the same, in that our starting point was not another company's code base; our starting point was our own code base. But as George said, it's less about the starting point and more about where you envision the endpoint. Whatever your starting point is, I think we have a fundamentally different view of the endpoint. Amazon talks about how they're literally built for, you know, a cloud built for developers, right? Builders, right? And, you know, Oracle wasn't first in the infrastructure business; we entered through the applications business. And all of a sudden, we began taking on hundreds, thousands, and ever more customers that were SaaS customers. Underneath was the database and all the infrastructure. One of the things that we took away from that was that we couldn't possibly hire enough DBAs to manage all the infrastructure below our applications customers. So one of the things that influenced this is that customers expect SaaS applications to just take care of themselves, right? So we had to essentially modify the infrastructure to allow it to do so as well. And we're bringing that capability to those people who, you know, may or may not have an application, but whose interest is more in this self-service, agility type of aspect. >> So it seems to me, and George was sort of alluding to this before; I mean, you mentioned Snowflake a couple of times, and then Neil, something you just said, I'm going to pick up on: you've been around for a long time.
And you know, when I've talked to the Snowflake people, they know Oracle; a lot of them came from Oracle. They understand, I think, how you can't just build Oracle overnight and build in the capabilities that Oracle has, the recovery and so on. And you talk to customers, and you are the gold standard for, you know, especially mission critical databases, so I get that. But now you just sort of hit on it: it takes a lot of people and skill to run the database. So that's the problem that you're saying you were attacking; am I getting that right? >> Right, right. So the people that you talked about, who originally built Snowflake, came from Oracle, but they came from Oracle more than a decade ago. So their context is over a decade old, right? In the meantime, we've been busy building autonomous capabilities and many others, right? Their view of Oracle is that view from back more than 10 years ago, and they're still adding capability. A really good example to illustrate this: Oracle, as you said, is the most capable system that's out there, and has been for many years. We've been focusing on how we simplify that, and how we use machine learning embedded within the system itself, because core to the concept of autonomous is that inside is a machine learning system that's continually improving, right? That's the whole notion. In Snowflake's case, they're still adding functionality. Last year they added masking, which was functionality they didn't have, but when they added the capability, they added it without the ability for a business user to actually take advantage of it. There's no capability for a business user to actually find the information that needs to be masked, and then, after the information is found, you require a technical person to actually implement the mask. In Oracle's case, we've had masking and those capabilities for a long time; our focus was to be able to provide a simple tool that a business user can use without technical or security experience: find the data that needs to be masked, PII data, and then hit a button and have it masked for you. So, you know, without this notion of a strategy to move toward a system that heals itself and manages itself, they're just going to continue: as they add more capability, they will in turn add more complexity. What we're trying to do is take complexity out while others are adding it in; it's an ironic twist. >> It is an ironic twist. It is interesting to look at it. And I don't want to make this about Snowflake. But I mean, hey, I like what they're doing, I like them, I know the management, they're growing like crazy, and the customers tell me, hey, this is really simple. And it's simple by design. I mean, to your point, over time it's going to get, you know, more and more complex. I was talking to Andy, I think it was Andy, and he was saying, you know, they've got the different sizes and shapes; you know, they call them t-shirt sizes. And I was like, okay, I've got a small, a medium, and a large; maybe that's okay. But you guys would say, we give more granular scaling, I guess is the point there, right? I mean, George, I don't know if you can comment on that. It's just a different strategy. You've got a company that was founded, well, I guess, 2015, versus one that was founded in 1977.
So you would think the latter has way more function than the former; but George, anything you'd add to this conversation? >> Yeah, I mean, I'm always amazed that there are these database systems that are perceived as cloud native and they do things like sell you database sizes by t-shirt sizes, as you described. I mean, if you look at Snowflake, it's small, medium, large, extra large, 2X large, but they're all factors of two. You're getting a database size of two, four, eight, sixteen, 32, et cetera. Or if you look at AWS Redshift, you're buying your database by the node: you say how many nodes you want. And in both those cases, this supposedly cloud-native database is saying, we have some hardware underneath our database and we need you, Mr. Customer, to tell us how many servers you want. That's not the way the cloud should work, right? And I think this is one of the things that we did with Autonomous Data Warehouse. We said, no, that's not how it should work. We still run our database on hardware, we still have nodes and servers, but we just ask the customer: how many CPUs would you like for your data warehouse? You want 16? Sounds good. You want 18? Yeah, we can give you 18. We're not selling these to you in bundles of eight or bundles of six or powers of two. We'll sell you what you need. That's what cloud elasticity should be. Not this idea that, oh, we are a database that should be managed by IT, and IT already knows about servers and nodes, therefore it's okay if we tell people their cloud data warehouse runs on nodes. At Oracle, as Neil said, we don't think that way. The data warehouse should be used by the people who want to actually analyze their data; it should be used by the business users. >> Well, and so the other piece of cloud native that has become popular is this idea of separating compute from storage and being able to scale those two independently of each other, which is pretty important, right? Because you don't want to have to pay for a chunk of compute if you don't need the storage, and vice versa. Maybe you could talk about that; how you solve that problem, to the extent that you solve that problem. >> Absolutely, we do separate compute and storage with Autonomous Data Warehouse. When you come in and you say, I need 10 CPUs for my data warehouse and I need two terabytes of storage, those are two independent decisions that you make. They're not tied together in any way. And you are exactly right, Dave, this is how things should work in the cloud. You should pay for what you need, pay for what you use, not be constrained by having big sets of storage you have to take for a given amount of CPU, or vice versa. >> Okay, go ahead, Neil, please. >> Oh, just to add on to that: the other aspect that comes into play is that your starting point is X, whatever that happens to be, but over time that changes. And we all know that workloads vary throughout the day, throughout the month, throughout the year, with various events that occur: maybe the close of the year, the close of business at the end of the quarter, maybe, you know, the holiday season for retailers, and so forth. So it's not only the starting point, but how do you actually manage the growth, right? Scaling up and scaling down, right? In our case, as George said, we abstracted that completely for the customer: you basically check a box, which is auto-scaling.
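To ground the "you want 18, we can give you 18" and "check a box" points, here is a hedged sketch of provisioning with the OCI Python SDK: an exact CPU count, independently sized storage, and the auto-scaling flag. Field names follow the SDK's database models as I understand them; the compartment OCID and password are placeholders.

    # Hedged sketch: granular provisioning of an Autonomous Data Warehouse.
    # Verify model and field names against the current OCI SDK docs.
    import oci

    config = oci.config.from_file()
    db = oci.database.DatabaseClient(config)

    details = oci.database.models.CreateAutonomousDatabaseDetails(
        compartment_id="ocid1.compartment.oc1..example",
        db_name="salesdw",
        db_workload="DW",              # data warehouse workload type
        cpu_core_count=18,             # any whole number, no t-shirt sizes
        data_storage_size_in_tbs=2,    # storage is sized independently
        is_auto_scaling_enabled=True,  # the "check a box" auto-scaling
        admin_password="***",
    )
    db.create_autonomous_database(details)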
And we do so instantaneously, without any downtime whatsoever, right? Because, again, these systems have now become business critical, and if they're business critical, you can't just shut down to expand. Imagine, during the holiday season, your business is ramping up and all of a sudden you have to scale, right? And your system either shuts down and reboots itself, or it slows down to the point that it's at a crawl and all your customers get frustrated. We don't do that. You click a button, auto-scale, and we take care of it for you, smoothing out those lumps, right? Without any technical assistance. And again, if you look at Redshift, if you look at all these various systems, they require technical assistance to figure out not only your initial sizing, but how you scale out over time. >> Interesting, okay. So, all that said, a lot of companies are using Azure, AWS, Google for infrastructure; why would these customers not just use their databases? Why would they switch to Oracle or ADW? >> Well, I think Neil will probably add something, but I want to start by saying a huge number of our existing Autonomous Data Warehouse customers today are customers of AWS and Azure. They are pulling data from AWS and Azure and bringing it into an Oracle Autonomous Data Warehouse. And, as a focused product management team, we've built features for exactly that. So it's perfectly viable, and it's almost commonplace for the very largest enterprises to be doing that. But then, coming to the question of why they would want to do it; I don't know, Neil, you want to take that? >> Yeah, yeah. So one of the things that we've really seen emerge here is that a data warehouse doesn't generate transactions on its own, right? The data has to come from somewhere. And you ask yourself, well, where does the data come from? Well, in a lot of cases, that data is coming from applications, and increasingly SaaS applications, that the company has deployed. And those are, you know, HR applications, CRM applications, ERP applications, and many vertical applications. In Oracle's case, what we've done is say, okay, well, we have the application, this transactional thing, and we have the infrastructure of the autonomous data warehouse; why don't we just make it really, really easy? And if you're an Oracle applications customer that's already running on the Oracle cloud, we will essentially provide you the ability to create a data warehouse from that information, right? With a click, largely, either with a product and service or a quick-start kit. You don't start from scratch, you start from where you are. And in many cases, where you are has data; very much as George mentioned before: telcos, banks, insurance companies, governments. All of the data that they want to analyze, guess where a lot of it is coming from: it's coming from Oracle applications. So it makes sense to be able to have both the data that's generated and the data that's being analyzed close to the same place. Because at the end of the day, the payoff pitch for any form of analysis is not coming up with an insight, oh, I realized X, Y, Z, but rather putting the insight directly into production. And that's where, when you have this stuff spread all over God's green earth, trying to go from insight into action can take months, if not years.
The reason that a lot of customers are now turning to us is that they need to be much more agile, and they need to be able to turn that insight into action immediately, without it being a science project. >> Okay, thank you for that. So let's tick them off. Like, what are the top things that customers can get from Oracle Autonomous Data Warehouse that they couldn't get from, say, a Snowflake or Redshift or BigQuery or SQL Server or what have you? I appreciate you guys' willingness to talk about the competition. Let's tick them off: what are the most important things that we should know about that they can't get elsewhere? >> So first, I mean, we already talked about a couple of what we think are really the major themes of Autonomous Data Warehouse. The service is autonomous: you don't need to worry about managing it, anyone can manage the data warehouse. The service is elastic: you can buy and pay for what you use. You know, those are just what we think of as the general characteristics of Autonomous Data Warehouse. But when you come to your question of, hey, what do we give that other vendors don't provide; I think one angle where Autonomous Data Warehouse does a really good job, and Neil was just discussing this, is that it focuses on the business problems, right? We have years and years of experience with not just database security, but data security, right? Every cloud vendor can say, oh, we encrypt all your data, we have these compliance certifications, all of these things. And what they're saying is, we are securing your database, we are securing your database infrastructure. Oracle, of course, has to do those as well. But where we go further is we say, hey, no, no, no, we know what business users want. They want to secure their data. What kind of data am I storing? Do I have PII data? Could you detect whether there's PII data and tell me about it, in case some user loaded something that I wasn't aware of? What kind of privileges did I give my users? Can you make sure that those privileges are right? And can you tell me if users were given privileges that they're not using? Maybe I need to take those away. These are the problems that Oracle's tackled in security over the last 20 years. It's really more about the business problem. Yeah, some others... oh, go ahead. >> Oh, I'm sorry, I've got so many questions for you guys. We'll get back to that, 'cause it sounds like there's a long list. (laughs) >> We have nowhere to go. (laughs) >> I want to pick up with George on something you said about elasticity. Is it true pay-by-the-drink? Do you have consumption pricing? I mean, can I dial it up and dial it down whenever I want? How does that work? >> Yes. I mean, not to get into too many technical details, but you say, I want 14 CPUs, and that's what your database runs at. You can change that default number anytime you want, online, right? You can say, okay, I'm coming up on my quarter end, I'm going to raise my database to 20 CPUs. We just do it on the fly. We just adjust the size-- >> What about the other way? What about coming down? Can I go down to one? >> You can go down to one-- >> And you're not going to charge me for 14 if I go down to one? >> No, if you set it down to one, you get charged for one, right? >> Okay, that's good, that's good. >> In the background, we also allow levels of auto-scaling. You say, hey, I want to be charged for 14, and, Oracle, can you take care of all the scaling for me?
So if a bunch of people jump on at 5:00 PM to run some queries, 'cause the executive said, hey, I need a report by tomorrow morning, we'll take care of that for you. We'll let you go beyond 14 and only charge you for exactly what you use, for those extra CPUs beyond 14. >> Okay, thank you. Go ahead, Neil. >> And maybe, if I can add: you know, Andy talked about this when he was on the show with you last week, right? He talked about this concept of a converged database, but let me talk about it in the way that we see it, from a business point of view. Business users are looking to ask a variety of questions, right? And those questions need to be able to relate to both, you know, the customers themselves and the relationships that a customer might have with others. Today we talk about, say, the social network and who the influencers are within it, and then where they actually conduct business, which is, you know, increasingly on a mobile device. So in that case, you want to be able to ask questions which are not only, who should I focus on, but who are the key influencers within this community that could influence others? And does that happen in a particular place and time? Meaning, let's say pre-COVID, it might happen at a coffee shop or somewhere else. We can answer all of those questions and more inside of the autonomous system, without having to replicate the data out to one system that does graph, another system that does spatial, a third system that does something else. A business user is like, wait a minute, come on, you're trying to tell me that I need a separate system, and to replicate the data, just to be able to understand location? The answer in many cases is yes, you have to have separate systems, to which a business person says, well, that's absurd; can't I just do this all in one system? You can with Oracle. >> So look, I'm not trying to be the snarky journalist or analyst here, but I want to keep pushing on this issue. So here we are, it's 2021, it's April, we're like a third of the way through the year, and so far nobody has come out and said, okay, we're going to deliver Autonomous Data Warehouse just like Oracle. So I asked myself, well, why is Oracle doing this? You guys answered, you know, to reduce the labor cost. But I asked myself, is this how they're solving the problem of keeping relevant a database that spans five decades? And you guys said, no, no, this is cloud native, born in the cloud, started essentially with a new mindset. But is this a trend that others are going to follow? And if so, why haven't we seen this idea of self-driving databases elsewhere? Why is it right now unique to Oracle? What's really going on here? >> So I think there's a really interesting thing happening that's not visible outside of Oracle; it's very visible for those of us who work inside of the development organization. You know, if you look at Oracle, I can tell you that, I mean, I think it's safe to presume, Oracle has the largest database development organization on the planet, right? I mean, it's been kind of the largest, most used database for the past two decades. And what's happened is we've pivoted to building a cloud platform. We're not just building a database; we're taking all of these resources that we have, with all this expertise in building database software.
We were saying, we now have to build the platform to run and manage the database software in the cloud, right? And it's a little bit like, you know, to make people relate to it a little better, there was a really good quote from Elon Musk a couple of years ago, talking about Tesla. Everyone looks at the car, right? Tesla, the car is really great. The hard part of this is building the factory, and that analogy holds for Oracle: what we're building is the cloud factory. What we have transitioned to is our database development organization now building as robust a cloud as possible, so that, you know, when we increase the number of databases by 10X, we don't add 10X more cloud ops people to manage it. We are ramping up developers building features to automate the management of our cloud infrastructure. And with that automation, we get better availability, fewer errors, more security, and we give the benefits to our cloud data warehouse customers. And I think this is something really important to realize, right? We build database software. We build, you know, an engineered system built for databases, called Exadata. And we build a cloud platform. And these are really equal tiers in what we are building and developing today, in 2021, in the Oracle database development organization. >> Well, you mentioned Exadata; I want to shift gears here a little bit and talk about hybrid cloud. On-premises clouds are finally gaining some traction. I've got to give props: Oracle's Cloud at Customer was really early to that game. I think it was the first, in my view anyway, true same-same vision; it took you guys a little while to get there, but it was the right vision. And the thing I always say about Oracle that people don't understand is that Oracle invests in R&D; your chairman is also the CTO. You guys are serious about technical investment, so, you know, that's where innovation comes from. And we heard during your recent earnings call some positive comments on this. So what's your take on delivering autonomous data warehouse on-prem, and how do you compare with, say, Snowflake and AWS in that area? Snowflake, Frank Slootman, I've had him on record saying, we're not going to do that halfway house; forget it, we are always going to be in the cloud, we're never going to do an on-prem installation. AWS, we'll see; to date, I don't think you can get Redshift, for instance, on Outposts, but maybe that'll come. But how do you see that emerging? What's your difference there? Maybe Neil, you could talk about that. >> Yeah, so, you know, customers in a lot of regulated industries, right, still have concerns about the public cloud. And I think that when you hear statements like, we're never going to do on-prem; well, Autonomous Database on Cloud at Customer is not a classic on-prem solution. What it is, is a piece of our cloud delivered in your data center. It's still the cloud software, Oracle manages it, the system manages itself, and we take care of that responsibility so you don't have to. The difference is that we can make that available in the public cloud as well as in a private cloud, right? And there are so many use cases you can imagine, from a regulatory point of view or just from a comfort point of view, where customers are choosing; they want the ability to decide for themselves where to place this stuff, as compared to only having one option, right?
And, you know, you look at a lot of what's happening in the emerging world, where there are a lot of places that may not have really, really high-speed internet connections to make a public cloud feasible. Well, in that case, whether you're talking about an oil rig or something else, we can put that capability where it needs to be, close to the operation that you're talking about, irrespective of the deployment option. >> Well, let me just follow up on that, because I think it's interesting. You know, Frank Slootman said that to me; and oftentimes around AWS I say, never say never, 'cause they'll surprise you, right? I've learned that with Andy Jassy. But one of the things that seems difficult for on-prem would be to separate that compute from storage, because you have to actually physically move in resources. I think about Vertica's Eon Mode; it's not quite the same-same. So, I mean, in that regard, maybe you're not the same-same. And maybe that dogma makes sense for some companies. For Oracle, obviously, you've got a huge on-prem estate; thoughts on that? >> So, you know, clearly, what we'll typically do is provide additional hardware beyond what the customer might expect, and that allows them to use the capabilities of expansion, right? We also have the ability to allow the customer to expand from their Cloud at Customer into the public cloud as well, and we have a lot of those situations. So we can provide a level of elasticity even on-premises, by over-provisioning the systems while charging the customer only based on what they consume, right? Combined with the ability for us to augment their usage in the public cloud as well, right? Where others, again, are constrained, because they only have a single option. >> Right, well, you've got the capital resources to do that as well, which is not to be overlooked. Okay, I mean, I've blown our time here, but you guys are so awesome. (laughs) I appreciate the candor. So, last question; and George, if you want to throw in a couple of those other tick boxes, you know, the differentiators, please feel free. But for both of you, if you can leave customers with the one key point, or the top key points, on how Oracle Autonomous Data Warehouse can really help them improve their business in the near term, what would they be? Maybe George, you could start, and then Neil, you bring us home. >> Yeah, I mean, I think that, as I said before, our starting point with Autonomous Data Warehouse is, how can we build a better customer experience in the cloud? And this continues throughout 2021; I think the big theme here is that business users should be able to get value directly from their data warehouses. We talked a few times about how a line-of-business user should be able to manage their own data, should be able to load their own data warehouse, should be able to start to work with their own data, should be able to run machine learning and build machine learning models against that data, with all of that built in and delivered in Autonomous Data Warehouse. And we see, in our customer organizations large and small, the light bulbs starting to go on about how easy the service is to use and how complete it is for helping business users get value from their data.
And just adding on to what George said: the development organization has done a tremendous job of really simplifying the operation, and we've also tried to do that on the business side. You know, when a customer has an on-prem situation and they're looking at moving to the cloud, whether lift-and-shift or modernize, they're looking at cost, they're looking at risk, and they're looking at time. So one of the things we look at is, how do we mitigate that? How do we mitigate the cost, the risk, and the time? Well, this week, I think, we announced our new Cloud Lift program, where Oracle will provide its cloud engineering resources around the world, and we will take the cost, the risk, and the time out of the equation. Oracle will work directly with the customer, or the customer's partner of choice, maybe an Accenture or a Deloitte, and we will move them, right? At little or no cost; in most cases there's no cost whatsoever, right? We mitigate the risk because we're taking the risk on, and we've built a lot of automated tools to make that go very quickly, right? And securely. And then finally, we do it in a very, very short amount of time, as compared to what you would otherwise need to do, 'cause there is no Redshift on-premises, there is no Snowflake on-premises; you have to convert from what you already have to that, right? So beyond the technological barriers that George talked about, we're also trying to smooth the operation, so that a business can make a decision knowing that not only do they not need the technical people to operate it, they won't need an entire multimillion-dollar consulting contract in order to actually make the move to the cloud. >> Well, guys, I really appreciate you coming on the program, and again, your candor in speaking openly about your approach and the competitors. It's great having you; really, thank you for your time. >> Appreciate it. >> And thank you for watching, everybody. Look, if you guys want to come back, go toe-to-toe with these guys, say the word; you're always welcome to come on The Cube. One thing's for sure: Oracle is serious when it comes to database. Thank you for watching. This is Dave Vellante. We'll see you next time. (bright music)

Published Date : Apr 7 2021

Marc Staimer, Dragon Slayer Consulting & David Floyer, Wikibon | December 2020


 

>> Announcer: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE conversation. >> Hi everyone, this is Dave Vellante, and welcome to this CUBE conversation, where we're going to dig into the area of cloud databases. Gartner just published a series of research in this space, and it's really a growing market, rapidly growing, with a lot of new players, obviously the big three cloud players. And with me are two experts in the field, both long-time industry analysts. Marc Staimer is the founder, president, and key principal at Dragon Slayer Consulting, and he's joined by David Floyer, the CTO of Wikibon. Gentlemen, great to see you. Thanks for coming on theCUBE. >> Good to be here. >> Great to see you too, Dave. >> Marc, coming from the great Northwest, I think first time on theCUBE, so it's really great to have you. So let me set this up. As I said, Gartner published these three giant tomes. These are publicly available documents on the web. I know you guys have been through them; several hours of reading. (Dave chuckles) Good nighttime reading. The three documents identify critical capabilities for cloud database management systems, and the first one we're going to talk about is operational use cases. So we're talking about transaction-oriented workloads: ERP, financials. The second one was analytical use cases, sort of an emerging space; really the data warehouse space and the like. And, of course, the third is the famous Gartner Magic Quadrant, which we're going to talk about. So, Marc, let me start with you. You've dug into this research. Just at a high level, what did you take away from it? >> Generally, if you look at all the players in the space, they all have some basic good capabilities. What I mean by that is, ultimately, when you have a transactional or an analytical database in the cloud, the goal is not to have to manage the database. Now they have different levels of where that goes to, as to how much you have to manage or what you have to manage. But ultimately, they all handle the basic administrative, or the pedantic, tasks that DBAs have to do: the patching, the tuning, the upgrading. All of that is done by the service provider. So that's the number one thing they all aim at. From that point on, every database has different capabilities, and some will automate a whole bunch more than others and will have different primary focuses. So it comes down to what you're looking for or what you need. And ultimately what I've learned from end users is that what they think they need upfront is not what they end up needing as they implement. >> David, anything you'd add to that, based on your reading of the Gartner work? >> Yes. It's a thorough piece of work. It's taking on a huge number of different types of uses and sizes of companies. And I think those are two parameters which really change how companies would look at it. If you're a Fortune 500 or Fortune 2000 type company, you're going to need a broader range of features, and you will need to deal with size and complexity in a much greater sense, and probably higher levels of availability, reliability, and recoverability. Again, on the workload side, there are different types of workload.
As well as the two transactional and analytic workloads, I think there's an emerging type of workload which is going to be very important for future applications, where you want to combine transactional with analytic in real time, in order to automate business processes at a higher level, to make the business processes synchronous as opposed to asynchronous. And that degree of granularity, I think, is missed in a broader view of these companies and what they offer. It's, in my view, trying in some ways to not compare like with like from a customer point of view. >> So there's real nuance in what you talked about. Let's get into it; maybe that'll become clear to the audience. So like I said, these are very detailed research notes. There were several, I'll say, analyst cooks in the kitchen, including Henry Cook, whom I don't know, but four other contributing analysts, two of whom are CUBE alums, Don Feinberg and Merv Adrian, both really awesome researchers, and Rick Greenwald, along with Adam Ronthal. And these are public documents; you can go on the web and search for these. So I wonder if we could just look at some of the data. Guys, bring up slide one here. And so we'll first look at the operational side. They broke it into four use cases: the traditional transaction use cases, augmented transaction processing, stream/event processing, and operational intelligence. And so we're going to show you there's a lot of data here. What Gartner did is they essentially evaluated critical capabilities, or think of features and functions, and gave them a weighting and then a rating. It was a weighting-and-rating methodology. The rating was on a scale of one to five, and then they weighted the importance of the features based on their assessment and on talking to the many customers they talk to. So you can see here on the first chart, we're showing both the traditional transactions and the augmented transactions, and the first thing that jumps out at you is that, you know, Oracle with Autonomous is off the charts, far ahead of anybody else on this. And actually guys, if you just bring up slide number two, we'll take a look at the stream/event processing and operational intelligence use cases. And you can see, again, Oracle has a big lead. And I don't want to necessarily go through every vendor here, but guys, if you don't mind going back to the first slide, 'cause I think this is really the core of transaction processing. So let's look at this: you've got Oracle, you've got SAP HANA, right there, interestingly, Amazon Web Services with Aurora, IBM Db2, which goes back to the good old days, you know, down the list. But so, let me again start with Marc. So why is that? I mean, I guess this is no surprise; Oracle still owns the mission-critical end of the database space. They earned that years ago. They won that over the likes of Db2 and, you know, Informix and Sybase, and they emerged as number one there. But what do you make of this data, Marc? >> If you look at this data in a vacuum, you're looking at specific functionality; I think you need to look at all the slides in total. And the reason I bring that up is because I agree with what David said earlier, in that the use case that's becoming more prevalent is the integration of transaction and analytics.
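To make the weighting-and-rating mechanics concrete, here is a minimal Python sketch of how a weighted critical-capabilities score of this kind can be computed. The capability names, weights, and ratings below are invented for illustration; they are not Gartner's actual figures.

```python
# Minimal sketch of a weighted critical-capabilities score.
# Weights express the importance of each capability and sum to 1.0;
# ratings are on a 1-5 scale. All numbers here are made up.

weights = {
    "availability": 0.30,
    "performance": 0.25,
    "security": 0.25,
    "manageability": 0.20,
}

# Hypothetical ratings for one vendor on one use case.
ratings = {
    "availability": 4.6,
    "performance": 4.2,
    "security": 4.4,
    "manageability": 3.9,
}

# The use-case score is the weighted sum of the ratings.
score = sum(weights[c] * ratings[c] for c in weights)
print(f"Weighted use-case score: {score:.2f} / 5")  # 4.31 / 5
```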
And more importantly, it's not just your traditional data warehouse, but it's AI analytics, it's big data analytics. Users are finding that they need more than just simple reporting. They need more in-depth analytics so that they can get more actionable insights into their data, where they can react in real time. And so if you look at it just as a transaction, that's great. If you look at it just as a data warehouse, that's great, or analytics, that's fine, if you have a very narrow use case, yes. But I think today what we're looking at is not so narrow. It's sort of like if you bought a streaming device and it only streams Netflix, and then you need to get another streaming device 'cause you want to watch Amazon Prime. You're not going to do that; you want one that does all of it, and that's kind of what's missing from this data. So I agree that the data is good, but I don't think it's looking at it in a total encompassing manner. >> Well, so before we get off the horses on the track, 'cause I love to do that, (Dave chuckles) let's talk about that. So Marc, you're putting forth the... You guys seem to agree on the premise that the database that can do more than just one thing is of appeal to customers. I suppose that certainly makes sense from a cost standpoint. But, you know, guys, feel free to flip back and forth between slides one and two. You can see SAP HANA, and I'm not sure what cloud that's running on, it's probably running on a combination of clouds, but, you know, scoring very strongly. I thought, you know, Aurora, given AWS says it's one of the fastest growing services in history, and they've got it ahead of Db2 just on functionality, which is pretty impressive. I love Google Spanner; love what they're trying to accomplish there. You know, you go down to Microsoft: they're always a good-enough database, and that's how they succeed, et cetera, et cetera. But David, it sounds like you agree with Marc. I would think, though, Amazon kind of doesn't agree, 'cause they're like horses for courses. >> I agree. >> Yeah, yeah. >> So I wonder if you could comment on that. >> Well, I want to comment on two vectors. The first vector is the size of customer: you know, a mid-sized customer versus a Global 2000 or Global 500 customer. For the smaller customer, that's the heart of AWS, and they are taking their applications and putting pretty well everything into their cloud, the one cloud, and Aurora is a good choice. But when you start to get to the requirements, as you do in larger companies, of very high levels of availability, the functionality is not there. You're not comparing apples with apples; it's two very different things. So from a tier-one functionality point of view, IBM Db2 and Oracle have far greater capability for recovery and all the features that they've built in over there. >> Because of their... You mean 'cause of the maturity, right? Maturity and... >> Because of their focus on transaction and recovery, et cetera. >> So SAP, though, HANA, I mean, that's, you know... (David talks indistinctly) And then... >> Yeah, yeah. >> And then I wanted your comments on that, either of you or both of you. I mean, SAP, I think, has a stated goal of basically getting its customers off Oracle, and there's always this urinary Olympics >> Yes, yes. >> between the two companies, by 2024. Larry has said that ain't going to happen.
You know, Amazon, we know, still runs on Oracle. It's very hard to migrate mission-critical; David, you and I know this well, Marc, you as well. So, you know, people often say, well, everybody wants to get off Oracle, it's too expensive, blah, blah, blah. But we talk to a lot of Oracle customers, and they're very happy with the reliability, availability, recoverability feature set. I mean, the core of Oracle seems pretty stable. >> Yes. >> But I wonder if you guys could comment on that; maybe Marc, you go first. >> Sure. I've recently done some in-depth comparisons of Oracle and Aurora, and all their other RDS services, and Snowflake and Google and a variety of them. And ultimately what surprised me, since you made the statement that it costs too much, is that Oracle actually comes in at half of Aurora in most cases, and at less than half of Snowflake in most cases, which surprised me. But no matter how you configure it, ultimately it comes down to a couple of things: each vendor is focused on different aspects of what they do. Take Snowflake, for example. They're on the analytical side; they don't do any transaction processing. But... >> Yeah, so if I can... Sorry to interrupt. Guys, if you could bring up the next slide, that would be great. So that would be slide three, because now we get into the analytical piece, Marc, that you're talking about; that's what Snowflake's specialty is. So please carry on. >> Yeah, and what they're focused on is sharing data among customers. So if, for example, you're an automobile manufacturer and you've got a huge supply chain, you can share the data, without copying the data, with any of your suppliers that are on Snowflake. Now, can you do that with the other data warehouses? Yes, you can. But for Snowflake, that's where they're aiming it; that's their focal point. Whereas, let's say, the focal point for Oracle is going to be performance. And their performance affects cost, 'cause the higher the performance, the less you're paying on the consumption side of the payment scale, because you're paying per second for the CPUs that you're using. Same thing on Snowflake, but Oracle's performance is higher, therefore you use less. I mean, there's a whole bunch of things that come into this, but at the end of the day, what I've found is Oracle tends to be a lot less expensive than the prevailing wisdom. >> So let's talk value for a second, because you said something, that yeah, the other databases can do that, what Snowflake is doing there. But my understanding of what Snowflake is doing is they built this global data mesh across multiple clouds. So not only are they compatible with Google or AWS or Azure, but essentially you sign up for Snowflake, and then you can share data with anybody else in the Snowflake cloud. That, I think, is unique. And I know, >> Marc: Yes. >> Redshift, for instance, just announced, you know, Redshift data sharing, and I believe it's just within clusters within a customer, as opposed to across an ecosystem. And I think that's where the network effect is pretty compelling for Snowflake. So independent of cost, and you and I can debate about cost and the lack of transparency, because with AWS you don't know what the bill is going to be at the end of the month, and that's the same thing with Snowflake, but I find that... And by the way guys, you can flip through slides three and four, because we've got... Let me just take a quick break: you have data warehouse and logical data warehouse.
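As a rough illustration of the data-sharing model described here, the sketch below shows the provider side of a Snowflake share, driven from Python with the snowflake-connector-python library. The account, database, schema, and share names are hypothetical, and the exact grants you need depend on your own objects.

```python
# Hedged sketch of provider-side Snowflake data sharing.
# All identifiers (account, database, schema, share) are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="provider_org-provider_acct",  # hypothetical account identifier
    user="SHARE_ADMIN",
    password="...",
)
cur = conn.cursor()

# Create a share and grant read access to one schema's tables.
cur.execute("CREATE SHARE supply_chain_share")
cur.execute("GRANT USAGE ON DATABASE sales_db TO SHARE supply_chain_share")
cur.execute("GRANT USAGE ON SCHEMA sales_db.public TO SHARE supply_chain_share")
cur.execute(
    "GRANT SELECT ON ALL TABLES IN SCHEMA sales_db.public "
    "TO SHARE supply_chain_share"
)

# Expose the share to a consumer account. No data is copied; the
# consumer queries the provider's storage directly.
cur.execute(
    "ALTER SHARE supply_chain_share ADD ACCOUNTS = partner_org.partner_acct"
)
conn.close()
```

On the consumer side, a single statement along the lines of CREATE DATABASE shared_sales FROM SHARE provider_org.provider_acct.supply_chain_share surfaces the shared tables, again without any copying.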
And then on the next slide, four, you've got the data science, deep learning and operational intelligence use cases. And you can see, you know, Teradata. Teradata came up in the mid 1980s and dominated in that space. Oracle does very well there. You can see Snowflake pop up, SAP with its data warehouse, Amazon with Redshift. You know, Google with BigQuery gets a lot of high marks from people. Cloudera is in there, you know, so you see some of those names. But so, Marc and David, to me, that's a different strategy. They're not trying to be just a better data warehouse, an easier data warehouse. They're trying to create, Snowflake that is, an incremental opportunity, as opposed to necessarily going after, for example, Oracle. David, your thoughts. >> Yeah, I absolutely agree. I mean, ease of use is a primary benefit for Snowflake. It enables you to do stuff very easily. It enables you to take data without ETL, without any of the complexity. It enables you to share a number of resources across many different users, and be able to bring in what that particular user, or part of the company, wants. So in terms of where they're focusing, they've got a tremendous ease of use, a tremendous focus on what the customer wants. And you pointed out yourself the restrictions there are on doing that, both within Oracle and AWS. So yes, they have really focused very, very hard on that. Again, for the future, they are bringing in a lot of additional functions. They're bringing in, not Python, JSON into the database. They can extend the database itself. Whether they go the whole hog and put in transactions as well, that's probably something they may be thinking about, but not at the moment. >> Well, but they obviously have to have TAM expansion designs, because, Marc, I mean, you know, if they just get 100% of the data warehouse market, they're probably at a third of their stock market valuation. So they had better have a roadmap and plans to extend there. But I want to come back, Marc, to this notion of the right tool for the right job, you know, best of breed for a specific use case, the right horse for the right course, versus this kind of notion of all-in-one. I mean, they're two different ends of the spectrum. You're seeing, you know, Oracle, obviously very successful based on these ratings and based on their track record. And Amazon, I think I lost count of the number of data stores (Dave chuckles) with Redshift and Aurora and Dynamo, and, you know, on and on and on. (Marc talks indistinctly) So they clearly want to have that primitives approach, different APIs for each access pattern. Completely different philosophies; it's like Democrats or Republicans. Marc, your thoughts as to who ultimately wins in the marketplace. >> Well, it's hard to say who is ultimately going to win, but if I look at Amazon, Amazon is an à la carte type of system. If you need time series, you go with their time series database. If you need a data warehouse, you go with Redshift. If you need transactions, you go with one of the RDS databases. If you need JSON, you go with a different database. Everything is a different, unique database. Moving data between these databases is far from simple. If you need to do analytics on one database from another, you're going to use other services that cost money. So yeah, each one will do what they say it's going to do, but it's going to end up costing you a lot of money when you do any kind of integration.
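Since David mentions Snowflake bringing JSON into the database, here is a hedged sketch of what querying semi-structured data there looks like: a VARIANT column holding a JSON document, queried with Snowflake's colon-and-dot path notation. The table, fields, and credentials are invented for illustration.

```python
# Hedged sketch of semi-structured (JSON) querying in Snowflake.
# The table and fields are invented; credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(account="...", user="...", password="...")
cur = conn.cursor()

# A VARIANT column stores semi-structured data such as JSON.
cur.execute("CREATE OR REPLACE TABLE events (payload VARIANT)")
cur.execute("""
    INSERT INTO events
    SELECT PARSE_JSON('{"user": {"id": 42, "plan": "pro"}, "action": "login"}')
""")

# Colon notation walks the JSON tree; ::type casts the result.
cur.execute("""
    SELECT payload:user.id::int      AS user_id,
           payload:user.plan::string AS plan,
           payload:action::string    AS action
    FROM events
""")
print(cur.fetchall())  # [(42, 'pro', 'login')]
conn.close()
```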
And you're going to add complexity, and you're going to have errors. There's all sorts of issues there. So if you need more than one, it's probably not your best route to go; but if you need just one, it's fine. And on Snowflake, you raised the issue that they're going to have to add transactions; they're going to have to rewrite their database. They have no indexes whatsoever in Snowflake. I mean, part of the simplicity that David talked about is because they had to cut corners, which makes sense. If you're focused on the data warehouse, you cut out the indexes, great, you don't need them. But if you're going to do transactions, you kind of need them. So you're going to have to do some more work there. So... >> Well... So, you know, I don't know, I have a different take on that, guys. I'm not sure if Snowflake will add transactions. I think maybe their hope is that the market that they're creating is big enough. I mean, I have a different view of this, in that I think the data architecture is going to change over the next 10 years. As opposed to having a monolithic system where everything goes through that big data platform, the data warehouse and the data lake, I actually see what Snowflake is trying to do, and I'm sure others will join them, as putting data in the hands of product builders, data product builders or data service builders. I think they're betting that that market is incremental, and maybe they don't try to take on... I think it would maybe be a mistake to try to take on Oracle. Oracle is just too strong. I wonder, David, if you could comment. So it's interesting to see how strongly Gartner rated Oracle in cloud database, 'cause, I mean, okay, Oracle has got OCI, but when you think cloud, you think Amazon, Microsoft and Google. But if I have a transaction database running on Oracle, it's very risky to move that, right? And so we've seen that; it's interesting. Amazon's a big customer of Oracle, Salesforce is a big customer of Oracle. You know, Larry is very outspoken about those companies. SAP customers are many; most are using Oracle. It's not likely that they're going anywhere. My question to you, David, is first of all, why do they want to go to the cloud? And if they do go to the cloud, is it logical that the least risky approach is to stay with Oracle, if you're an Oracle customer, or Db2, if you're an IBM customer, and then move those other workloads that can move, whether it's more data warehouse-oriented work or incremental transaction work that could be done in Aurora? >> I think the first point, why should Oracle go to the cloud? Why has it gone to the cloud? And if there is a... >> Moreso... Moreso why would customers of Oracle... >> Why would customers want to... >> That's really the question. >> Well, Oracle have got Oracle Cloud@Customer, and that is a very powerful way of doing it, where exactly the same Oracle system is running on premises or in the cloud. You can have it where you want; you can have them joined together. That's unique. That's unique in the marketplace. So that gives them a very special place with large customers that have data in many different places. The second point is that moving data is very expensive. Marc was making that point earlier on. Moving data from one place to another place, between two different databases, is a very expensive architecture.
Having the data in one place, where you don't have to move it, where you can go directly to it, gives you enormous capabilities for a single database, a single database type. And I'm sure that from an analytic point of view, that's where Snowflake is going: to a large single database. But where Oracle is going is where you combine both the transactional and the analytic. And as you say, the cost of migration of databases is incredibly high, especially transaction databases, especially large, complex transaction databases. >> So... >> And it takes a long time, at least two years. And it took five years for Amazon to actually succeed in getting a lot of their stuff over, and in those five years they could have been doing an awful lot more with the people they used to bring it over. So it was a marketing decision, as opposed to a rational business decision. >> It's the holy grail of the vendors: they all want your data in their database. That's why Amazon puts so much effort into it. Oracle is obviously in a very strong position. It's got growth in its new stuff; the problem Oracle has, like many of the legacy vendors, is that the size of the install base is so large and it's shrinking. The legacy stuff is shrinking, the new stuff is growing very, very fast, but it's not large enough yet to offset that; you see that in all the earnings. So very positive news on the cloud database side, and they've just got to work through that transition. Let's bring up slide number five, because, Marc, this is to me the most interesting. So we've just shown all this detailed analysis from Gartner, and then you look at the Magic Quadrant for cloud databases. And despite Amazon being behind Oracle, or Teradata, or whomever in every one of these ratings, they're up to the right. Now, of course, Gartner will caveat this and say it doesn't necessarily mean you're the best, and of course everybody wants to be in the upper right. We all know that. But it doesn't necessarily mean that you should go buy that database; I agree with what Gartner is saying. But look at Amazon, Microsoft and Google: they're like one, two and three. And then, of course, you've got Oracle up there and then, you know, the others. So I found that very curious; it's like there was a dissonance between the hardcore ratings and the positions in the Magic Quadrant. Why do you think that is, Marc? >> It didn't surprise me in the least, because of the way that Gartner does its Magic Quadrants. How high up you go on the vertical is very much tied to the amount of revenue you get in the specific category for which they're doing the Magic Quadrant. It doesn't have to do with any of the revenue from anywhere else, just that specific type of market. So when I look at it, for Oracle a big chunk of the revenue still comes from on-prem, not from the cloud, and you're looking just at the cloud revenue. Now on the right side, moving to the right of the quadrant, that's based on functionality, capabilities, resilience, other things besides revenue. So visionary says, hey, how far along are you on the visionary side? Now, how they weight that again comes down to Gartner's experts and how they want to weight it and what makes more sense to them. But from my point of view, the right side is as important as the vertical side, 'cause the vertical side doesn't measure the growth rate either.
And if we look at these, some of these are growing much faster than the others. For example, Snowflake is growing incredibly fast, and that doesn't reflect in these numbers, from my perspective. >> Dave: I agree. >> Oracle is growing incredibly fast in the cloud. As David pointed out earlier, it's not just in their cloud where they're growing, but it's Cloud@Customer, which is basically an extension of their cloud. I don't know if that's included in these numbers or not on the revenue side. So there are a number of factors... >> Should it be, in your opinion, Marc? Would you include that in your definition of cloud? >> Yeah. >> The things that are hybrid and on-prem, would that count as cloud... >> Yes. >> Well, again, it depends on the hybrid. For example, if you have your own license, on your own hardware, but it connects to the cloud, no, I wouldn't include that. If you have a subscription license and subscription hardware that you don't own, but it's owned by the cloud provider, and it connects with the cloud as well, that I would. >> Interesting. Well, you know, to your point about growth, you're right. I mean, revenue is looking backwards, and for guys like Snowflake it will probably be double by the next one of these. It's also interesting to me, on the horizontal axis, to see Cloudera and Databricks further to the right than Snowflake, because that's kind of the data lake cloud. >> It is. >> And then, of course, you've got, you know, the others. I mean, database used to be boring, so... (David laughs) It's such a hot market space here. (Marc talks indistinctly) David, your final thoughts on all this stuff. What does the customer take away here? What should my cloud database management strategy be? >> Well, I was positive about Oracle; let's take some of the negatives of Oracle. First of all, they don't make it very easy to run on other platforms. They have put in terms and conditions which make it very difficult to run on AWS, for example; you get double counts on the licenses, et cetera. So they haven't played well... >> Those are negotiable, by the way. If you bring it up as the customer, you can negotiate that one. >> They can be, yes, if you're big enough. But Oracle certainly hasn't made it easy to work with other clouds. What they did very... >> How about Microsoft? >> Well, no, that is exactly what I was going to say. Oracle, with adjacent workloads, have been working very well with Microsoft, and you can then use Microsoft Azure with an Oracle database adjacent in the same data center, integrated very nicely indeed. And I think Oracle has got to do that with AWS, and it's got to do that with Google as well. It's got to provide a service for people to run things where they want to run them, not just on the Oracle cloud. If they did that, that would, in my opinion, be a very strong move and would make the capabilities available in many more places. >> Right. Awesome. Hey Marc, thanks so much for coming on theCUBE. Thank you, David, as well, and thanks to Gartner for doing all this great research and making it public on the web. If you just search 'critical capabilities for cloud database management systems for operational use cases', and that's a mouthful, then do the same for analytical use cases, and then the Magic Quadrant for cloud database management systems, which is the third doc.
You'll get about two hours of reading, and I learned a lot, and I learned a lot here too. I appreciate the context, guys. Thanks so much. >> My pleasure. >> All right, thank you for watching everybody. This is Dave Vellante for theCUBE. We'll see you next time. (upbeat music)

Published Date : Dec 18 2020


3 3 Administering Analytics v4 TRT 20m 23s


 

>> Yeah. >> All right. Welcome back to our third session, which is all about administering analytics at global scale. We're going to be discussing how you can implement security, data compliance and governance across the globe for large numbers of users, to ensure ThoughtSpot is open for everyone across your organization. So coming right up is Cheryl Zang, who is a senior director of product management at ThoughtSpot, and Kendrick, ThoughtSpot's director of Systems Engineering. So, Cheryl and Kendrick, the floor is yours. >> Thank you, Tina, for the introduction. So let's talk about analytics at scale, and let's understand what that is. It's really three components: it's the access to not only data but technology, and when we look at the intersection of that, that's the value that you get as an organization. When we think about analytics at scale, a lot of times we look at the cloud as the synonym for it, and that's an accurate statement, because people are moving towards the cloud for a variety of reasons. And if you think about what's been driving it, it has been applications like Salesforce, Forcados, MongoDB, among others. And it's actually part of where we're seeing our market go, where 64% of companies are planning to move their analytics to the cloud. And if you think of ThoughtSpot specifically, we see that the vast majority of our customers are already in the cloud with one of the big four cloud data warehouses, or they're evaluating one. What we've found, though, is that even though companies are moving their analytics to the cloud, we have not solved the problem of accessing the data. As a matter of fact, our customers are telling us that only 10 to 25% of the data they've moved into the data warehouse is being utilized. And if you look at it in general, Forrester says that 60 to 73% of the data that you have is not being leveraged. And if we think about why: you go through this process of taking enterprise data, moving it into these cubes and aggregates, and building these reports and dashboards, and there's typically this bottleneck of the BI team. At the end of the day, the people getting that data on the right-hand side are only anywhere from 20 to 30% of the population. When companies want to be data driven, 20 to 30% of the population isn't enough; really, what you're looking for now is something north of that. And if you think of the cloud data warehouse as the fix, you bring in the cloud data warehouse and it's still within the same framework. You know, why invest and truly not fix the problem? And if you take that bottleneck out, you can go directly against the warehouse, but you're still not solving the reports-and-dashboards problem. Why invest and truly not scale? It's the three pillars: it's technology, it's data, and it's accessibility. So if we look at analytics at scale, it truly is being able to get north of that 20 to 30%, to have that BI team become enablers for the organization, have them be able to work with the data in the cloud data warehouse, and allow sales, marketing, finance, supply chain and HR to get direct access to that, ask their own questions, and be able to leverage it. To be able to do that, you really have to look at your modern data architecture, figure out where you are in this maturity, and then you'll be able to build that out. So you look at this from left to right. It's sources,
it's ingestion and transformation, and it's the storage; that's the technology piece. It's the data, from a historical and predictive perspective. And then it's the accessibility. So it's technology, it's data, it's accessibility, and how do you build that? Well, if you look at it from a ThoughtSpot perspective, it truly is taking and leveraging the cloud data warehouse architectures with an integrated stack behind them, and then the accessibility is the search, answers, pinboards and embedded analytics. If you take that and extend it, where you want to augment it, it's adding our partners from an ETL or ELT perspective, like Alteryx, Talend, Matillion; streaming data from Databricks; or, if you want to leverage your cloud data warehouse as a data lake, then leveraging the machine learning capability of your cloud data warehouse, with the augmentation leveraging Databricks and DataRobot. And that's where your data side of that pillar gets stronger; the technologies are enabling it. And then the accessibility, on the output side, is ThoughtSpot. Now, if you look at ThoughtSpot: why and how do we make this technology accessible? What's the user experience? We allow an organization to go from a 20 to 30% population having access to data to what it means to be truly data driven for our users. That user experience is enabled by our ability to lead a person through the search process, with our search index and rankings. This is built for search on corporate data, on top of the cloud data warehouse, on top of the data that you need, to allow a person who doesn't understand analytics to get access to the data and the questions they need to answer. Our query engine makes it simple for customers to ask those questions, and what you might think are not complex business questions turn into complex queries on the back end, the kind that typically only a power user would need to know. Our query engine isolates that from the end user and allows them to ask that question and drive that query. And it's built on an architecture that allows us to change and adapt: it's a microservices architecture, and we've gone from an on-prem system to our cloud offering in a matter of really two to three years, which is amazing. The reason why we can do that and, in a sense, future-proof your investment is because of the way we've developed this: it's cloud-first, it's microservices, and it's able to drive. So, this architecture that we've talked about: we've seen different conversations at Beyond about ThoughtSpot Everywhere, which allows us to take ThoughtSpot One, our ability to search, to search data, to auto-analyze, to monitor, with governed security in the background, and leverage that not only internally but externally. And then being able to take ThoughtSpot's modeling language, for that analyst and that person who's really good at creating, and let them create these models that can be deployed anywhere very, very quickly, and then taking advantage of the cloud data warehouse, or the technology that you have, really gives you the accessibility, the technology that you need, as well as the data that you need. That's what you need to be able to administer, uh, to do analytics at scale. So what I'm going to do now is turn it over to Cheryl, and she's going to talk about administration in ThoughtSpot. Cheryl? >> Thank you very much, Kendrick. Today,
I'm going to show you how you can administer and manage ThoughtSpot for your organization, covering three main topics: user management, data management, and also user adoption and performance monitoring. Let's jump into the demo. Within the ThoughtSpot application, the Admin Console provides all the core functions needed for system-level administration. Let's start with user management and authentication. With the Users tab, you can add or delete a user, or you can modify the settings for an existing user, for example the user name, password, and email, or you can add the user to a different group. With the Groups tab, you can add or delete groups, or you can manage the group settings, for example the privileges associated with all the group members, such as 'can administer ThoughtSpot', 'can share data with all users', or 'can manage data'. This 'can manage data' privilege is very important: it grants a user the privileges to add data sources, add tables and worksheets, and manage data for different organizations or use cases without being an admin. There is also a field called Default Pinboards: you can select a set of pinboards that will be shown to all of the users in that group on their homepage. In terms of authentication, we currently support three different methods: local, Active Directory, and SAML. By default, local authentication is enabled, and you can also choose to have SAML integration with an external identity provider; currently, we support Okta, Ping Identity, SiteMinder, or ADFS. The third method is integration with Active Directory: you can configure integration with LDAP through Active Directory, allowing you to authenticate users against an LDAP server. Once the users and groups are added to the system, we can share pinboards with them, or they can search to ask and answer their own questions. To create searchable data, we first need to connect to our data warehouses with Embrace. You can directly query the data as it exists in the data warehouse, without having to move or transfer the data. In this page, you can add a connection to any of the six supported data warehouses. Today we will be focusing on the administrative aspect of data management, so I will close the tab here, and we will be using the connections that have already been set up. Under the Data Objects tab, we can see all of the tables from the connections. Sometimes there are a lot of tables, and it may be overwhelming for the administrator to manage the data. As a best practice, we recommend using stickers to organize your data sets. Here, we're going to select the Salesforce sticker. This will refine the list to tables coming from Salesforce only. This helps with data lineage and traceability, because worksheets are curated data based on those tables. Let's take a look at this worksheet. Here we can see the joins between tables that create the schema. Once the data analyst has created the tables and worksheet, the data is searchable by end users. Let's go to search. First, let's select the data source here. We can see all of the data that we have been granted access to see. Let's choose the Salesforce sticker, and we will see all of the tables and worksheets available to us as a data source. Let's choose this worksheet as a data source. Now we're ready to search. The search insight can be saved either into a pinboard or an answer. It's important to know that the sticker actually persists with pinboards and answers.
So when users log in, they will be able to see all of the content that's available to them. Let's go to the Admin Console and check out the User Adoption pinboard. The User Adoption pinboard contains essential information about your ThoughtSpot users and their adoption of the platform. Here you can see the daily active user, weekly active user, and monthly active user counts for the last 30 days. You can also see the total count of the pinboards and answers saved in the system, and the unique count of users. You can also find out the top 10 users in the last 30 days, the top 10 pinboard consumers, and the top 10 ad hoc searchers. Here you can see the trending of weekly active users, daily active users, and hourly active users over time. You can also get information about popular pinboards and user actions in the last month. Now let's zoom into this chart. With this chart, you can see weekly active users and how they're using ThoughtSpot. In this example, you can see that 60% of the time people are doing ad hoc search. If you would like to see what people are searching, you can do a simple drill-down on query text. Here we can find out that the most popular query text being used is 'number of opportunities'. At last, I would like to show you a system performance tracking pinboard that's available to the admins. This pinboard contains essential information about your ThoughtSpot instance performance. Use this pinboard to understand query latency, user traffic, how users are interacting with ThoughtSpot, the most frequently loaded tables, and so on. The last component to scaling to hundreds of users is a great onboarding experience. A new feature we call Search Assist helps automate onboarding while ensuring new users have the foundation they need to be successful on day one. When new users log in for the first time, they're presented with personalized sample searches that are specific to their data set. In this example, someone in a sales organization would see questions like 'what were sales by product type in 2020'. From there, a guided step-by-step process helps introduce new users to search, ensuring a successful onboarding experience. The Search Assist coach is a customized in-product walkthrough that uses your own data and your own business vocabulary to take your business users from unfamiliar to near-fluent in minutes. Instead of showing the entire end-user experience today, I will focus on the setup and administration side of Search Assist. Search Assist is easy to set up at the worksheet level, with flexible options for multiple guided lessons. Using pre-built templates, we help you create multiple learning paths based on department or on your business users' needs. To set up a learning path, you simply fill in the template with relevant search examples while previewing what the end user will see, and then increase the complexity with each additional question to help your users progress. In summary, it is easy to administer user management, data management, and user adoption at scale using the ThoughtSpot Admin Console. Back to you, Kendrick. >> Thank you, Cheryl. That was great; I appreciate the demo there. It's awesome: it's real-life data, real-life software. You know, in closing here, I want to talk a little bit about what we've seen out in the marketplace when we're talking with prospects and customers, and what they sometimes tell us.
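For readers who want to reproduce adoption metrics like the ones Cheryl demonstrated above outside of ThoughtSpot, here is a minimal pandas sketch computing daily, weekly, and monthly active users from a generic usage-event log. The column names and events are assumptions for illustration, not ThoughtSpot's internal schema.

```python
# Minimal sketch of DAU/WAU/MAU from a generic usage-event log.
# 'user_id' and 'timestamp' are assumed columns, not ThoughtSpot's schema.
import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 2, 1, 3, 2, 1],
    "timestamp": pd.to_datetime([
        "2020-12-01 09:00", "2020-12-01 10:30", "2020-12-02 11:00",
        "2020-12-03 09:15", "2020-12-08 14:00", "2020-12-20 16:45",
    ]),
})

# Daily active users: unique users per calendar day.
dau = events.groupby(events["timestamp"].dt.date)["user_id"].nunique()

# Weekly active users: unique users per ISO week.
wau = events.groupby(events["timestamp"].dt.isocalendar().week)["user_id"].nunique()

# Monthly active users: unique users per calendar month.
mau = events.groupby(events["timestamp"].dt.to_period("M"))["user_id"].nunique()

print(dau.head())  # e.g. 2020-12-01 -> 2 unique users
print(wau.head())
print(mau.head())
```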
Well, I'm not quite there yet: my data is not ready, or I don't have a cloud data warehouse. That's the process and the thinking. And we have examples, three different examples. We have a company that had never even thought about analytics at scale. We came in, we talked to them, and in less than a week they were able to move their data into ThoughtSpot and ask questions of a billion rows. We've also had customers that are early in adoption; they're sticking their toes in the water around the technology. They have a cloud data warehouse and they put some data in it, and within 11 minutes we were able to search on a billion rows of their data. Now they're adding more data to combine, to be able to work with. And then we have customers that are more mature in their process: they put in large volumes of data, and within nine minutes we were asking questions of their data, and their business users were understanding what's going on. A second question we get sometimes is: my data is not clean. Well, ThoughtSpot is very, very good at finding that type of data. It becomes an iterative process, and we can help with that. Again, within a week we can take data, get it into your system, start asking business questions of it, and be ready to go. You know, I'm going to turn it back to you, and thank you for your time. >> Kendrick and Cheryl, thank you for joining us today and bringing all of that amazing insight for our audience at home. Let's do a couple of stretches, and then join us in a few minutes for our last session of the track, Insights for All, about how Canadian Tire is delivering business outcomes with search and AI. See you there.

Published Date : Dec 10 2020


Unleash the Power of Your Cloud Data | Beyond.2020 Digital


 

>> Yeah, yeah. Welcome back to the third session in our Building a Vibrant Data Ecosystem track. This session is Unleash the Power of Your Cloud Data Warehouse. So what comes after you've moved your data to the cloud? In this session we'll explore why enterprise analytics is finally ready for the cloud, and we'll discuss how you can consume enterprise analytics in the very same way you would cloud services. We'll also explore where analytics meets cloud, and see firsthand how ThoughtSpot is open for everyone. Let's get going. I'm happy to say we'll be hearing from two folks from ThoughtSpot today: Michael, VP of strategic partnerships, and Vika Valentina, senior product marketing manager. And I'm very excited to welcome, from our partner at AWS, Gal Bar Mia, product engineering manager with Redshift. We'll also be sharing a live demo of ThoughtSpot for B2C marketing analytics, directly on Redshift data. Gal, please kick us off. >> Thank you. And thanks to the ThoughtSpot team and everyone attending today for joining us. When we talk about data-driven organizations, we hear that 85% of businesses want to be data driven; however, only 37% have been successful. We ask ourselves, why is that? And believe it or not, a lot of customers tell us that they struggle with even defining what being data driven means, and in particular with aligning that definition between the business and the technology stakeholders. Let's look at our own definition. A data-driven organization is an organization that harnesses data as an asset to drive sustained innovation and create actionable insights that supercharge the experience of their customers, so they demand more. Let's focus on a few things here. One is data as an asset: data is very much like a product; it needs to evolve. Sustained innovation: it's not just innovation, it's sustained; we need to continuously innovate when it comes to data. Actionable insights: it's not just interesting insights; these are actionable insights that the business can take and act upon. And obviously the actual experience: whether the customers are internal or external, we want them to request more insights and, as such, drive more innovation. And we call this the data flywheel. We use the flywheel metaphor here: we create that data set, our first product, focused on a specific use case. We build an initial MVP around that, we provide it to our customers, internal or external, they provide feedback, they request more features, they want more insights. That enables us to learn, bring in more data, and enrich that data, and again we create more insights. And as the flywheel spins faster, we improve on operational efficiencies, supporting greater data richness, and we reduce the cost of experimentation. Legacy environments were never built for this kind of agility. In many cases, customers have struggled to keep momentum in their flywheel, in particular around operational efficiency and experimentation. This is where Redshift fits in and helps customers make the transition to a true data-driven organization. Redshift is the most widely used data warehouse, with tens of thousands of customers. It allows you to analyze all your data. It is the only cloud data warehouse that allows you to analyze data that sits in your data lake on Amazon S3, with no loading, duplication, or ETL required. It also allows you to scale with the business with its hybrid architecture, and it accelerates performance: the shared storage provides the ability to scale to unlimited concurrency, while the on-instance storage provides low-latency access to data. It also addresses the three key asks that customers consistently tell us matter the most when it comes to cost.
It is also allows you to scale with the business with its hybrid architectures it also accelerates performance. It's a shared storage that provides the ability to scale toe unlimited concurrency. While the UN instant storage provides low late and say access to data it also provides three. Key asks that customers consistently tell us that matter the most when it comes to cost. One is usage based pricing Instead of license based pricing. Great value as you scale your data warehouse using, for example, reserved instances they can save up to 75% compared to on the mind demand prices. And as your data grows, infrequently accessed data can be stored. Cost effectively in S three encouraged through Amazon spectrum, and the third aspect is predictable. Month to month spend with no hitting charges and surprises. Unlike and unlike other cloud data warehouses, where you need premium versions for additional enterprise capabilities. Wretched spicing include building security compression and data transfer. >>Great Thanks. Scout um, eso. As you can see, everybody wins with the cloud data warehouses. Um, there's this evolution of movement of users and data and organizations to get value with these cloud data warehouses. And the key is the data has to be accessible by the users, and this data and the ability to make business decisions on the data. It ranges from users on the front line all the way up to the boardroom. So while we've seen this evolution to the Cloud Data Warehouse, as you can see from the statistic from Forrester, we're still struggling with how much of that data actually gets used for analytics. And so what is holding us back? One of the main reasons is old technology really trying to work with today's modern cloud data warehouses? They weren't built for it. So you run into issues of trying to do data replication, getting the data out of the cloud data warehouse. You can do analysis and then maintaining these middle layers of data so that you can access it quickly and get the answers you need. Another issue that's holding us back is this idea that you have to have your data in perfect shape with the perfect pipeline based on the exact dashboard unique. Um, this isn't true. Now, with Cloud data warehouse and the speed of important business data getting into those cloud data warehouses, you need a solution that allows you to access it right away without having everything to be perfect from the start, and I think this is a great opportunity for GAL and I have a little further discussion on what we're seeing in the marketplace. Um, one of the primary ones is like, What are the limiting factors, your Siegel of legacy technologies in the market when it comes to this cloud transformation we're talking about >>here? It's a great question, Michael and the variety of aspect when it comes to legacy, the other warehouses that are slowing down innovation for companies and businesses. I'll focus on 21 is performance right? We want faster insights. Companies want the ability to analyze MAWR data faster. And when it comes to on prem or legacy data warehouses, that's hard to achieve because the second aspect comes into display, which is the lack of flexibility, right. If you want to increase your capacity of your warehouse, you need to ensure request someone needs to go and bring an actual machine and install it and expand your data warehouse. 
When it comes to the cloud, it's literally a click of a button, which allows you to increase the capacity of your data warehouse and enable your internal and external users to perform analytics at scale and much faster. >> It falls right into the explanation you provided there, right? As the speed of the data warehouses gets faster and faster as they scale, older solutions aren't built to leverage that. They're having to make technical cuts there: either looking at smaller amounts of data so that they can get to the data quicker, or it's taking longer to get to the data once the data warehouse is ready, when it could just be a live query to get the answers you need. And that's definitely an issue that we're seeing in the marketplace. I think the other one to look at is things like governance, lineage, regulatory requirements. How is the cloud making it easier? >> That's again an area where I think the cloud shines, because AWS's scale allows significantly more investment in security policies and compliance. So, for example, Amazon Redshift comes by default with SOC 1, 2 and 3, PCI, ISO, FedRAMP, and HIPAA compliance, all of them out of the box. And at our scale, we have the capacity to implement those by default for all of our customers and allow them to focus their very expensive, valuable IT resources on actual applications that differentiate their business and transform the customer experience. >> That's a great point, Gal. So we've talked about the limiting factors, technology-wise; we've mentioned things like governance. But what about the cultural aspect, right? What do you see teams struggling with in meeting their cloud data warehouse strategy today? >> And that's true: one of the biggest challenges for large organizations when they move to the cloud is not about the technology. It's about people, process and culture, and we see differences between organizations that talk about moving to the cloud and ones that actually do it. First of all, you want to have senior leadership drive, and be aligned and committed to making the move to the cloud. But it's not just that. We see organizations sometimes get paralyzed if they can't figure out how to move each and every last workload. There's no need to boil the ocean, so we often work with organizations to find that iterative motion, that iterative process of identifying the use cases and workloads and migrating them one at a time, and through that allow the organization to grow its knowledge from a cloud perspective, as well as adopt its tooling and learn about the new capabilities. >> And from an analytics perspective, we see the same, right? You don't need a pixel-perfect dashboard every single time to get value from your data. You don't need to wait until the data warehouse is perfect, or the pipeline to the data warehouse is perfect. With today's technology, you should be able to look at the data in your cloud data warehouse immediately and get value from it. And that's the change that we're pushing and starting to see today. Thanks, Gal, that was really interesting. You know, as we look through that, this transformation we're seeing in analytics isn't really that old.
20 years ago, data warehouses were primarily on-prem, and the applications, the BI tools used for analytics around them, were on-premise as well. Then you saw applications like Salesforce that live in the cloud, and you started having to pull data from the cloud to on-prem in order to do analytics with it. Then we saw the shift about 10 years ago with the explosion of the cloud data warehouse. Because of their scale, cost reduction, and speed, cloud data warehouses like Amazon Redshift really took hold of the marketplace and are the predominant way of storing data moving forward. What we haven't seen is the BI tools catch up. When you have this new cloud data warehouse technology, you really need tools that were custom-built for it to take advantage of it: tools that can query the cloud data warehouse directly and get results very quickly, without having to worry about creating a middle layer of data or pipelines in order to manage it. And one company captures that really well. Chick-fil-A, which I'm sure everybody has heard of, is one of the largest food chains in America. They made a huge investment in Redshift, and one of the purposes of that investment was to get access to the data more quickly; they really wanted to give their business users the ability to do some ad hoc analysis on the data they were capturing. What they found with their older tools was that all the data, when they were trying to do this analysis, was staying at the analyst level. Somebody needed to create a dashboard in order to share that data with a user, and if the user's requirements changed, the analysts started to become burdened with requests for changes and the time it took to reflect those changes. So they moved to ThoughtSpot with Embrace to connect to Redshift, so they could start giving business users the capability to query the database right away. And with this, they were able to find very common things in supply chain analysis, like the ability to figure out which store should get which product that was selling better. The other part was they didn't have to wait for the data to get settled into some sort of repository or second-level database; they were able to query it quickly. And then, with that, they were able to make changes right in the Redshift database that were then reflected to customers and business users right away. So what they found is that by adopting ThoughtSpot, they were actually able to arm business users with the ability to make decisions very quickly, and they cleared up the backlog they were having and the delay with their analysts. They're also putting their analysts to work on different projects where they can get better value. So when you look at the way we work with a cloud data warehouse, you have to think of ThoughtSpot with Embrace as the tool that accesses that layer: the perfect analytics partner for the cloud data warehouse. We will do the live query for the business user; you don't need to know how to script in SQL to access Redshift. You can type the question that you want the answer to, and ThoughtSpot will take care of that query. We will do the indexing so that the results come back faster for you, and we will also do the analysis. This is one of the things I wanted to cover, which is our SpotIQ.
What's new is our ability to use this with Embrace and our partners at Redshift: now we can give you the ability to do auto-analysis, to look at things like leading indicators, trends, and anomalies. To put this in perspective, imagine somebody was doing forecasting for Q3 in the western region, and they looked at how their stores were doing and saw that one store was performing well. SpotIQ might be able to look at that analysis, see if there's a leading product that is underperforming based on perhaps the last few quarters of data, and bring that up to the business user for analysis right away. They don't have to figure that out and slice and dice to find that issue on their own. And then finally, all the work you do on data management and governance in your cloud data warehouse gets reflected in the results in Embrace right away. So I've done a lot of talking about Embrace, and I could do more, but I think it would be far better to have Vika actually show you how the product works. Vika? >> Thanks, Michael. We learned a lot today about the power of leveraging your Redshift data in ThoughtSpot, but now let me show you how it works. The coronavirus pandemic has presented extraordinary challenges for many businesses, and some industries have fared better than others. One industry that seems to have weathered the storm pretty well is streaming media, with companies like Netflix and Hulu. In this demo, we're going to be looking at data from the B2C marketing efforts of a streaming media company in 2020. Lately, we've been running campaigns for comedy, drama, kids and family, and reality content. Each of our campaigns lasts four weeks, and they're staggered on a weekly basis. Therefore, we always have four campaigns running, and we can focus on one campaign launch per week. And today we'll be digging into how our campaigns are performing. We'll be looking at things like impressions, conversions, and users' demographic data. So let's go ahead and look at that data. We'll see what we can learn from what's happened this year so far, and how we can apply those learnings to future decision-making. As you can already see on the ThoughtSpot homepage, I've created a few pinboards that I use for reporting purposes. The homepage also includes what others on my team and I have been looking at most recently. Now, before we dive into a search, we'll first take a look at how to make a direct connection to the customer database in Redshift. To save time, I've already pre-built the connection to Redshift, but I'll show you how easy it is to make that connection in just three steps. First, we give the connection a name and select our connection type, Amazon Redshift. Then we enter our Redshift credentials, and finally, we select the tables that we want to use. Great, now we're ready to start searching. So let's dig into this data to get a better idea of how our marketing efforts have been affected, either positively or negatively, by this really challenging situation. When we think of ad-based online marketing campaigns, we think of impressions, clicks, and conversions, so let's look at those on a daily basis for our purposes. All this data is available to us in ThoughtSpot, and we can easily use search to create a nice line chart like this that shows us trends over the last few months. Based on experience, we understand that we're going to have more impressions than clicks, and more clicks than conversions.
If we study the chart for a minute, we can see that while impressions appear to be pretty steady over the course of the year, clicks and especially conversions both get a nice boost in mid-to-late March, right around the time that pandemic-related policies were being implemented. So right off the bat we've found something interesting, and we can come back to this. Now, there are a few metrics that we're going to focus on as we analyze our marketing data. Our overall goal is obviously to drive conversions, meaning that we bring new users into our streaming service, and in order to get a visitor to sign up in the first place, we need them to get to our sign-up page. A compelling campaign is going to generate clicks: if someone is interested in our ad, they're more likely to click on it. So we'll search for click-through rate above 5%, and we'll look this up by campaign name. Now we can compare all the campaigns that we've launched this year to see which have been most effective at bringing visitors to our site. I mentioned earlier that we have four different types of campaign content, each one aligned with one of our most popular genres. So by adding campaign content, and noting that I just want to see the top 10, I can limit my search to just these top 10 campaigns, automatically sorted by click-through rate and assigned a color for each category. We can see right away that comedy and drama each have three of the top 10 campaigns by click-through rate, reality has two, including the top spot, and kids and family makes one appearance as well. With ThoughtSpot, we know that any non-technical user can ask a question and get an answer. They can explore the answer and ask another question, and when you get an answer that you want to share or keep an eye on moving forward, you pin the answer to a pinboard. So the B2C marketing campaign statistics pinboard gives us a solid overview of our campaign-related activities and metrics throughout 2020. The visuals here keep us up to date on click-through rate and cost per click, but also on other really important metrics like conversions and cost per acquisition. Now, it's important to our business that we evaluate the effectiveness of our spending, so let's do another search. We're going to look at how many new customers we're getting, meaning conversions, and the cost per acquisition that we're spending to get each of them, by campaign content category. So this is a really telling chart: we can basically see how much each new user is costing us, based on the content they see prior to signing up for the service. Drama and reality users are actually relatively expensive compared to those who joined based on the comedy and kids and family content that they saw, and of all the genres, kids and family is actually giving us the best bang for our marketing buck. And that's good news, because the genres providing the best value are also providing the most customers. We mentioned earlier that we actually saw a sizable uptick in conversions as stay-at-home policies were implemented across much of the country, so we're going to remove cost per acquisition and take a daily look at how our campaign content has trended over the year so far. By doing this, we can see a daily comparison of the different genres. Some campaigns have been more successful than others, obviously; for example, kids and family content has always fared pretty well, as has comedy. But as we move into the stay-at-home area of the line chart, we really see these two genres begin to separate from the rest.
And even here in June, as some states started to reopen, we're seeing that they're still trending up, and we're also seeing reality start to catch up around that time. And while the first pinboard that we looked at included all sorts of campaign metrics, this is another pinboard that we've created solely to focus on conversions. So not only can we see which campaigns drove significant conversions, we can also dig into the demographics of new users, like which campaigns and what content brought in users from different parts of the country or from different age groups. And all this is just a quick search away, with ThoughtSpot searching directly on Redshift data. All right, thank you, and back to you, Michael. >> Great, thanks, Vika. That was excellent. So, as you can see, you can very quickly go from zero to search with ThoughtSpot connected to any cloud data warehouse. And I think it's important to understand, as we mentioned before, that not everything has to be perfect in your cloud data warehouse. You can use ThoughtSpot as your initial tool for investigatory purposes, as you can see here with Sargento, IMAX, and Anthem, and in a lot of these cases we were looking at billions of rows of data within minutes. And as your data warehouse maturity grows, you can start to add more and more ThoughtSpot users to leverage the data and get better analysis from it. So we hope that you've enjoyed what you've seen today, and that you take the step to do one of two things. We have a free trial of ThoughtSpot Cloud: if you go to the website that you see below and register, we can get you access to ThoughtSpot so you can start searching today. The other option, by contacting our team, is to do a zero-to-search workshop, where in 90 minutes we'll work with you to connect your data source and start to build some insights around exactly what you're trying to find for your business. Thanks, everybody. I would especially like to thank Gal from AWS for joining us today. We appreciate your participation, and I hope everybody enjoyed what they saw. I think we have a few questions now. >> Thank you, Vika, Gal, and Michael. It's always exciting to see a live demo, and I know that I'm one of those comedy conversion numbers. We have just a few minutes left, but I would love to ask a couple of last questions before we go. Michael, we'll give you the first question: do I need to have all of my data cleaned and ready in my cloud data warehouse before I begin with ThoughtSpot? >> That's a great question, Mallory. No, you don't. You can really start using ThoughtSpot for search right away and start getting analysis and understanding the data through the automatic search analysis and the way that we query the data, and we've seen customers do that. The Chick-fil-A example that we talked about earlier is one where they were able to use ThoughtSpot to notice an anomaly in the cloud data warehouse linkage between product and store. They were able to fix that very quickly, and then that gets reflected across all of the users, because our product queries the cloud data warehouse directly. So you can get started right away without it having to be perfect. >> That's awesome. And Gal, we'll leave a fun one for you: what can we look forward to from Amazon Redshift next year? >> That's a great question, and the team has been innovating extremely fast. We released more than 200 features in the last year and a half, and we continue innovating.
One thing that stands out is AQUA, which is an innovative new technology. In fact, AQUA stands for Advanced Query Accelerator, and it allows customers to achieve performance that is up to 10 times faster than what they've seen before; really outstanding. The way we've achieved that is through a paradigm shift in the actual technological implementation. AQUA is a new distributed and hardware-accelerated processing layer, which effectively allows us to push down analytics operations like compression, encryption, filtering, and aggregations to the storage layer, and allow the AQUA nodes, which are built with custom AWS-designed analytics processors, to perform these operations faster than traditional CPUs. We no longer need to scan the data and bring it all the way to the computational nodes; we're able to apply these predicates, the filtering, encryption, compression, and aggregations, at the storage level. And AQUA is going to be available for every RA3 customer out of the box, with no code changes required. So I apologize for geeking out a little bit, but this is really exciting. >> No, that's why we invited you, Gal. Thank you, and thank you also to Michael and Vika. That was excellent; we really appreciate it. For all of you tuning in at home, the final session of this track is coming up shortly. You aren't going to want to miss it. We're going to end strong: come back and hear directly from our customer, T-Mobile, on how T-Mobile is building a data-driven organization with ThoughtSpot. It's up next, see you then.
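
To make the demo's ad hoc question concrete, here is a minimal sketch of computing click-through rate by campaign directly against Redshift from Python, using psycopg2 (which works because Redshift speaks the PostgreSQL wire protocol). The cluster endpoint, credentials, and the campaign_daily_stats table and its columns are hypothetical stand-ins rather than anything shown in the webinar; a search tool like the one demonstrated generates equivalent SQL behind the scenes.

    # Minimal sketch: click-through rate by campaign, straight from Redshift.
    # Endpoint, credentials, and table/column names are hypothetical.
    import psycopg2  # Redshift speaks the PostgreSQL wire protocol

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,  # Redshift's default port
        dbname="marketing",
        user="analyst",
        password="...",
    )

    QUERY = """
        SELECT campaign_name,
               SUM(clicks)::float / SUM(impressions) AS click_through_rate
        FROM campaign_daily_stats
        GROUP BY campaign_name
        HAVING SUM(impressions) > 0
        ORDER BY click_through_rate DESC
        LIMIT 10;
    """

    with conn, conn.cursor() as cur:
        cur.execute(QUERY)
        for name, ctr in cur.fetchall():
            print(f"{name}: CTR = {ctr:.2%}")

    conn.close()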

Published Date : Dec 10 2020


Ajeet Singh, ThoughtSpot | CUBE Conversation, November 2020


 

>> Narrator: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is theCUBE conversation. >> Everyone, welcome to this special CUBE conversation. I'm John Furrier, host of theCUBE, here in our Palo Alto studios. During this time of the pandemic, we're doing a lot of remote interviews, supporting a lot of events. theCUBE virtual is our new brand, because there are no events to go to, but we certainly want to talk to the best people and get the most important stories. And today I have a great segment with a world-class entrepreneur, Ajeet Singh, co-founder and executive chairman of ThoughtSpot. They've got an event coming up on December 9th and 10th, but this interview is really about what it takes to be a world-class leader, what it takes to see the future and be a visionary, and then execute on an opportunity. Because in the time that we're in right now, there's a lot of change: data, technology, a sea change is happening, and it's upon us, and leadership around technology and how to capture opportunities is really what we need right now. And so, Ajeet, I want to thank you for coming on to theCUBE conversation. >> Thanks for having me, John. Pleasure to be here. >> For the folks watching: the startup that you've been doing for many, many years now, ThoughtSpot, you're the co-founder and executive chairman, but you were also involved in Nutanix as a co-founder of that company as well. You know a little about unicorns and creating value and doing things early, but you're a visionary and you're a technologist and a leader. I want to go in and explore that, because now more than ever, the role of data, the role of the truth, is super important, and as the co-founder, your company is well positioned to do that. Your tagline today on the website says "insight at the speed of thought," but going back to the beginning, that probably wasn't the tagline. It was probably more like: we've got to leverage data. Take us through the vision initially, when you founded the company in 2012. What was the thinking? What was on your mind? Take us through the journey. >> Yeah. So as an entrepreneur, I think visionary is a very big term. I don't know if I qualify for that or not, but what I'm really passionate about is identifying very large markets with very, very big problems, and then going to the whiteboard and, from scratch, building a solution that is perfectly designed for the big problem that the market might be facing. Just an absolutely honest way of approaching the problem and finding the best possible solution. So when we were starting ThoughtSpot, the market that we identified was analytics, analytics software, and the big problem that we saw was that while, on one hand, companies were building very big data lakes and data warehouses, and there was a lot of money being spent on capturing and storing data, the way that data was consumed by the end users, the non-technical people, the sales, marketing, HR people, the doctors, the nurses, was not changing. That process was still stuck in old times, where you have to ask an analyst to go and build a dashboard for you. And at the same time, we saw that in the consumer space, when anyone had a question and wanted to learn about something, they would just go to Google and ask that question. So we said, why can't analytics be as easy as Google?
If I have a question, why do I have to wait three weeks for some data expert to bring insights to me for the most simple questions? If I'm doing some very deep analysis, trying to come up with fraud algorithms, it's understood that you need data experts. But if I'm just trying to understand how my business is doing, how my customers are doing, I shouldn't have to wait. So that's how we identified the market and the problem, and then we built a solution that is designed for that non-technical user, with a very design-thinking, UX-first approach, to make it super easy for anyone to ask a question. So that was the genesis of the company. >> You know, I just love the thinking, because you're solving a problem with a clean sheet of paper. You're looking at what can be done. And you bring up Google because, you know, Google's motto was: find what you're looking for. And they had little gimmicky buttons, like "I'm feeling lucky," which just took you to a random webpage. At that time, while everyone else was trying to build these walled gardens and this structural apparatus, Google wanted you in and out with your results fast. And that mindset just never came over to the enterprise, with all that legacy structure and all the baggage associated with it. So I totally love the vision, but I've got to ask you: how did you get to a beachhead? How did you get that first success milestone? When did you see results from your thinking? >> Yeah. So I believe that once you've identified a big market and a big problem, it comes down to the people. So I sort of went on a recruiting mission, and I recruited perhaps the best technology and business team that you can find in any enterprise segment, not just analytics. Some of the early engineers, and my co-founder Amit Prakash, came from Google, and before that he was at Microsoft working on Bing. So it took a lot of very deliberate effort to find the right kind of people who have a builder's mentality and are also deep experts in areas like search and large-scale distributed systems, and who are very passionate about user experience. And then you start building the product. It took us, I would say, one and a half to three years to get the initial working version of the product, and we were lucky enough to engage with some of the largest companies in the world, such as Walmart, who were very interested in our solution because they were facing these kinds of problems. We almost co-developed this technology with our early customers, focusing on ease of use, scale, security, governance, all of that. Because it's one thing to have a concept where you want to make access to data as easy as Google, where you have a certain interface and people can type and get an answer. But when you're talking about enterprise data and enterprise needs, they are nowhere similar to what you have in the consumer space. The consumer space is a free-for-all: all the information is there, you can crawl it, and then you can access it. In the enterprise, for you to take this idea of search but make it production-grade, make it real and not just a concept car, you need to invest a lot in building deep technology and then enabling security and scalability and all of that. So it took us, I would say, two and a half to three years to get to the initial version of the product. And in the problem we're solving and the area of technology, search, that we're working on and brought to the market, it's almost an infinite game.
You know, you can keep making things easier and easier, and we've seen how Google has continued to evolve their search over time, and it is still evolving. We just feel so lucky to be in this market, taking the direction that we have taken. >> Yeah. It's easy to talk a big game in this area because, like you said, it's a hard technical problem, with all the structured data, whether it's schemas, databases, or whatever, and the legacy baggage. To make it easy is hard. And I like where you guys go with this: find the right information and put it in the right place at the right time. It's a really hard problem. And the beautiful thing is you guys are building a category while there's spend in the market that needs the solution today. So, category creation with an existing market that needs it. So I've got to ask you: could you do me a favor and define for the audience what search-driven analytics is? What does that mean from your standpoint? >> Yeah. What it means is that for the end user, it looks like search, but under the hood it's driving large-scale analytics. I like to say that our product looks like a search engine on the surface, but under the hood it's a massive number-crunching machine. With Search and AI-driven analytics, there are two goals. One, if a user has a question, and we're talking about non-technical users here, not necessarily data experts, they should be able to get an answer instantly. They shouldn't have to wait. That is what we achieve with Search. And with SpotIQ, our AI engine, we help surface insights where people may not even know those are the questions they should be asking, because data has become so complex. People often don't even know what question they should be asking, and we give them a tool that's very easy to use but helps surface insights to them. So there is both a pull model that we enable through Search and a push model that we enable through SpotIQ. >> So I have to ask you: you guys are pioneering this segment, you're in first. And sometimes when you're first, you have arrows in your back; as you know, not all the pioneers survive, and they get competition and copies. But you guys have had a lead, you've had success. What's different today, as you have competition coming in trying to say, "Oh, we got Search too"? What's different today with ThoughtSpot? How are you guys differentiated? >> Yeah. I mean, that's always a sign of success. If others are saying "we have it too" about what you're trying to do, you have done something that is valuable, and that happens in every industry. I think the best example is Tesla. They were the first to really look at this very well-known problem. We don't necessarily have a unique take on the existence of the problem itself; everybody knows that there is a problem with access to data. But the technology that we have built is so deep that it's very, very hard to really copy it and make it work in the real world. With Tesla in the automotive industry, there are obviously so many other companies that have launched battery-powered cars, electric cars, but there is Tesla, and then there are all the other electric cars, which are a bit of an afterthought. Because if you want to build an analytics product where search is at the core, search cannot be added on top; search has to be the core, and then you build around it. And that requires you to build a fundamental architecture from the ground up.
You can't take an existing BI product that is built for dashboarding and add a search bar. I have always said that adding a search bar to a UI is perhaps 10 to 20 lines of JavaScript code. Anyone can add it, and there is so much open source stuff out there that you can just take it and plug it in, and many people have tried to do that. But taking off-the-shelf search technology that is built for unstructured data and sticking it onto a product that is required to do analytics on enterprise data, that doesn't work. We built a search technology that understands enterprise data at a very deep level, so that when our customers take our product and bring it into their environment, they don't have to fundamentally change how they manage their data. Our goal is to add value to their existing enterprise data and cloud data warehouses and deliver this amazing search experience, where our search engine is able to understand what's in their data lake and what's in their cloud data warehouse: the schemas, the tables, the joins, the cardinality, the security requirements. All of those things have to be understood by the technology for you to deliver the experience. Now, that said, we pride ourselves on not resting on our laurels. We have this sort of motto in the company: we say we are only 2% done. So we are on our own continuous journey of innovation, and we have been working on taking our search technology to the next level. That is something really powerful that we are going to unveil at our upcoming conference, Beyond, in December, and that is going to create even more distance between us and the competition. It's all driven by what we have seen with our customers: how they're using our product, our learnings, what they like, what they don't like, where we see gaps, and where we see opportunity to make it even easier to deliver value to our customers and our users. >> I think that's a really profound insight you just shared, because if you look at what you just said about thinking of search as architecturally foundational, embedded in the architecture, that's different than bolting on a feature with some JavaScript code or some open source library. You know, we saw in the security market that people who bolted on security had huge problems. Now all you hear is, "Oh, you've got to bake security in from the beginning." You actually have baked search into everything from the beginning. And it's not just a utility, it's a mindset. And it's also technology: metadata, data about data, software, all kinds of tech is involved. Am I getting that right? Because I think this is what I heard you say: you've got to have the data.
And you can't do those things until you think about the architecture from the ground up. >> Ajeet I'm looking forward to having more deep dive conversations on that one topic. But for the folks who might not be old enough, like me to remember Google back at that time, Yahoo was the best search engine and it was directory basically with a keyword search. It was trivial, technically speaking, but they got big. And then the portal wars came out, we got to have a portal. Google was very much not looked down as an innovator, but they had great technical chops and they just stayed the course. They had a mission to provide the best search engine to help users find what they're looking for. And they never wavered. And it was not fashionable about that time to your point. And then Yahoo was number one, then Google just became Google and the rest is history. So I really think that's super notable because companies face the same problem. What looks like fashionable tech today might not be the right one. I think that's... >> Yeah, and I totally agree. And I think a lot of times in our space, there's a lot of sort of hype around AI and machine learning. We as a company have tried to stay close to our customers and users and build things that will work for them. And a lot of stuff that we are doing, it has never been done before. So it's not to say that along the way, we don't have our own failures. We do have failures and we learn from them. >> Yeah. Yeah. Just don't make the same mistake twice. >> Yeah, I think if you have a process of learning quickly, improving quickly, those are the companies that will have a competitive advantage. In today's world, nobody gets it right the first time. If you're trying to do something fundamentally different, if you're copying somebody else, then you're too late already. >> I totally agree. >> If you do something new, it's about how fast you penetrate And that's... >> That's a great mindset. That's a great mindset. And I think that's worth capturing calling out, but I got to ask you because what's first of all, distinguished history and I love your mindset and just solving problems, big problems. All great. I want to ask you something about the industry and where you guys were in 2012 alright when you started the company, you were literally in what I call the before Cloud phase. Cause it was before Cloud companies and then during Cloud companies and then after Cloud, you know, Amazon clearly took advantage of that for a lot of startups. So right around 2012 through 2016, I'd call that the Amazon is growing up years. How did the Cloud impact your thinking around the product and how you guys were executing because you were right on that wave. You were probably in the sweet spot of your development. >> Yeah. >> Pre business planning. You were in the pre-business planning mode, incomes, Amazon. I'm sure you're probably using Amazon cause your starters and all start up sort of use Amazon at first, but I just think about, do we all have found premise with a data center? How did that impact you guys? And how does that change today? >> Certainly. Yeah it's been fascinating to see how the world is evolving how enterprises have also really evolved in depth, thinking on how they leverage the cloud infrastructure now. In the Cloud, there is the compute and storage infrastructure. And then you have a Cloud Data Warehouse, the analytics stack in the Cloud. 
That's becoming more popular now with a company like Google, having BigQuery and then Snowflake really amazing concepts and things like that. So when we started, we looked at where our customers are , where is their data. And what kind of infrastructure is available to us at the time there wasn't enough compute to drive the search engine that we wanted to build. There were also not any significant Cloud Data Warehousing at the time, but our engineering team our co-founders, they came from companies like Google, where building a Cloud based architecture and elastic architecture, service oriented architecture is in their DNA. So we architected the product to run on infrastructure that is very elastic that can be run practically anywhere. But our initial customers and applies the Global 2000. They had their data on-prem. So we had started more with on-prem as a go-to-market strategy. and then about four and a half years ago, once cloud infrastructure I'm talking about the compute infrastructure started to become more mature, we certified our software, to run on all three clouds So today we have more than 75 to 80% of our customers already running our software in the Cloud. And as now, because we connect to our primary data sources, our Cloud Data Warehouses, Cloud Data Lakes. Now with Snowflake and BigQuery and Synapse and Redshift, we have enough of our customers who have deployed Cloud Data Warehouses. So we are also able to directly integrate with them. And that's why we launched our own hosted SaaS Offering about a month ago. So I would say our journey in this area has been sort of similar to companies like Splunk or Elastic, which started with a software model initially deployed more on-prem, but then evolved with the customers to the Cloud. So we have a lot of focus and momentum and lot of our customers, as they're moving their data to the Cloud, they're asking us as well to be in the Cloud and provide a hosted offering. And that is what we have built for the last one year. And we launched it a month ago. >> It's nice to be on the right side of history. I got to say, when you're on the way to be there. And that also makes integrations easy too. I love the Cloud play. Let's get to the final segment here. I want to get your thoughts on your customers, your advice. There's a huge untapped opportunity for companies when it comes to data, a lot of them are realizing that the pandemic is highlighting a lot of areas where they have to go faster and then to go to Cloud, they're going to build modern apps more data's coming in than ever before. Where are these untapped opportunities for customers to take advantage of the data? And what's your opinion on where they should look and what they should do? >> Yeah, I really think that the pandemics has shown for the first, the value of data to society at large, there is probably more than a billion people in the world that have seen a chart for the first time in their life. Everybody is being... and COVID has done some magic. But everybody was looking at charts of infection and so on and so forth. So there is a lot more broad awareness of what data can do in improving our society at large for the businesses of course, in the last six, seven months, you heard it enough from lot of leaders that digital transformation is accelerating. Everybody is realizing that the way to interact in the world is becoming more and more digital expecting your customers to come to your branch to do banking is not really an option. 
And people are also seeing how all the SaaS companies and SaaS businesses, digital businesses, they have really taken off. So if a company like Zoom can suddenly have a a hundred, $150 billion valuation, because you are able to do everything remote, all the enterprises are looking to really touch their customers and partners in a lot more digital way than they could do before. And definitely COVID has also really created this almost, you know, pool buckets of organization. There is lot of companies that have tremendously benefited from it. And there a lot of companies that have been poorly affected, really in a difficult place. And I think both of them for the first category, they are looking at how do I maintain this revenue even after COVID, because one of this thing, you know, hopefully early next year we have a vaccine and things can start to look better again sometime next year. But we have learned so much. We have attracted so many new customers, how do we retain and grow them further? And that means I need to invest more and more in my technology. Now, companies that are not doing well, they really want to figure out how to become more operationally efficient. And they are really under pressure to get more value from there and both categories, improving your revenue, retaining customers. You need to understand the customer behavior. You need to understand which products they are buying at a fine grain level, not with the law of averages, not by looking at a dashboard and saying our average customer likes this kind of product. That one doesn't really work. You have to offer people personalized services and that personalization is just not possible at scale, without really using data on the front lines. You can't have just manager sitting in their office, looking at dashboards and charts and saying these are the kinds of campaigns I need to run because my average customer seems to like these kinds of offers. I need to really empower my sales people, my individual frontline workers, who are interfacing with the customer to be able to make customized offers of services and products to them. And that is possible on the data. So we see a really, a lot more focus in getting value from data, delivering value quickly and digital transformation broadly but definitely leveraging data in businesses. There is tremendous acceleration that is happening and, you know, next five years, it's all going to be about being able to monetize data on the front lines when you are interfacing with your customers and partners >> Ajeet, that's great insight. And I really appreciate what you're saying. And you know, I wrote a blog post in 2007. I said, data will be the new development kit. Back then we used to call development kits, software user development. >> John, you are the real visionary. It took me until 2012 to be able to do this. >> Well, it wasn't clear, but you saw other data was going to have to be programmed be part of the programming. And I think, what you're getting at here is so profound because we're living 2020 people can see the value of data at the right time. It changes the conversations, it changes what's going on in the real time communications of our world with real-time access to information, whether that's machine to machine or machine to human, having data in the right place, changes the context. >> Yap. >> And that is a true, not a tech thing, that's just life, right? 
I think this year, I think we're going to look back and say, this was the year that everyone realized that real time communications, real-time society needs real time data. And I think it's going to be more important than ever. So it's a really big problem and important one. And thank you for sharing that. >> Yeah. And actually you bring up a very good point programming, developing big data. Data as a development kit. We are also going to announce a new product at Beyond, which will be about bringing ThoughtSpot everywhere, where a lot of business users are in their business applications. And by using ThoughtSpot product, using our full experience, they can obviously do enterprise wide analytics and look at all the data. But if they're looking for insights and nuggets, and they want to ask questions in their business workflows. We are also launching a product capability that will allow software developers to inject data in their business applications and enable and empower their own business users to be able to ask any questions that they might have without having to go to yet another BI product. >> It's data as code. I mean, you almost think about like software metaphors, where's the compiler? Where's the source code? Where's the data code? You start to get into this new mindset of thinking about data as code, because you got to have data about the data. Is it clean data, dirty data? Is it real time? Is it useful? There's a lot of intelligence needed to manage this. This is like a pretty big deal. And it's fairly new in the sense in the science side. Yeah, machine learning has been around for a while and you know, there's tracks for that. But thinking of this way as an operating system mindset, it's not just being a data geek. You know what I'm saying? So I think you're on the right track Ajeet. I really appreciate your thoughts here. Thank you. >> Thank you John. >> Okay. This is a cube conversation. Unpacking the data. The data is the future. We're living in a real-time world and in real-time data can change the outcomes of all kinds of contexts. And with truth, you need data and Ajeet Singh co-founder executive chairman of ThoughtSpot shares his thoughts here in theCUBE. I'm John furrier. Thanks for watching. (soft upbeat music)

Published Date : Nov 23 2020


Frank Slootman Dave Vellante Cube Conversation


 

>> From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is theCUBE conversation. >> Hi everybody, this is Dave Vellante. And as you know, we've been tracking the next generation of cloud; sometimes we call it cloud 2.0. Frank Slootman is here to really unpack this with me. Frank, great to see you. Thanks for coming on. >> Yeah, you as well, Dave. Good to see you. >> So, obviously hot off your IPO, a lot of buzz around that. That's fine, and we could talk about that, but I really want to talk about the future. Before we get off the IPO, though: that was something you told me when you were CEO of ServiceNow. You said, hey, we're priced to perfection. So it looks like Snowflake is going to be priced to perfection. It's a marathon, though; you made that clear. I presume it's not any different here for you. >> Yeah, well, I think the ServiceNow journey was different in the sense that we were kind of the underdogs, and people sort of discovered the full potential of the company over the years. I think with Snowflake they pretty much discovered it day one, maybe a little bit more. Sometimes it's nice to be an underdog; we're a bit of an overdog in this particular scenario. But, you know, it is what it is, and it's all about execution: delivering the results, delivering on our vision, being great with our customers, and hopefully the chips will fall where they may at that point. >> Yeah, you're a poorly kept secret at this point, Frank. I've got some excerpts of your book that I've been reading, and of course I've been following your career since the two-thousands. You were off sailing; you mentioned in your book that you were kind of retired, you were done, and then you got sucked back in. Now, why? I mean, are you in this for the sport? What's the story here? >> Actually, that's not a bad way of characterizing it. I think I am in it, you know, for the sport. The only way to become the best version of yourself is to be under the gun, every single day, and that's certainly where we are. It sort of has its own rewards: building great products, building great companies, regardless of what the spoils may be, has its own rewards. It's hard for people like us to get off the field and hang it up. So here we are. >> You know, you're putting forth this vision now, the data cloud, which obviously is good marketing, but I'm really happy, because I don't like the term enterprise data warehouse. I don't think it reflects what you're trying to accomplish. The EDW is slow, and only a few people really know how to use it. The time value of data is gone by the time it gets in there; your business is moving faster than the data in the EDW. And it really got saved by Sarbanes-Oxley; that's really what it became, a reporting mechanism. So I've never seen what you guys are doing as EDW. So I want you to talk about the data cloud. I want to get into the vision a little bit, and maybe challenge you on a couple of things so our audience can better understand it. >> Yes. So the notion of a data cloud is actually a type of cloud that we haven't had. Data has been fragmented and locked up in a million different places, in different clouds.
In different cloud regions, and obviously on-premise. And for data science teams trying to drive analysis across data sets, that is incredibly hard, which is why a lot of this resorts to programming and things of that sort. It's hardly scalable, because the data is not optimized, the economics are not optimized, there's no governance model, and so on. But a data cloud is actually the ability to loosely couple and lightly federate data, regardless of where it is, so it doesn't have the scale limitations or performance limitations the way traditional data warehouses have had. So we really have a fighting chance of killing the silos and unlocking the bunkers, and allowing the full promise of data science and ML and AI to really happen. A lot of the analysis that happens on data today is on a single data set, because it's just too damn hard to drive analysis across multiple data sets. And when we talk to our customers, they have very precise designs on what they're trying to do. They say: look, we are trying to discover, through deep learning, what the patterns are that lead to transactions. If you're a streaming company, maybe it's signing up for a channel, or buying a movie, or whatever it is. What is the pattern of data points that leads us to that desired outcome? Once you have a very accurate description of the data relationships that result in that outcome, you can then search for it and scale it tens of millions of times over. That's what digital enterprises do, right? So you discover these patterns and enrich the data to the point where the patterns become incredibly predictive. That's what Snowflake is for. But it requires a completely federated data model, because you're not going to find a data pattern in a single data set per se, right? So that's what it's all about. The outcomes of a data cloud are very, very closely related to the business outcomes that the user is seeking. It's not some infrastructure process that has a very remote relationship with business outcomes; this is very, very closely related. >> So it doesn't take a brain surgeon to look at the trillion-dollar club and see that the big trillion- and two-trillion-dollar market cap companies, like Apple, have data at the core, whereas most companies, most incumbents, might have a bottling plant at the core, some manufacturing or some other process, and they put data around it in these silos. It seems like you're trying to really bring that innovation and put data at the core, and you've got an architecture to do that. You talk about your multi-cluster shared storage architecture, and you mentioned data sharing. Will this, in your opinion, enable incumbents to do what a lot of the startups were able to do in the cloud days? I mean, startups got access to data centers, which they couldn't have before the cloud; you're trying to do something similar with data. >> Yeah. So, obviously there's no doubt that the cloud is a critical enabler; this wouldn't be happening without it. At the same time, the trails have been blazed by the likes of Facebook and Google. The reason those enterprises are so extraordinarily valuable is because of what they know.
What they know through data, and how they can monetize what they know through data. But that power is now becoming available to every single enterprise out there, because the data platform, the underlying cloud capabilities, we are now delivering to anybody who wants it. Now, you still need to have strong data engineering and data science capabilities; it's not like falling off a log. But fundamentally, those capabilities are now broadly accessible in the marketplace. >> So we talked up front about some of the differences between what you've done earlier in your career. Like I said, you're the worst kept secret. Data Domain, I would say, was somewhat of a niche market. You blew it up until it was very disruptive, but it was somewhat limited in what could be done, and maybe some of that limitation wouldn't have occurred if you had stayed an independent company. At ServiceNow, you mopped the table up because you really had no competition there. That's not the case here; you've got some of the biggest competitors in the world. So talk about that: what gives you confidence that you can continue to dominate? >> You know, it's actually interesting that you bring up these companies. I mean, Data Domain was a scenario where we were constrained on market, and literally we were a data backup company. As you recall, we needed to move into backup software, and needed to move into primary storage. While we knew it, we couldn't execute on it, because it took tremendous resources, and back in the day that was much harder than it is right now. So we ended up selling the company to EMC, and it's now part of Dell. But in short, we were left with some trauma from that experience: why couldn't we execute on that transformation? So coming to ServiceNow, we were extremely, and certainly me personally, extremely attuned to the challenges we had endured in our prior company. One of the reasons why you saw ServiceNow break out at scale, with tremendous growth rates, is because of what we had learned from the prior journey. We were never going to get caught again in a situation where we could not sustain our markets and sustain our growth. So ServiceNow's execution model was very much a reaction to what we had encountered in the prior company. Now, coming into Snowflake, it's a totally different deal, because not only is there a large market, this is a developing market. I think you've pointed out in some of your broadcasting that this market is very much in flux, and the reason is that technology is now capable of doing things for people and enterprises that they could never do before. So people are spending way more resources than they ever thought possible on these new capabilities. You can't think in terms of static markets and static data definitions; it means nothing. These things are so in transition right now that it's very difficult for people to scope the scale of this opportunity.
And then even beyond that, when you start bringing in the edge and real time data, uh, talk about how you're thinking about that, Tam. And what what you have to do to participate. You have toe, you know, bring adjacent capabilities, ISAT this read data sharing that will get you there. In other words, you're not like a transaction system. You hear people talking about converge databases, you hear? Talk about real time inference at the edge that today anyway, isn't what snowflake is about. Does that vision of data sharing and the data cloud does that allow you to participate in that massive, multi $100 billion tam that that I laid out and probably others as well. >>Yeah, well, it is always difficult. Thio defined markets based on historical concept that probably not gonna apply whole lot for much longer. I mean, the way we think of it is that data is the beating heart of the digital enterprise on, uh, you know, digital enterprises today. What do you look at? People against the car door dash or so on. Um, they were built from the ground up to be digital on the prices and data Is the beating heart off their operation Data operations is their manufacturing, if you will, um, every other enterprise out there is is working very hard to become digital or part digital and is going to learn to develop data platforms like what we're talking about here to data Cloud Azaz. Well, as the expertise in terms of data engineering and data scientist to really fully become a digital enterprise, right. So, you know, we view data as driving operations off the digital enterprise. That's really what it iss right data, and it's completely data driven. And there's no people involved. People are developing and supporting the process. But in the execution, it is end to end. Data driven. Being that data is the is the signal that initiates the process is technol assess. Their there being a detective, and then they fully execute the entire machinery probe Problematic machinery, if you will, um, you know, of the processes that have been designed, for example, you know, I may fit a certain pattern. You know, that that leads to some transactional context. But I've not fully completed that pattern until I click on some Lincoln. And all of a sudden proof I have become, you know, a prime prospect system, the text that in the real time and then unleashes Oh, it's outreach and capabilities to get me to transact me. You and I are experiencing this every day. You know, when we're when we're online, you just may not fully re election. That's what's happening behind the scenes. That's really what this is all about. So and so to me, this is sort of the new online transaction processing is enter and, uh, you know, data digital. Uh, no process that is continually acquiring, analyzing and acting on data. >>Well, you've talked about the time time value of of data. It loses value over time. And to the extent that you can actually affect decisions, maybe before you lose the customer before you lose the patient even even more importantly or before you lose the battle. Uh, there's all kinds of, you know, mental models that you can apply this. So automation is a key part of that. And then again, I think a lot of people like you said, if you just try to look at historical markets, you can't really squint through those and apply them. You really have toe open up your mind and think about the new possibilities. And so I could see your your component of automation. I I see what's happening in the r P. 
A space and and I could see See these this massive opportunities Thio really change society, change business, your last thoughts. >>There's just there's just no scenario that I can envision where data is not completely core in central to a digital enterprise, period. >>Yeah, I think I really do think, Frank, your your your Your vision is misunderstood somewhat. I think people say Okay. Hey, we'll bet on salute men Scarpelli the team. That's great to do that. But I think this is gonna unfold in a way that people may be having predicted that maybe you guys, yourselves and your founders, you know, haven't have aren't able to predict as well. But you've got that good, strong architectural philosophy that you're pursuing and it just kind of feels right, doesn't it? >>You know, I mean, one of the 100 conversations and, uh, you know, things is the one of the reasons why we also wrote our book. You know, the rights of the data cloud is to convey to the marketplace that this is not an incremental evolution, that this is not sort of building on the past. There is a real step function here on the way to think about it is that typically enterprises and institutions will look at a platform like snowflakes from a workload context. In other words, I have this business. I have this workload. This is very much historically defined, by the way. And then they benchmark us, you know, against what they're what they're already doing on some legacy platform. And they decided, like, Yeah, this is a good fit. We're gonna put Snowflake here. Maybe there, but it's still very workload centric, which means that we are essentially perpetuating the mentality off the past. Right? We were doing it. Wanna work, load of the time We're creating the new silos and the new bunkers of data in the process. And we're really not approaching this with level of vision that the data science is really required to drive maximum benefit from data. So our arguments and this is this is not an easy arguments is to say, toc IOS on any other sea level person that wants to listen to that look, you know, just thinking about, you know, operational context and operational. Excellent. It's like we have toe have a platform that allows us unfettered access to the data that, you know, we may need to, you know, bring the analytical power to right. If you have to bring in political power to a diversity of data sets, how are we going to do that right? The data lives in, like, 500 different places. It's just not possible, right, other than with insane amounts of programming and complexity, and then we don't have the performance, and we don't have to economics, and we don't have the governance and so on. So you really want to set yourself up with a data cloud so that you can unleash your data science, uh, capabilities, your machine learning your deep learning capabilities, aan den, you really get the full throttle advantage. You know of what the technology can do if you're going to perpetuate the silo and bunkering of data by doing it won't work. Load of the time. You know, 5, 10 years from now, we're having the same conversation we've been having over the last 40 years, you know? >>Yeah. Operationalize ing your data is gonna require busting down those those silos, and it's gonna require something like the data cloud to really power that to the next decade and beyond. Frank's movement Thanks so much for coming in. The Cuban helping us do a preview here of what's to come. >>You bet, Dave. Thanks. >>All right. Thank you for watching. 
Everybody says Dave Volonte for the Cube will see you next time
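As an aside, the "pattern completion" flow Slootman describes, a stream of behavioral signals that triggers outreach the moment a visitor's history matches a pattern known to precede a transaction, can be pictured in a few lines of code. This is a made-up sketch, not Snowflake functionality: the event names and the required pattern are invented, and a real system would run on streaming infrastructure rather than an in-memory dictionary.

```python
# Toy sketch of real-time "pattern completion"; all event names are invented.
REQUIRED_PATTERN = {"viewed_pricing", "watched_demo", "clicked_link"}

history: dict[str, set[str]] = {}  # per-user signals seen so far

def trigger_outreach(user: str) -> None:
    # Stand-in for the outreach machinery the interview alludes to.
    print(f"{user} now matches the conversion pattern -> start outreach")

def on_signal(user: str, event: str) -> None:
    """Ingest one behavioral signal; act as soon as the full pattern is present."""
    events = history.setdefault(user, set())
    if REQUIRED_PATTERN <= events:
        return  # already triggered for this user
    events.add(event)
    if REQUIRED_PATTERN <= events:
        trigger_outreach(user)

for user, event in [("visitor-42", "viewed_pricing"),
                    ("visitor-42", "watched_demo"),
                    ("visitor-42", "clicked_link")]:  # fires on the last signal
    on_signal(user, event)
```

The point of the sketch is the ordering: the data arrives first and the process fires as a consequence, which is the "end-to-end data-driven" inversion described in the interview.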

Published Date : Oct 16 2020

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Volonte | PERSON | 0.99+
Frank | PERSON | 0.99+
Frank Slootman | PERSON | 0.99+
Scarpelli | PERSON | 0.99+
Dell | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
Palo Alto | LOCATION | 0.99+
apple | ORGANIZATION | 0.99+
Facebook | ORGANIZATION | 0.99+
Google | ORGANIZATION | 0.99+
Lee | PERSON | 0.99+
IOS | TITLE | 0.99+
Andre | PERSON | 0.99+
Boston | LOCATION | 0.99+
Cube Studios | ORGANIZATION | 0.99+
Trillion Years Club | ORGANIZATION | 0.99+
two thousands | QUANTITY | 0.99+
100 conversations | QUANTITY | 0.99+
trillion dollars | QUANTITY | 0.98+
today | DATE | 0.98+
ITT | ORGANIZATION | 0.98+
one | QUANTITY | 0.98+
$2 trillion | QUANTITY | 0.98+
One | QUANTITY | 0.97+
a day | QUANTITY | 0.97+
single | QUANTITY | 0.97+
Cloud Azaz | ORGANIZATION | 0.96+
next decade | DATE | 0.96+
TAM | ORGANIZATION | 0.96+
$100 billion | QUANTITY | 0.96+
Enterprise Data Warehouse | ORGANIZATION | 0.95+
Dave Vellante | PERSON | 0.95+
500 different | QUANTITY | 0.94+
two point | QUANTITY | 0.93+
D. W. | LOCATION | 0.92+
Sarbanes Oxley | PERSON | 0.91+
5 | QUANTITY | 0.9+
Snowflake | TITLE | 0.87+
Snowflake | ORGANIZATION | 0.86+
single data set | QUANTITY | 0.86+
tens of million times | QUANTITY | 0.85+
10 years | QUANTITY | 0.83+
E M. C | ORGANIZATION | 0.83+
Lincoln | PERSON | 0.82+
Day Volonte | PERSON | 0.82+
Lukman | PERSON | 0.82+
Cuban | PERSON | 0.8+
last 40 years | DATE | 0.77+
snowflakes | TITLE | 0.75+
single enterprise | QUANTITY | 0.64+
Tam | ORGANIZATION | 0.63+
Thio | PERSON | 0.62+
million | QUANTITY | 0.53+
single day | QUANTITY | 0.49+
Cube | PERSON | 0.36+

Ajay Vohora & Lester Waters, Io-Tahoe | AWS re:Invent 2019


 

>>Las Vegas: it's theCUBE, covering AWS re:Invent 2019, brought to you by Amazon Web Services and Intel, along with its ecosystem partners. >>Welcome back here to Las Vegas. We are live at AWS re:Invent, along with Justin Warren; I'm John Walls. Day one of a jam-packed show: we had great keynotes this morning from Andy Jassy, and also representatives from Goldman Sachs and a number of other enterprises. On the stage right now, we're going to talk about data; it's all about data with Io-Tahoe and a couple of the company's representatives: CEO Ajay Vohora. Ajay, thanks for being with us. Thank you, John. And Lester Waters is the CISO at Io-Tahoe. Lester, good afternoon to you; thanks for being with us. Thank you for having us. Ajay, you brought a football with you there, I see, so you've come prepared for sport; I love it. All right. This is your booth you're showing here, I assume, and you're exhibiting, and I know you've got a big offering we're going to talk about a little bit later on. First, tell us about Io-Tahoe a little bit, to inform our viewers right now who might not be too familiar with the company. >>Sure. Well, our background was dealing with enterprise-scale data issues that were really about the complexity, the amount of data and the different types of data. Around 2014, when we were in stealth, kind of working on our technology, a lot of the common technologies around were Apache-based, so Hadoop. Large enterprises that we were working with, like GE and Comcast, helped us come out of stealth in 2017, and it gave us a great story of solving petabyte-scale data challenges using machine learning, removing that manual overhead. More and more, as we look at AWS services, it's: how do we drive the automation and get the value from data? >>Automation has got to be the way forward. All right, so let's jump onto that then. On that notion, you've got this exponential growth in data, obviously, with the edge, Internet of Things, all these inputs, right? And we have so much more information at our disposal; some of it's great, some of it's not. How do we know the difference, especially in this world where this exponential increase has happened? Lester, just tackle that from a company perspective: identifying, you know, first off, how do we ever figure out what we have that's valuable? Where do we get the value out of that? And then, how do we make sense of it? How do we put it into practice? >>Yeah. So I think most enterprises have a problem with data sprawl. There's a project start-up: we get a block of data, and then all of a sudden a new project comes along and they take a copy of that data; there's another instance of it. Then there's another instance for another project. >>And suddenly these different data sources become authoritative and become production. So now I have three, four, five different instances. Oh, and then there's the three or four that got canceled, and they're still sitting around. And as an information security professional, my challenge is to know where all of those pieces of data are, so that I can govern them and make sure that the stuff I don't need is gotten rid of, deleted. So, you know, using the Io-Tahoe software, I'm able to catalog all of that.
I'm able to garner insights into that data using the nine patent-pending algorithms that we have, to do intelligent tagging, if you will. So, from my perspective, I'm very interested in making sure that I'm adhering to compliance rules, and the really cool thing about the software is that we go and tag data, we look at it, and we actually tie it to lines of regulations. So you can go to, say, CCPA: this bit of text here applies to this. And that's really helpful for me as an information security professional, because I'm not necessarily versed on every line of regulation, but when I can go and look at it handily like that, it makes it easier for me to go, oh, okay, that's great, I know how to treat that in terms of controls. That's the important bit for me. If you don't know where your data is, you can't control it, and you can't monitor it. >>Governance, yeah. The knowing where stuff is: I'm familiar with a framework that was developed at Telstra, back in Australia, called the Five Knows, which is about exactly that: knowing where your data is, what it is, who has access to it. Actually being able to catalog the data, and knowing what it is that you have, is a mammoth task. That was hard enough 12 years ago, but today, with the amount of data that's actively being created every single day, how does your system help CISOs tackle this kind of issue? Maybe, Lester, you can start off, and then, Ajay, you can tell us a bit more yourself. >>Yeah, I'll start off on that. The feedback from our enterprise customers is that as the veracity and volume of data increase, the challenge is definitely there to keep on top of governing it. So it's continually discovering that new data as it's created: how is it different, how is it adding to the existing data? Using machine learning and the models that we create, whether it's anomaly detection or classifying the data based on certain features in the data, allows us to tag it and load that into our catalog. So I've discovered it, and now we've made it accessible: any BI developer or data engineer can search for that data in the catalog and make something from it. So if there were ten steps in that data mile, we've definitely solved the first four or five, to bring momentum to getting value from that data: discovering it, cataloging it, tagging the data to make it searchable, and then it's free to be picked up for whatever use case is out there, whether it's migration, security, compliance. Security is a big one for you.
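To make the tagging-and-regulation mapping Lester describes a little more concrete, here is a minimal sketch of the idea. To be clear, this is a toy, not Io-Tahoe's software: its nine algorithms are patent-pending and unpublished, so every pattern, tag and regulation citation below is an invented stand-in, and a regex is the crudest possible substitute for their machine-learning classifiers.

```python
import re

# Invented patterns; real classifiers would use learned features, not regexes.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email":  re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

# Hypothetical tag-to-regulation mapping, echoing the "tie it to lines of
# regulations" idea; the citations are placeholders, not legal references.
REGULATIONS = {
    "us_ssn": ["CCPA 1798.81.5", "GDPR Art. 4(1)"],
    "email":  ["GDPR Art. 4(1)"],
}

def tag_value(value: str) -> list:
    """Return the tags whose pattern matches one sampled cell value."""
    return [tag for tag, rx in PATTERNS.items() if rx.search(value)]

def catalog_column(name: str, samples: list) -> dict:
    """Tag a column from sampled values and attach regulation references."""
    tags = sorted({t for v in samples for t in tag_value(v)})
    regs = sorted({r for t in tags for r in REGULATIONS[t]})
    return {"column": name, "tags": tags, "regulations": regs}

print(catalog_column("contact", ["123-45-6789", "jo@example.com"]))
```

An entry like {'column': 'contact', 'tags': ['email', 'us_ssn'], ...} is what then lands in the catalog, which is what makes the data searchable for the downstream use cases described here.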
And it really is my data versus a standard reject kind of matching, which isn't the best, uh, techniques. So >>yeah. So walk us through that in a bit more detail. So you mentioned tagging is essentially that a couple of times. So let's go into the details a little bit about what that, what that actually means for customers. My understanding is that you're looking for things like a social security number that could be sitting somewhere in this data. So finding out where are all these social security numbers that I may not be aware of and it could be being shared with someone who shouldn't have access to that, but it is there, is that what it is or are they, are there other kinds of data that you're able to tag that traditional purchase? >>Yeah. Was wait straight out of the box. You've got your um, PII or personally, um, identifiable information, that kind of day that is covered under the CCPA GDPR. So there are those standards, regulatory driven definitions that is social security number name, address would fall under. Um, beyond that. Then in a large enterprise, you've got a clever data scientists, data engineers you through the nature of their work can combine sets of data that could include work patterns, IDs, um, lots of activity. You bring that together and that suddenly becomes, uh, under that umbrella of sensitive. Um, so being able to tag and classify data under those regulatory policies, but then is what and what could be an operational risk to an organization, whether it's a bank, insurance, utility, health care in particular, if you work in all those verticals or yeah, across the way, agnostic to any vertical. >>Okay. All right. And the nature of being able to do that is having that machine learning set up a baseline, um, around what is sensitive and then honing that to what is particular to that organization. So, you know, lots of people will use ever sort of seen here at AWS S three, uh, Aurora, Postgres or, or my sequel Redshift. Um, and also different ways the underlying sources of that data, whether it's a CRM system, a IOT, all of those sources have got nuances that makes every enterprise data landscape just slightly different. So China make a rules based, one size fits all approach is, is going to be limiting, um, that the increase your manual overhead. So customers like GE, Comcast, um, that move way beyond throwing people at the problem, that's no longer possible. Uh, so being smart about how to approach this, classifying the data, using features in the data crane, that metadata as an asset just as an eight data warehouse would be, allows you to, to enable the rest of the organization. >>So, I mean, you've talked about, um, you know, deriving value and identifying value. Um, how does ultimately, once you catalog your tag, what does this mean to the bottom line of terms of ROI? How does AWS play into that? Um, you know, why am I as, as a, as a company, you know, what value am I getting out of, of your abilities with AWS and then having that kind of capability. >>Yeah. We, we did a great study with Forester. Um, they calculated the ROI and it's a mixture of things. It's that manual personnel overhead who are locked into that. Um, pretty unpleasant low productivity role of wrangling with data for want of a better words to make something of it. They'd much rather be creating the dashboards that the BI or the insights. 
Um, so moving, you know, dozens of people from the back office manual wrangling into what's going to make difference to the chief marketing officer and your CFO bring down the cost of served your customer by getting those operational insights is how they want to get to working with that data. So that automation to take out the manual overhead of the upfront task is an allowing that, that resource to be better deployed onto the more interesting productive work. So that's one part of the ROI. >>The other is with AWS. What we've found here engaging with the AWS ecosystem is just that speed of migration to AWS. We can take months out of that by cataloging what's on premise and saying, huh, I date aside. So our data engineering team want to create products on for their own customers using Sage maker using Redshift, Athena. Um, but what is the exact data that we need to push into the cloud to use those services? Is it the 20 petabytes that we've accumulated over the 20 last 20 years? That's probably not going to be the case. So tiering the on prem and cloud, um, base of that data is, is really helpful to a data officer and an information architect to set themselves up to accelerate that migration to AWS. So for people who've used this kind of system and they've run through the tagging and seen the power of the platform that you've got there. So what are some of the things that they're now able to do once they've got these highly qual, high quality tagged data set? >>So it's not just tagging too. We also do, uh, we do, we do, we do fuzzy, fuzzy magic so we can find relationships in the data or even relationships within the data in terms of duplicate. So, so for example, somebody, somebody got married and they're really the same, you know, so now there's their surname has changed. We can help companies find that, those bits of a matching. And I think we had one customer where we saved about, saved him about a hundred thousand a year in mailing costs because they were sending, you know, to, you know, misses, you know, right there anymore. Her name was. And having the, you know, being able to deduplicate that kind of data really helps with that helps people save money. >>Yep. And that's kind of the next phase in our journey is moving beyond the tag in the classification is uh, our roadmap working with AWS is very much machine learning driven. So our engineering team, uh, what they're excited about is what's the next model, what's the next problem we can solve with AI machine learning to throw at the large scale data problem. So we'll continually be curating and creating that metadata catalog asset. So allow that to be used as a resource to enable the rest of the, the data landscape. >>And I think what's interesting about our product is we really have multiple audiences for it. We've got the chief data officer who wants to make sure that we're completely compliant because it doesn't want that 4% potential fine. You know, so being able to evidence that they're having due diligence and their data management will go a long way towards if there is a breach because zero days do happen. But if you can evidence that you've really been, been, had a good discipline, then you won't get that fine or hopefully you won't get a big fine. And that the second audience is going to be information security professionals who want to secure that perimeter. The third is going to be the data architects who are trying to, to uh, to, you know, manage and, and create new solutions with that data. 
And the fourth of course is the data scientists trying to drive >>new business value. >>Alright, well before we, we, we, we um, let y'all take off, I want to know about, uh, an offering that you've launched this week, uh, apparently to great success and you're pretty excited about just your space alone here, your presence here. But tell us a little bit about that before you take off. >>Yeah. So we're here also sponsoring the jam lounge and everybody's welcome to sign up. It's, um, a number of our friends there to competitively take some challenges, come into the jam lounge, use our products, and kind of understand what it means to accelerate that journey onto AWS. What can I do if I show what what? Yeah, give me, give me an idea about the blog. You can take some chances to discover data and understand what data is there. Isn't there fighting relationships and intuitively through our UI, start exploring that and, and joining the dots. Um, uh, what, what is my day that knowing your data and then creating policies to drive that data into use. Cool. Good. And maybe pick up a football along the way so I know. Yeah. Thanks for being with us. Thank you for half the time. And, uh, again, the jam lounge, right? Right, right here at the SAS Bora AWS reinvent. We are alive. And you're watching this right here on the queue.

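The fuzzy matching and deduplication described above, catching the changed-surname case, for example, can be sketched in a few lines. This is illustrative only and is not Io-Tahoe's matching logic: the records, the 50/50 feature weights and the 0.65 threshold are all invented, and the standard library's SequenceMatcher stands in for real similarity models.

```python
from difflib import SequenceMatcher

# Hypothetical customer records; record 2 is the "married, surname changed" case.
records = [
    {"id": 1, "name": "Janet Smith", "address": "12 Elm St"},
    {"id": 2, "name": "Janet Jones", "address": "12 Elm St"},
    {"id": 3, "name": "Raj Patel",   "address": "90 Oak Ave"},
]

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_duplicates(rows, threshold=0.65):
    """Flag pairs whose blended name+address similarity clears the threshold."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            score = (0.5 * similarity(rows[i]["name"], rows[j]["name"])
                     + 0.5 * similarity(rows[i]["address"], rows[j]["address"]))
            if score >= threshold:
                pairs.append((rows[i]["id"], rows[j]["id"], round(score, 2)))
    return pairs

print(likely_duplicates(records))  # flags (1, 2): same address, similar name
```

Folding duplicate pairs like that back into one record is where the mailing-cost saving in the anecdote above comes from.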
Published Date : Dec 4 2019

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Comcast | ORGANIZATION | 0.99+
GE | ORGANIZATION | 0.99+
Justin Warren | PERSON | 0.99+
Andy Jassy | PERSON | 0.99+
Goldman Sachs | ORGANIZATION | 0.99+
Australia | LOCATION | 0.99+
2017 | DATE | 0.99+
Joan | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
10 steps | QUANTITY | 0.99+
three | QUANTITY | 0.99+
Las Vegas | LOCATION | 0.99+
2014 | DATE | 0.99+
Telstra | ORGANIZATION | 0.99+
Jorge J. | PERSON | 0.99+
five | QUANTITY | 0.99+
Ajay Vohora | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
20 petabytes | QUANTITY | 0.99+
four | QUANTITY | 0.99+
John Walls | PERSON | 0.99+
IO Tahoe | ORGANIZATION | 0.99+
4% | QUANTITY | 0.99+
Io-Tahoe | PERSON | 0.99+
one customer | QUANTITY | 0.99+
First | QUANTITY | 0.99+
CJ | PERSON | 0.99+
Redshift | TITLE | 0.99+
third | QUANTITY | 0.99+
12 years ago | DATE | 0.98+
fourth | QUANTITY | 0.98+
today | DATE | 0.98+
Lester Waters | PERSON | 0.98+
H J | PERSON | 0.97+
Aja | PERSON | 0.97+
Forester | ORGANIZATION | 0.97+
CCPA | TITLE | 0.97+
this week | DATE | 0.97+
zero days | QUANTITY | 0.96+
about a hundred thousand a year | QUANTITY | 0.96+
first | QUANTITY | 0.95+
second audience | QUANTITY | 0.94+
nine | QUANTITY | 0.94+
LA Las Vegas | LOCATION | 0.94+
Sage | ORGANIZATION | 0.92+
Leicester | LOCATION | 0.91+
Apache | ORGANIZATION | 0.9+
Lester | PERSON | 0.9+
SAS Bora | ORGANIZATION | 0.88+
first four | QUANTITY | 0.87+
one part | QUANTITY | 0.87+
one | QUANTITY | 0.87+
2019 | DATE | 0.85+
Hadoop | ORGANIZATION | 0.84+
Aurora | TITLE | 0.82+
dozens of people | QUANTITY | 0.79+
Redshift | ORGANIZATION | 0.78+
Postgres | ORGANIZATION | 0.76+
20 | DATE | 0.75+
eight data warehouse | QUANTITY | 0.74+
five different | QUANTITY | 0.73+
CEO | PERSON | 0.7+
single day | QUANTITY | 0.69+
China | LOCATION | 0.68+
20 last | QUANTITY | 0.65+
Athena | LOCATION | 0.63+
morning | DATE | 0.55+
Invent | EVENT | 0.54+
GDPR | TITLE | 0.53+
S three | TITLE | 0.52+
years | QUANTITY | 0.51+
no | OTHER | 0.4+
waters | ORGANIZATION | 0.39+

Tony Higham, IBM | IBM Data and AI Forum


 

>>Live from Miami, Florida, it's theCUBE, covering the IBM Data and AI Forum, brought to you by IBM. >>We're back in Miami, and you're watching theCUBE's coverage of the IBM Data and AI Forum. Tony Higham is here; he's a distinguished engineer for Digital and Cloud Business Analytics at IBM. Tony, first of all, congratulations on being a distinguished engineer; that doesn't happen often. Thank you for coming on theCUBE. >>Thank you. >>So your area of focus is on the BI and enterprise performance management space, and if I understand it correctly, a big mission of yours is to try to modernize those: make them self-service, make them cloud-ready. How's that going? >>It's going really well. When you really boil down things like BI and enterprise performance management, there's the analysis of data, what we do with the data that's useful, that makes a difference in the world, and then there's planning and forecasting and budgeting, which everyone has to do, whether you're a single household or whether you're an Amazon or a Boeing, which are also some of our clients. So it's interesting that we're going from really enterprise use cases, democratizing it all the way down to a single user on the cloud with a credit-card swipe at 70 bucks a month. >>So you used to work for Lotus, but Cognos is one of IBM's largest acquisitions in the software space ever. Steve Mills and his team architected a complete transformation of IBM's business and really got heavily into it; I think it was a $5 billion acquisition, don't hold me to that, but massive at the time, and it's really paid dividends. Now, when the 2010s crowd came in and said, oh, Hadoop's going to kill all the traditional BI, the traditional EDW, that didn't happen. These traditional platforms remained a fundamental component of people's data strategies, so that created the imperative to modernize, to make sure that there could be things like self-service and cloud-ready, didn't it? >>Yeah, that's absolutely true. The workloads that we run are really sticky workloads, right? When you're doing your reporting, your consolidation or your planning of your yearly cycle, your budget cycle on these technologies, you don't rip them out so easily. So yes, of course there's competitive disruption in the space, and of course cloud creates an opportunity for workloads to be run cheaper, without your own IT people. And of course the era of digital software, I find it myself, I try it myself, I buy it without ever talking to a salesperson, creates a democratization for these really powerful tools that's never existed before in that space. >>Now, when I started in the business a long, long time ago, it was called DSS, decision support systems, and at the time it promised a 360-degree view of the business; that never really happened. You saw a whole new raft of players come in, and then the whole BI and enterprise data warehouse wave was going to deliver on that promise; that kind of didn't happen either. Sarbanes-Oxley brought a big wave of imperative around these systems, because compliance became huge, so that was a real tailwind. Then Hadoop was going to solve all these problems; that really didn't happen. And now you've got AI, and it feels like the combination of those systems of record, those data warehouse systems, the traditional business intelligence systems, and all this new emerging tech together are actually going to be a game changer. I wonder if you could comment. >>Well, they can be a game changer, but you're touching on a couple of subjects here that are connected. Number one is obviously the mass of data, right? Data has accelerated at a phenomenal pace, and then you're talking about how you visualize or use that data in a useful manner, and that really drives the use case for AI. Because AI, or augmented intelligence as we talk about it, is almost only useful when it's invisible to the user, because the user needs to feel like it's doing something for them that's super intuitive; a bit like the transition between the electric car and the normal car, which only really happens when the electric car can do what the normal car can do. So with things like, imagine you bring a Hadoop cluster into a BI solution and you're looking at that data: well, if I can correlate, for example, time, profit and cost, then I can create KPIs automatically, I can create visualizations, I know which ones you like to see from that, or I can give you related ones. I can even automatically create dashboards. I've got the intelligence about the data, and the knowledge to know what and how you might want to visualize it, versus today, where you have to manually construct everything. >>And when you bring these disparate data sets together, isn't AI also going to give you an indication of the confidence level in those various data sets? So, for example, your BI data set might be part of the general ledger, the income statement, corporate facts with a very high confidence level, whereas some of the unstructured data might not have as high a confidence level. How are customers dealing with that and applying it? First of all, is that an accurate premise, and how is it manifesting itself in terms of business? >>Yeah, it is an accurate premise, because in the world of data there are the known knowns and the unknown knowns, right? Known knowns are what you know about your data. What's interesting about really good BI solutions and planning solutions, especially when they're brought together, right, because planning and analysis naturally go hand in hand, from the one user at 70 bucks a month to the enterprise client, is things like key drivers: the drivers that you know drive your profit. But when you've got massive amounts of data, and you've got AI around it, especially AI that's got an ontology around your particular industry, it can start telling you about drivers that you don't know about. And that's really the next step: tell me the drivers around things that I don't know, so that when I'm exploring the data, I see a key driver that I never even knew existed. >>So when I talk to customers, and I've been doing this for a while, one of the concerns, the criticisms, they had of the traditional systems was that the process is too hard. I've got to go to, like, a few guys I can go to, I've got to line up and submit a request, and by the time I get it back, I'm on to something else. I want self-serve beyond just reporting. How are AI and IBM changing that dynamic? Can you put these tools in the hands of users? >>Right. So this is about democratizing the cleverness. If you're a big, broad organization, you can afford to hire a bunch of people to do that stuff. But if you're a startup or an SMB, and that's where the big market opportunity is for us, abilities like this, and we're building this into the software already today: I'll bring a spreadsheet. Spreadsheets, by definition, are not just rows and columns, right? Anyone can take a row-and-column spreadsheet and turn it into a set of data, because it looks like a database. But when you've got different tabs and different sets of data that may or may not be obviously relatable to each other, it's that AI ability to introspect a spreadsheet and turn it, from a planning point of view, into cubes, dimensions and rules, which turn your spreadsheet into a three-dimensional in-memory cube, or a planning application. Our ability to go way, way further than you could ever do with that manual planning process, over thousands of people, is all possible now, because we've taken all the hard work, all the heavy lifting, out. >>So that three-dimensional in-memory cube, I like the sound of that. So there's a performance implication. Absolutely. And what else? Accessibility, more apps, more users? >>Well, it's the ability to do out-of-process what-ifs on huge amounts of data. Imagine you're Boeing, right? How many parts does Boeing have? I don't know, three trillion, I'm just guessing. If you've got three trillion parts and you need to figure out, based on the latest hurricane report, how many parts you need to ship to where that hurricane report is, you need to do a what-if scenario on massive amounts of data in a second or two. So that capability requires an OLAP solution. However, the rest of the planet, other than OLAP people, bless them, who are very special people, don't know OLAP from a Pop-Tart. So it's democratizing it, right down to the person who says: I've got a set of data and I still need to do what-if analysis on it, and probably on large data, because even if you're a small company, you can have massive amounts of data coming through, people click-streaming through your website, for example. What-if analysis on putting a 5% discount on this product: based on previous sales, how is that going to affect my future sales? Again, I think it's the democratizing as well as the ability to hit scale. >>You talk about cloud and analytics and how they've come together. What specifically has IBM done to modernize that platform? And I'm interested in what customers are saying; what's the adoption like? >>So I manage the global cloud team. We have about a thousand clients that are using cloud implementations of our software, and that's growing; actually, it's about two and a half thousand if you include the multi-tenant version. There are two steps in this process, right? When you've got an enterprise software solution, your clients have a certain expectation that your software runs on cloud just the way it does on premise, which means, in practical terms, you have to build a single-tenant, well-managed cloud instance. And that's just the first step, because getting clients to see the value of running the workload on cloud, where they don't need people to install it, configure it, update it, troubleshoot it, and all that other sort of IT stuff that subtracts you from running your business value; we do all that for you. But the future really is in multi-tenant, and how we can get vast, vast scale and also greatly lower costs. But the adoption's been great; clients love it. >>Can you share any kind of indication, or is that all confidential? What kind of metrics do you look at? >>So obviously we look at growth, we look at user adoption, and we look at how busy the service is. The best indication I can give you is volume numbers, a number of servers, right? So we have 8,000 virtual machines running on SoftLayer, or IBM Cloud, for our clients; Business Analytics is actually the largest client of IBM Cloud, running those workloads for our clients. So the adoption has been really super, and the growth continues. Interestingly enough, I'll give you another factoid: we launched Cognos Analytics multi-tenant last October, so it is truly multi-tenant infrastructure; you try, you buy, you give your credit card and away you go. And you would think, because we don't have software sellers out there selling it per se, that it might not get adopted as much as products people are out there selling. Well, in one year its growth rate is 10% month on month, and we're at nearly 1,400 users now, without huge amounts of effort on our part. So clearly there's market interest in running this software, and they're not one- and two-user shops either; some people have 150 people planning on the multi-tenant software. So I believe the future is: dedicated is the first step, to grow confidence that my on-premise investments will lift and shift to the cloud, but multi-tenant will take us a lot >>further. So that's a proof point: existing customers saying, okay, I want to modernize, I'm buying in; take the half-step of managed dedicated, and then obviously multi-tenant for scale, and it's just way more cost-efficient. Yes, very much. All right. Last question: show us a little leg. What can you tell us about the roadmap? What gets you excited about the future? >>So, historically, Planning Analytics and Cognos Analytics have been separate products, right? And since they came together under the Business Analytics logo about a year ago, we've been spending a lot of our time bringing them together, because, you know, you can fight in the BI space and you can fight in the planning space, and there are a lot of competitors here, not so many there. But when you bring the two things together, the connected value chain is where we're really going to win. But it's not only doing the connected value chain; it has to be priced right, because I'm the former Lotus guy who believes in democratization of technology. The market has shown us: when we create a piece of software that starts at 15 bucks for a single user, for the same power, mind you, with a little less of the capabilities, and 70 bucks for a single user for all of it, people buy it. So I'm in. >>Tony, thanks so much for coming on; theCUBE was great to have you. >>Brilliant. Thank you. >>Keep it right there, everybody; we'll be back with our next guest. You're watching theCUBE, live from the IBM Data and AI Forum in Miami. We'll be right back.
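Tony's spreadsheet-to-cube and what-if points can be pictured with a small sketch. This is not IBM Planning Analytics or its engine; it is a few lines of pandas over invented sales figures, with made-up scenario inputs, just to show the shape of a dimensional rollup plus a what-if pass.

```python
import pandas as pd

# Invented sales data standing in for a planning data set.
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "units":   [100, 80, 120, 60],
    "price":   [10.0, 12.0, 10.0, 12.0],
})
sales["revenue"] = sales["units"] * sales["price"]

# "Cube" view: revenue by region x product, the kind of rollup an in-memory
# OLAP engine holds so queries come back in a second or two.
cube = sales.pivot_table(values="revenue", index="region",
                         columns="product", aggfunc="sum")
print(cube)

# What-if: a 5% discount that we simply assume lifts units sold by 8%.
# Both numbers are scenario inputs, not outputs of any forecasting model.
scenario = sales.copy()
scenario["price"] *= 0.95
scenario["units"] = (scenario["units"] * 1.08).round()
scenario["revenue"] = scenario["units"] * scenario["price"]
print("baseline revenue:", sales["revenue"].sum())
print("scenario revenue:", scenario["revenue"].sum())
```

At Boeing-like scale, the same what-if has to run across billions of cells, which is why the interview keeps coming back to an OLAP engine rather than a spreadsheet.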

Published Date : Oct 23 2019

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Tony Higham | PERSON | 0.99+
Steve Mills | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
IBM | ORGANIZATION | 0.99+
Boeing | ORGANIZATION | 0.99+
Miami | LOCATION | 0.99+
$5 billion | QUANTITY | 0.99+
15 bucks | QUANTITY | 0.99+
Tony | PERSON | 0.99+
70 bucks | QUANTITY | 0.99+
three trillion | QUANTITY | 0.99+
5% | QUANTITY | 0.99+
Three trillion | QUANTITY | 0.99+
360 degree | QUANTITY | 0.99+
150 people | QUANTITY | 0.99+
Miami, Florida | LOCATION | 0.99+
two steps | QUANTITY | 0.99+
Six people | QUANTITY | 0.99+
1000 clients | QUANTITY | 0.99+
two things | QUANTITY | 0.99+
two | QUANTITY | 0.99+
first step | QUANTITY | 0.99+
last October | DATE | 0.99+
One | QUANTITY | 0.97+
one year | QUANTITY | 0.97+
Duke | ORGANIZATION | 0.97+
Ditch the Digital | ORGANIZATION | 0.97+
today | DATE | 0.97+
Cuba | LOCATION | 0.96+
Amiss | PERSON | 0.96+
Planning Analytics | ORGANIZATION | 0.96+
single user | QUANTITY | 0.96+
Lotus | TITLE | 0.95+
nearly 1400 users | QUANTITY | 0.95+
Tuesdays | DATE | 0.92+
one | QUANTITY | 0.92+
10% month | QUANTITY | 0.92+
B I | ORGANIZATION | 0.91+
about | DATE | 0.91+
over thousands of people | QUANTITY | 0.91+
Global Cloud | ORGANIZATION | 0.91+
Carlos analytics | ORGANIZATION | 0.91+
10% month | QUANTITY | 0.9+
1/2 1000 | QUANTITY | 0.87+
Alex | PERSON | 0.87+
first | QUANTITY | 0.81+
70 bucks a month | QUANTITY | 0.81+
8000 virtual machines | QUANTITY | 0.8+
Ally | ORGANIZATION | 0.79+
Enterprise Data Warehouse | ORGANIZATION | 0.79+
single tenant | QUANTITY | 0.79+
a year ago | DATE | 0.79+
Collin | PERSON | 0.78+
single user | QUANTITY | 0.76+
1/2 step | QUANTITY | 0.73+
Sarbanes Oxley | PERSON | 0.73+
single household | QUANTITY | 0.7+
Cloud Business Analytics | ORGANIZATION | 0.7+
a second | QUANTITY | 0.68+
couple | QUANTITY | 0.65+
Cognos | PERSON | 0.59+
2000 ten | DATE | 0.58+
cloud | TITLE | 0.57+
Roan | ORGANIZATION | 0.56+
IBM Cloud | ORGANIZATION | 0.53+
Cube | PERSON | 0.37+

Show Wrap | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back. We're here to wrap up the MIT Chief Data Officer and Information Quality conference, hashtag MITCDOIQ. You're watching theCUBE; I'm Dave Vellante, and Paul Gillin is my co-host. This is two days of coverage, and we're wrapping up with our analysis of what's going on here. Paul, let me kick it off. When we first started covering this event, we saw the chief data officer role emerge from the back office, the information quality role. In 2013, the CDOs that we talked to, when we asked them what their scope was, we heard things like: oh, it's very wide, it involves analytics, data science. Some CDOs even said, oh yes, security is actually part of our purview, because of all the cyber data. So very, very wide scope. Even some of the digital initiatives were being claimed; the CDOs were staking their claim. The reality was that the CDO also emerged out of highly regulated industries: financial services, healthcare, government. And it really was this kind of wonky, back-office compliance role, and that's what it's become again. We're seeing that CDOs largely are not involved in a lot of the emerging AI initiatives; that's what we heard, sort of anecdotally, talking to various folks. At the same time, I feel as though the CDO role has been more fossilized than it was before. We used to ask: is this role going to be around anymore? We had CIOs tell us that the CDO role was going to disappear, so you had both ends of the spectrum. But I feel as though, whatever it's called, CDO, chief data and analytics officer, head of data, that role of data, analytics and governance is here to stay, at least for a fair amount of time, and increasingly, issues of privacy and governance, and at least the periphery of security, are going to be supported by that CDO role. So that's kind of takeaway number one. Let me get your thoughts. >> I think there's a maturity process going on here. What we saw really in 2016 through 2018 was a sort of celebration of the arrival of the CDO: we're here, we've got power now, we've got an agenda. And that was a natural outcome of all this growth, and of 90% of organizations putting CDOs in place. I think what you're seeing now is a realization that, oh my God, this is a mess. What I heard this year was a lot less of this sort of crowing about the ascendance of CDOs, and more about: we've got a big integration problem, a big data-cleansing problem, and we've got to get our hands down to the nitty-gritty. Whereas we've heard so much in past years about strategic initiatives, about artificial intelligence, about getting involved in digital business or customer-experience transformation, what we heard this year was about cleaning up data: finding the data that you've got, organizing it, applying metadata to it, getting it in shape to do something with it. There's nothing wrong with that; I just think it's part of the natural maturation process organizations have to go through, the dirty process of cleaning up this data, before they can get to the next stage, which is a couple or three years out for most of them. >> The second big theme, of course, and we heard this from the former head of analytics at GSK on the opening keynote, is that the traditional methods have failed: the Enterprise Data Warehouse. And we've actually studied this a lot. You know, my analogy is often the snake swallowing a basketball: having to build cubes, and EDW practitioners always used to call it chasing the chips; oh, we need that new chip, because it's taking us hours, days, weeks to run these analytics. So it really was not agile; it was a rear-view-mirror-looking thing. And Sarbanes-Oxley saved the EDW business, because reporting became part of compliance. The master data management piece, we heard too: Mike Stonebraker, who's obviously a technology visionary, was right on; it doesn't scale, this notion of deduping everything, and manually creating rules just isn't the right approach. We also heard that the top-down enterprise data model doesn't work: too complicated, you can't operationalize it. So what do they do? They kick the can to governance. Hadoop was kind of a sidecar; big data failed to live up to its promises. And so it's a big question as to whether or not AI will bring that level of automation. We heard from KPMG, and certainly from Mike Stonebraker again, and as well from Andy Palmer: they're using technology to automate and scale that number-one data science problem, which is that they spend all their time wrangling data. We'll see if that actually lives up >> to its promise. What we did hear today, from several of our guests, was the promise of machine learning to automate this data cleanup. As Mark Ramsey said, kicking off the conference, all of these efforts to standardize data have failed in the past; he then showed how GSK had used some of the tools that were represented here, using machine learning to actually clean up the data at GSK. And I heard today a lot of optimism from the people we talked to, Chris for example, about the capability of machine learning to bring some order and solve this scale problem. Because really, organizing data, creating enterprise data models, is a scale problem, and the only way you can solve that is with automation; Mike Stonebraker is right on top of that. So there was optimism at this event. There was kind of a dismay at seeing all the data problems they have to clean up, but also promise that tools are on the way that can do that. >> Yeah. The reason I'm an optimist about this role is because data is such a hard problem, and while there is a feeling of, wow, this is really a challenge, there are a lot of smart people here who are up for the challenge and have the DNA for it. So the role, that whole 360 thing we talked about, the traditional methods kind of failing, and the third piece I touched on, which is really bringing machine intelligence to the table: we haven't heard that as much at this event in the past, and it's now front and center. It's just another example of AI injecting itself into virtually every corner of the industry. And again, I often joke: same wine, new bottle. Our industry has a habit of doing that. But it's cyclical, and we do seem to be making consistent progress. >> And the machine learning, I thought, was interesting: several of our guests spoke to machine learning being applied to the plumbing projects right now, to cleaning up data. Those are really self-contained projects; you can manage those, you can determine test outcomes, you can vet the quality of the algorithms. It's not like you're putting machine learning out there in front of the customer, where it could potentially do some real damage. They're vetting and burning in machine learning in an environment that they control. >> Right. So, Paul, two solid days here. I think this conference has really grown: when we first started here, it was about 130 people, and now it was 500 registrants this year; I think 600 is the goal for next year, with a move of venues. theCUBE has been covering this all but one year since 2013, and we hope to continue to do that. Paul, it was great working with you, always great work, and I hope we can do more together. We heard that Vertica is bringing back its conference, which you put together, so we had Colin Mahony and the Vertica rock stars on, which was fun: Colin Mahony, Mike Stonebraker, Andy Palmer and Chris Lynch all kind of weighed in, which was great, to get their perspectives on the days of MPP and how that's evolved, improving on the traditional relational database. And now you've got Stonebraker applying all this machine intelligence, and the same thing at scale with Chris Lynch. So it's fun to watch those guys, all Boston-based, East Coast folks. Some news: we just saw the news hit that President Trump is holding up the JEDI contract. As we've talked about, we've been following that story very closely, and I've got some concerns over that. I think it's largely because he doesn't like Bezos and The Washington Post. Exactly. You know, here's this America First stance, and the Pentagon says they need this to be competitive with China >> in AI. >> There's maybe some "where there's smoke, there's fire" there, but >> it seems more like a stick in >> the eye. That's what it seems like. So we're watching that story very closely, and I think it's a bad move for the executive branch to be involved in those types of decisions. But you know what I know. Anyway, Paul, awesome working with you; thanks. And Sal, appreciate you flying out; good job. Alex, Mike, great. Already wrapping up. So thank you for watching. Go to siliconangle.com for all the news; youtube.com/siliconangle is where we house our playlists; but thecube.net is the main site where we have all the events, and it will show you what's coming up next. We've got a bunch of stuff going on straight through the summer, and then of course VMworld is the big kickoff for the fall season. Go to wikibon.com for all the research. We're out. Thanks for watching, everybody. This is Dave Vellante, with Paul Gillin; we'll see you next time.

Published Date : Aug 1 2019

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Andy Palmer | PERSON | 0.99+
David Dante | PERSON | 0.99+
Chris Lynch | PERSON | 0.99+
Chris | PERSON | 0.99+
2013 | DATE | 0.99+
Paul | PERSON | 0.99+
Paul Gill | PERSON | 0.99+
Mike Stone | PERSON | 0.99+
2016 | DATE | 0.99+
Paul Gillon | PERSON | 0.99+
Mike Stone Breaker | PERSON | 0.99+
Silicon Angle Media | ORGANIZATION | 0.99+
2018 | DATE | 0.99+
Rose | PERSON | 0.99+
Alex Mike | PERSON | 0.99+
Bezos | PERSON | 0.99+
G s K | ORGANIZATION | 0.99+
Mahoney | PERSON | 0.99+
Boston | LOCATION | 0.99+
KPMG | ORGANIZATION | 0.99+
90% | QUANTITY | 0.99+
Sal | PERSON | 0.99+
third piece | QUANTITY | 0.99+
Dave | PERSON | 0.99+
500 registrants | QUANTITY | 0.99+
two days | QUANTITY | 0.99+
Cambridge, Massachusetts | LOCATION | 0.99+
today | DATE | 0.99+
next year | DATE | 0.99+
Mark Ramsay | PERSON | 0.99+
360 | QUANTITY | 0.99+
this year | DATE | 0.99+
Maura | PERSON | 0.99+
G S. K. | ORGANIZATION | 0.98+
Youtube | ORGANIZATION | 0.98+
Amy | PERSON | 0.98+
Pentagon | ORGANIZATION | 0.98+
C I. Ose | PERSON | 0.98+
Sarbanes Oxley | PERSON | 0.97+
first | QUANTITY | 0.97+
This year | DATE | 0.96+
one year | QUANTITY | 0.96+
Mike Stone breaker | PERSON | 0.95+
Enterprise Data Warehouse | ORGANIZATION | 0.95+
Dios | PERSON | 0.94+
Two solid days | QUANTITY | 0.94+
second | QUANTITY | 0.94+
three years | QUANTITY | 0.92+
about 130 people | QUANTITY | 0.91+
600 | QUANTITY | 0.9+
Duke | ORGANIZATION | 0.89+
VM World | EVENT | 0.88+
dot com | ORGANIZATION | 0.85+
China | ORGANIZATION | 0.84+
E. D. W. | ORGANIZATION | 0.83+
Cube | ORGANIZATION | 0.8+
MIT | ORGANIZATION | 0.77+
East Coast | LOCATION | 0.75+
M I T. | PERSON | 0.75+
2019 | DATE | 0.74+
President Trump | PERSON | 0.71+
both ends | QUANTITY | 0.71+
three | QUANTITY | 0.68+
M I T. | EVENT | 0.64+
cube dot net | ORGANIZATION | 0.59+
Chief | PERSON | 0.58+
The Washington Post Post | TITLE | 0.57+
America | ORGANIZATION | 0.56+
Goto wicked | ORGANIZATION | 0.54+
CEO | PERSON | 0.54+
couple | QUANTITY | 0.54+
CDO | ORGANIZATION | 0.45+
Stone | PERSON | 0.43+
CDOIQ | TITLE | 0.24+

Mark Ramsey, Ramsey International LLC | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts. It's theCUBE, covering MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts, everybody. We're here at MIT, sweltering Cambridge, Massachusetts. You're watching theCUBE, the leader in live tech coverage, my name is Dave Vellante. I'm here with my co-host, Paul Gillin. Special coverage of the MIT CDOIQ, the Chief Data Officer event. This is the 13th year of the event; we started covering it seven years ago. Mark Ramsey is here. He's the Chief Data and Analytics Officer Advisor at Ramsey International, LLC and former Chief Data Officer of GlaxoSmithKline. Big pharma, Mark, thanks for coming onto theCUBE. >> Thanks for having me. >> You're very welcome, fresh off the keynote. Fascinating keynote this evening, or this morning. Lot of interest here, tons of questions. And we have some as well, but let's start with your history in data. I sat down after 10 years, but I could have stretched it to 20. I'll sit down with the young guns. But there were some folks in there with 30-plus-year careers. How about you, what does your data journey look like? >> Well, my data journey, of course I was able to stand up for the whole time because I was in the front, but I actually started about 32, a little over 32 years ago, and I was involved with building... What I always tell folks is that Data and Analytics has been a long journey, and the name has changed over the years, but we've been really trying to tackle the same problems of using data as a strategic asset. So when I started I was with an insurance and financial services company, building one of the first data warehouse environments in the insurance industry, and that was in the '87, '88 range, and then once I was able to deliver that, I ended up transitioning into consulting for IBM and basically spent 18 years with IBM in consulting and services. When I joined, the name had evolved from Data Warehousing to Business Intelligence, and then over the years it was Master Data Management, Customer 360, Analytics and Optimization, Big Data. And then in 2013, I joined Samsung Mobile as their first Chief Data Officer. So, moving out of consulting, I really wanted to own the end-to-end delivery of advanced solutions in the Data Analytics space, and so that made the transition to Samsung quite interesting, very much into consumer electronics, mobile phones, tablets and things of that nature, and then in 2015 I joined GSK as their first Chief Data Officer to deliver a Data Analytics solution.
I think that certainly, the technology has advanced, moving to reduction in the amount of schema that's required to move data so you can kind of move away from the map and move type of an approach of a data warehouse but it is tackling the same type of problems and like I said in the session it's a little bit like Einstein's phrase of doing the same thing over and over again and expecting a different answer is certainly the definition of insanity and what I really proposed at the session was let's come at this from a very different perspective. Let's actually use Data Analytics on the data to make it available for these purposes, and I do think I think it's a different wine now and so I think it's just now a matter of if folks can really take off and head that direction. >> What struck me about, you were ticking off some of the issues that have failed like Data Warehouses, I was surprised to hear you say Data Governance really hasn't worked because there's a lot of talk around that right now, but all of those are top-down initiatives, and what you did at GSK was really invert that model and go from the bottom up. What were some of the barriers that you had to face organizationally to get the cooperation of all these people in this different approach? >> Yeah, I think it's still key. It's not a complete bottoms up because then you do end up really just doing data for the sake of data, which is also something that's been tried and does not work. I think it has to be a balance and that's really striking that right balance of really tackling the data at full perspective but also making sure that you have very definitive use cases to deliver value for the organization and then striking the balance of how you do that and I think of the things that becomes a struggle is you're talking about very large breadth and any time you're covering multiple functions within a business it's getting the support of those different business functions and I think part of that is really around executive support and what that means, I did mention it in the session, that executive support to me is really stepping up and saying that the data across the organization is the organization's data. It isn't owned by a particular person or a particular scientist, and I think in a lot of organization, that gatekeeper mentality really does put barriers up to really tackling the full breadth of the data. >> So I had a question around digital initiatives. Everywhere you go, every C-level Executive is trying to get digital right, and a lot of this is top-down, a lot of it is big ideas and it's kind of the North Star. Do you think that that's the wrong approach? That maybe there should be a more tactical line of business alignment with that threaded leader as opposed to this big picture. We're going to change and transform our company, what are your thoughts? >> I think one of the struggles is just I'm not sure that organizations really have a good appreciation of what they mean when they talk about digital transformation. 
I think in most of the industries it is an initiative that's getting a lot of press within the organizations, and folks want to go through digital transformation, but in some cases that means having a more interactive experience with consumers, and it's maybe through sensors or different ways to capture data, but if they haven't solved the data problem it just becomes another source of data that we're going to mismanage, and so I do think there's a risk that we're going to see the same outcome from digital that we have when folks have tried other approaches to integrate information. And if you don't solve the basic blocking and tackling, having data that has higher velocity and more granularity, if you're not able to solve that because you haven't tackled the bigger problem, I'm not sure it's going to have the impact that folks really expect. >> You mentioned that at GSK you collected 15 petabytes of data of which only one petabyte was structured. So you had to make sense of all that unstructured data. What did you learn about that process? About how to unlock value from unstructured data as a result of that? >> Yeah, and I think this is something. I think it's extremely important in the unstructured data to apply advanced analytics against the data to go through a process of making sense of that information, and a lot of folks talk about, or have talked about historically, text mining of trying to extract an entity out of unstructured data and using that for the value. There's a few steps before you even get to that point, and first of all it's classifying the information to understand which documents do you care about and which documents do you not care about, and I always use the story that in this vast amount of documents, somebody has probably uploaded the cafeteria menu from 10 years ago. That has no scientific value, whereas a protocol document for a clinical trial has significant value; you don't want to manually look through a billion documents to separate those, so you have to apply the technology even in that first step of classification, and then there's a number of steps that ultimately lead you to understanding the relationship of the knowledge that's in the documents. >> Side question on that, so you had discussed okay, if it's a menu, get rid of it, but there's certain restrictions where you got to keep data for decades. It struck me, what about work in process? Especially in the pharmaceutical industry. I mean, post Federal Rules of Civil Procedure was everybody looking for a smoking gun. So, how are organizations dealing with what to keep and what to get rid of? >> Yeah, and I think certainly the thinking has been to remove the excess, and it's to your point, how do you draw the line as to what is excess, right? So you don't want to just keep every document, because then if an organization is involved in any type of litigation and there's disclosure requirements, you don't want to have to have thousands of documents. At the same time, there are requirements, and so it's like a lot of things. It's figuring out how do you abide by the requirements, but that is not an easy thing to do, and it really is another driver. Certainly document retention has been a big thing over a number of years, but I think people have not applied advanced analytics to the level that they can to really help support that. >> Another Einstein bromide, you know. Keep everything you must, but no more.
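As an aside, the triage step Ramsey describes, separating documents worth keeping from noise before any entity extraction is attempted, can be sketched in a few lines. This is an illustration only, assuming scikit-learn; the sample documents and labels are hypothetical, not GSK's corpus.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled sample: 1 = scientifically relevant, 0 = discard.
docs = [
    "Protocol for a phase II oncology clinical trial ...",
    "Cafeteria menu for the week of June 10 ...",
    "Statistical analysis plan for an exploratory study ...",
    "Parking garage closure notice ...",
]
labels = [1, 0, 1, 0]

triage = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
triage.fit(docs, labels)

# Route only likely-relevant documents on to entity extraction.
p_keep = triage.predict_proba(["Informed consent form for a study ..."])[0][1]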
So, you put forth a proposal where you basically had this sort of three approaches, well, combined three approaches. The crawlers to go, the spiders to go out and do the discovery, and I presume that's where the classification is done? >> That's really the identification of all of the source information. >> Okay, so find out what you got, okay. >> So that's kind of the start. Find out what you have. >> Step two is the data repository. Putting that in, I thought it was, when I heard you I said okay it must be a logical data repository, but you said you basically told the CIO we're copying all the data and putting it into essentially one place. >> A physical location, yes. >> Okay, and then so I got another question about that, and then use bots in the pipeline to move the data, and then you sort of drew the diagram of the back end to all the databases. Unstructured, structured, and then all the fun stuff up front, visualization. >> Which people love to focus on the fun stuff, right? Especially, I can't tell you how many articles there are on "you've got to apply deep learning and machine learning, and that's where the answers are." We have to have the data, and that's the piece that people are missing. >> So, my question there is you had this tactical mindset, it seems like you picked a good workload, the clinical trials, and you had at least conceptually a good chance of success. Is that a fair statement? >> Well, the clinical trials was one aspect. Again, we tackled the entire data landscape. So it was all of the data across all of R&D. It wasn't limited to just, that's that top down and bottom up, so the bottom up is tackle everything in the landscape. The top down is what's important to the organization for decision making. >> So, that's actually the entire R&D application portfolio. >> Both internal and external. >> So my follow up question there is, so that largely was kind of an inside-the-four-walls-of-GSK workload, or not necessarily. My question was what about, you hear about these emerging Edge applications, and that's got to be a nightmare for what you described. In other words, putting all the data into one physical place, so it must be like a snake swallowing a basketball. Thoughts on that? >> I think some of it really does depend. You're always going to have these, IoT is another example where it's a large amount of streaming information, and so I'm not proposing that all data in every format in every location needs to be centralized and homogenized. I think you have to add some intelligence on top of that, but certainly from an edge perspective or an IoT perspective or sensors, the data that you want to then make decisions around, so you're probably going to have a filter level that will impact those things coming in, then you filter it down to where you're going to really want to make decisions on that, and then that comes together with the other-- >> So it's a prioritization exercise, and that presumably can be automated. >> Right, but I think we always have these cases where we can say, well, what about this case, and you know I guess what I'm saying is I've not seen organizations tackle their own data landscape challenges and really do it in an aggressive way to get value out of the data that's within their four walls. It's always like I mentioned in the keynote. It's always let's do a very small proof of concept, let's take a very narrow chunk.
And what ultimately ends up happening is that becomes the only solution they build, and then they go to another area and they build another solution, and that's why we end up with 15 or 25-- (all talk over each other) >> The conventional wisdom is you start small. >> And fail. >> And you go on from there, you fail, and that's not how you get big things done. >> Well that's not how you support analytic algorithms like machine learning and deep learning. You can't feed those just fragmented data of one aspect of your business and expect it to learn intelligent things to then make recommendations; you've got to have a much broader perspective. >> I want to ask you about one statistic you shared. You found 26 thousand relational database schemas for capturing experimental data and you standardized those into one. How? >> Yeah, I mean we took advantage of the Tamr technology that Michael Stonebraker created here at MIT a number of years ago, which is really, again, it's applying advanced analytics to the data and using the content of the data and the characteristics of the data to go from dispersed schemas into a unified schema. So if you look across 26 thousand schemas using machine learning, you then can understand what's the consolidated view that gives you one perspective across all of those different schemas, 'cause ultimately when you give people flexibility they love to take advantage of it, but it doesn't mean that they're actually doing things in an extremely different way, 'cause ultimately they're capturing the same kind of data. They're just calling things different names and they might be using different formats, but in that particular case we used Tamr very heavily, and that again is back to my example of using advanced analytics on the data to make it available to do the fun stuff. The visualization and the advanced analytics. >> So Mark, the last question is, you well know that the CDO role emerged in these highly regulated industries, and I guess in the case of pharma quasi-regulated industries, but now it seems to be permeating all industries. We have Goka-lan from McDonald's, and virtually every industry is at least thinking about this role or has some kind of de facto CDO, so if you were slotted into a CDO role, let's make it generic. I know it depends on the industry, but where do you start as a CDO for a large organization that doesn't have a CDO, even a mid-sized organization, where do you start? >> Yeah, I mean my approach is that a true CDO is maximizing the strategic value of data within the organization. It isn't a regulatory requirement. I know a lot of the banks started there 'cause they needed someone to be responsible for data quality and data privacy, but for me the most critical thing is understanding the strategic objectives of the organization and how will data be used differently in the future to drive decisions and actions and the effectiveness of the business. In some cases, there was a lot of discussion around monetizing the value of data. People immediately took that to can we sell our data and make money as a different revenue stream; I'm not a proponent of that. It's internally monetizing your data.
How do you triple the size of the business by using data as a strategic advantage, and how do you change the executives so what is good enough today is not good enough tomorrow, because they are really focused on using data as their decision making tool, and that to me is the difference that a CDO needs to make, is really using data to drive those strategic decision points. >> And that nuance you mentioned I think is really important. Inderpal Bhandari, who is the Chief Data Officer of IBM, often says how can you monetize the data, and you're right, I don't think he means selling data, it's how does data contribute, if I could rephrase what you said, contribute to the value of the organization. That can be cutting costs, that can be driving new revenue streams, that could be saving lives if you're a hospital, improving productivity. >> Yeah, and I think what I've typically shared with executives when I've been in the CDO role is that they need to change their behavior, right? If a CDO comes into an organization and a year later the executives are still making decisions on the same data PowerPoints with spinning logos, and they said ooh, we've got to have 'em. If they're still making decisions that way, then the CDO has not been successful. The executives have to change what their level of expectation is in order to make a decision. >> Change agents, top down, bottom up, last question. >> Going back to GSK, now that they've completed this massive data consolidation project, how are things different for that business? >> Yeah, I mean you look how Barron joined as the President of R&D about a year and a half ago, and his primary focus is using data and analytics and machine learning to drive the decision making in the discovery of a new medicine, and the environment that has been created is a key component to that strategic initiative, and so they are actually completely changing the way they're selecting new targets for new medicines based on data and analytics. >> Mark, thanks so much for coming on theCUBE. >> Thanks for having me. >> Great keynote this morning, you're welcome. All right, keep it right there everybody. We'll be back with our next guest. This is theCUBE, Dave Vellante with Paul Gillin. Be right back from MIT. (upbeat music)
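The schema-unification step is worth a sketch of its own. The toy below captures only the spirit of what Ramsey describes, matching dispersed source columns to a unified schema by similarity; Tamr's actual technology is far more sophisticated and, as he notes, also uses the content of the data, not just column names. This assumes scikit-learn, and all column names are made up.

from sklearn.feature_extraction.text import TfidfVectorizer

unified = ["compound_id", "assay_result", "sample_date", "lab_site"]
source_columns = ["cmpd_identifier", "assayVal", "dt_of_sample", "site_code"]

# Character n-grams tolerate abbreviations and local naming conventions.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))
matrix = vec.fit_transform(unified + source_columns)
target_vecs = matrix[:len(unified)]
source_vecs = matrix[len(unified):]

# TF-IDF rows are L2-normalized, so the dot product is cosine similarity.
similarity = (source_vecs @ target_vecs.T).toarray()
for col, row in zip(source_columns, similarity):
    print(col, "->", unified[row.argmax()], f"(score {row.max():.2f})")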

Published Date : Jul 31 2019

SUMMARY :

Brought to you by SiliconANGLE Media. Mark Ramsey, former Chief Data Officer of GlaxoSmithKline, traces more than 30 years in data, from early insurance data warehouses through MDM and big data, and explains why top-down approaches like enterprise data models and data governance kept failing. At GSK he combined top-down use cases with a bottom-up approach: crawl the entire R&D data landscape, copy it into one physical repository, and apply advanced analytics, including Tamr's machine learning, to classify a billion documents and collapse 26,000 relational schemas into one, so that R&D can now select new drug targets based on data and analytics.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Vellante | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Paul Gillin | PERSON | 0.99+
Mark | PERSON | 0.99+
Mark Ramsey | PERSON | 0.99+
15 petabytes | QUANTITY | 0.99+
Samsung | ORGANIZATION | 0.99+
Inderpal Bhandari | PERSON | 0.99+
Michael Stonebraker | PERSON | 0.99+
2013 | DATE | 0.99+
Paul | PERSON | 0.99+
GlaxoSmithKline | ORGANIZATION | 0.99+
Barron | PERSON | 0.99+
Ramsey International, LLC | ORGANIZATION | 0.99+
26 thousand schemas | QUANTITY | 0.99+
GSK | ORGANIZATION | 0.99+
18 years | QUANTITY | 0.99+
2015 | DATE | 0.99+
thousands | QUANTITY | 0.99+
Einstein | PERSON | 0.99+
Cambridge, Massachusetts | LOCATION | 0.99+
tomorrow | DATE | 0.99+
Samsung Mobile | ORGANIZATION | 0.99+
26 thousand | QUANTITY | 0.99+
Ramsey International LLC | ORGANIZATION | 0.99+
30 plus year | QUANTITY | 0.99+
a year later | DATE | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
Federal Rules of Civil Procedure | TITLE | 0.99+
20 | QUANTITY | 0.99+
25 | QUANTITY | 0.99+
Both | QUANTITY | 0.99+
first step | QUANTITY | 0.99+
one petabyte | QUANTITY | 0.98+
today | DATE | 0.98+
15 | QUANTITY | 0.98+
one | QUANTITY | 0.98+
three approaches | QUANTITY | 0.98+
13th year | QUANTITY | 0.98+
one aspect | QUANTITY | 0.97+
MIT | ORGANIZATION | 0.97+
seven years ago | DATE | 0.97+
McDonald's | ORGANIZATION | 0.96+
MIT Chief Data Officer and | EVENT | 0.95+
R&D | ORGANIZATION | 0.95+
10 years ago | DATE | 0.95+
this morning | DATE | 0.94+
this evening | DATE | 0.93+
one place | QUANTITY | 0.93+
one perspective | QUANTITY | 0.92+
about a year and a half ago | DATE | 0.91+
over 32 years ago | DATE | 0.9+
a lot of talk | QUANTITY | 0.9+
a billion documents | QUANTITY | 0.9+
CDO | TITLE | 0.89+
decades | QUANTITY | 0.88+
one statistic | QUANTITY | 0.87+
2019 | DATE | 0.85+
first data | QUANTITY | 0.84+
of years ago | DATE | 0.83+
Step two | QUANTITY | 0.8+
Tamr | OTHER | 0.77+
Information Quality Symposium 2019 | EVENT | 0.77+
PowerPoints | TITLE | 0.76+
documents | QUANTITY | 0.75+
theCUBE | ORGANIZATION | 0.75+
one physical | QUANTITY | 0.73+
10 years | QUANTITY | 0.72+
87, 88 range | QUANTITY | 0.71+
President | PERSON | 0.7+
Chief Data Officer | PERSON | 0.7+
Enterprise Data Warehouse | ORGANIZATION | 0.66+
Goka-lan | ORGANIZATION | 0.66+
first Chief Data | QUANTITY | 0.63+
first Chief Data Officer | QUANTITY | 0.63+
Edge | TITLE | 0.63+
tons | QUANTITY | 0.62+

Mick Hollison, Cloudera | theCUBE NYC 2018


 

(lively peaceful music) >> Live, from New York, it's The Cube. Covering "The Cube New York City 2018." Brought to you by SiliconANGLE Media and its ecosystem partners. >> Well, everyone, welcome back to The Cube special conversation here in New York City. We're live for Cube NYC. This is our ninth year covering the big data ecosystem, now evolved into AI, machine learning, cloud. All things data in conjunction with Strata Conference, which is going on right around the corner. This is the Cube studio. I'm John Furrier. Dave Vellante. Our next guest is Mick Hollison, who is the CMO, Chief Marketing Officer, of Cloudera. Welcome to The Cube, thanks for joining us. >> Thanks for having me. >> So Cloudera, obviously we love Cloudera. Cube started in Cloudera's office, (laughing) everyone in our community knows that. I keep, keep saying it all the time. But we're so proud to have the honor of working with Cloudera over the years. And, uh, the thing that's interesting though is that the new building in Palo Alto is right in front of the old building where the first Palo Alto office was. So, a lot of success. You have a billboard in the airport. Amr Awadallah is saying, hey, it's a milestone. You're in the airport. But your business is changing. You're reaching new audiences. You have, you're public. You guys are growing up fast. All the data is out there. Tom's doing a great job. But, the business side is changing. Data is everywhere, it's a big, hardcore enterprise conversation. Give us the update, what's new with Cloudera. >> Yeah. Thanks very much for having me again. It's, it's a delight. I've been with the company for about two years now, so I'm officially part of the problem now. (chuckling) It's been a, it's been a great journey thus far. And really the first order of business when I arrived at the company was, like, welcome aboard. We're going public. Time to dig into the S-1 and reimagine who Cloudera is going to be five, ten years out from now. And we spent a good deal of time, about three or four months, actually crafting what turned out to be just 38 total words and kind of a vision and mission statement. But the, the most central to those was what we were trying to build. And it was a modern platform for machine learning analytics in the cloud. And, each of those words, when you unpack them a little bit, are very, very important. And this week, at Strata, we're really happy on the modern platform side. We just released Cloudera Enterprise Six. It's the biggest release in the history of the company. There are now over 30 open-source projects embedded into this, something that Amr and Mike could have never imagined back in the day when it was just a couple of projects. So, a very very large and meaningful update to the platform. The next piece is machine learning, and Hilary Mason will be giving the kickoff tomorrow, and she's probably forgotten more about ML and AI than somebody like me will ever know. But she's going to give the audience an update on what we're doing in that space. But, the foundation of having that data management platform, is absolutely fundamental and necessary to do good machine learning. Without good data, without good data management, you can't do good ML or AI. Sounds sort of simple but very true. And then the last thing that we'll be announcing this week, is around the analytics space. So, on the analytic side, we announced Cloudera Data Warehouse and Altus Data Warehouse, which is a PaaS flavor of our new data warehouse offering. 
And last, but certainly not least, is just the "optimize for the cloud" bit. So, everything that we're doing is optimized not just around a single cloud but around multi-cloud, hybrid-cloud, and really trying to bridge that gap for enterprises and what they're doing today. So, it's a new Cloudera to say the very least, but it's all still based on that core foundation and platform that, you got to know it, with very early on. >> And you guys have operating history too, so it's not like it's a pivot for Cloudera. I know for a fact that you guys had very large-scale customers, both with three letters in them, the government, as well as just commercial. So, that's cool. Question I want to ask you is, as the conversation changes from how many clusters do I have, how am I storing the data, to what problems am I solving, because of the enterprises. There's a lot of hard things that enterprises want. They want compliance, all these, you know, things that are either legacy. You guys work on those technical products. But, at the end of the day, they want the outcomes, they want to solve some problems. And data is clearly an opportunity and a challenge for large enterprises. What problems are you guys going after, these large enterprises in this modern platform? What are the core problems that you guys knock down? >> Yeah, absolutely. It's a great question. And we sort of categorize the way we think about addressing business problems into three broad categories. We use the terms grow, connect, and protect. So, in the "grow" sense, we help companies build or find new revenue streams. And, this is an amazing part of our business. You see it in everything from doing analytics on clickstreams and helping people understand what's happening with their web visitors and the like, all the way through to people standing up entirely new businesses based simply on their data. One large insurance provider that is a customer of ours, as an example, has taken on the challenge and asked us to engage with them on building really, effectively, insurance as a service. So, think of it as data-driven insurance rates that are gauged based on your driving behaviors in real time. So no longer simply just using demographics as the way that you determine, you know, all 18-year-old young men are poor drivers. As it turns out, with actual data you can find out there are some excellent 18-year-olds. >> Telematics, not demographics! >> Yeah, yeah, yeah, exactly! >> That Tesla don't connect to the-- >> Exactly! And parents will love this, love this as well, I think. So they can find out exactly how their kids are really behaving, by the way. >> They're going to know I rolled through the stop signs in Palo Alto. (laughing) My rates just went up. >> Exactly, exactly. So, so helping people grow new businesses based on their data. The second piece is "Connect". This is not just simply connecting devices, but that's a big part of it, so the IoT world is a big engine for us there. One of our favorite customer stories is a company called Komatsu. It's a mining equipment manufacturer. Think of it as the ones that make those just massive mining machines that are all over the world. They're particularly big in Australia. And, this is equipment that, when you leave it sit somewhere, because it doesn't work, it actually starts to sink into the earth. So, being able to do predictive maintenance on that level and type and expense of equipment is very valuable to a company like Komatsu. We're helping them do that. So that's the "Connect" piece.
And last is "Protect". Since data is in fact the new oil, the most valuable resource on earth, you really need to be able to protect it. Whether that's from a cyber security threat or it's just meeting compliance and regulations that are put in place by governments. Certainly GDPR has got a lot of people thinking very differently about their data management strategies. So we're helping a number of companies in that space as well. So that's how we kind of categorize what we're doing. >> So Mick, I wonder if you could address how that's all affected the ecosystem. I mean, one of the misconceptions early on was that Hadoop, Big Data, is going to kill the enterprise data warehouse. NoSQL is going to knock out Oracle. And, Mike has always said, "No, we are incremental". And people are like, "Yeah, right". But that's really what's happened here. >> Yes. >> EDW was a fundamental component of your big data strategies. As Amr used to say, you know, SQL is the killer app for, for big data. (chuckling) So all those data sources that have been integrated. So you kind of fast forward to today, you talked about IoT and The Edge. You guys have announced, you know, your own data warehouse and platform as a service. So you see this embracing in this hybrid world emerging. How has that affected the evolution of your ecosystem? >> Yeah, it's definitely evolved considerably. So, I think I'd give you a couple of specific areas. So, clearly we've been quite successful in large enterprises, so the big SI type of vendors want a, want a piece of that action these days. And they're, they're much more engaged than they were early days, when they weren't so sure all of this was real. >> I always say, they like to eat at the trough, and the trough is full, so they dive right in. (all laughing) >> They're definitely very engaged, and they built big data practices and distinctive analytics practices as well. Beyond that, sort of the developer community has also begun to shift. And it's shifted from simply people that could spell, you know, Hive or could spell Kafka and all of the various projects that are involved. And it has elevated, in particular, into a data science community. So one of the additional communities that we sort of brought on board with what we're doing, not just with the engine and Spark, but also with tools for data scientists like Cloudera Data Science Workbench, has added that element to the community that really wasn't a part of it, historically. So that's been a nice add-on. And then last, but certainly not least, are the cloud providers. And like everybody, those are complicated relationships because on the one hand, they're incredibly valuable partners; certainly both Microsoft and Amazon are critical partners for Cloudera. At the same time, they've got competitive offerings. So, like most successful software companies there's a lot of coopetition to contend with that also wasn't there just a few years ago, when we didn't have cloud offerings and they didn't have, you know, data warehouse in the cloud offerings. But, those are things that have sort of impacted the ecosystem. >> So, I've got to ask you a marketing question, since you're the CMO. By the way, great messaging. I like the "grow, connect, protect." I think that's really easy to understand. >> Thank you. >> And the other one was modern. The phrase, say the phrase again. >> Yeah. It's the "Cloudera builds the modern platform for machine learning analytics optimized for the cloud." >> Very tight mission statement.
Question on the name. Cloudera. >> Mmhmm. >> It's spelled, it's actually cloud with ERA in the letters, so "the cloud era." People use that term all the time. We're living in the cloud era. >> Yes. >> Cloud-native is the hottest market right now. In the Linux Foundation, the CNCF has over two hundred and forty members and growing. Cloud-native clearly has indicated that the new, modern developers here in the renaissance of software development, in general, enterprises want more developers. (laughs) Not that you want to be against developers, because, clearly, they're going to hire developers. >> Absolutely. >> And you're going to enable that. And then you've got the, obviously, cloud-native on-premise dynamic. Hybrid cloud and multi-cloud. So are there plans to think about that cloud era, is it a cloud positioning? You see cloud certainly important in what you guys do, because the cloud creates more compute, more capabilities to move data around. >> Sure. >> And (laughs) process it. And make it, make machine learning go faster, which gives more data, more AI capabilities. >> It's the flywheel you and I were discussing. >> It's the flywheel of, what's the innovation sandwich, Dave? You know? (laughs) >> A little bit of data, a little bit of machine intelligence, in the cloud. >> So, the innovation's in play. >> Yeah, absolutely. >> Positioning around cloud. How are you looking at that? >> Yeah. So, it's a fascinating story. You were with us in the earliest days, so you know that the original architecture of everything that we built was intended to be run in the public cloud. It turns out, in 2008, there were exactly zero customers that wanted all of their data in a public cloud environment. So the company actually pivoted and re-architected the original design of the offerings to work on-prem. And, no sooner did we do that, than it was time to re-architect it yet again. And we are right in the midst of doing that. So, we really have offerings that span the whole gamut. If you want to just pick up your whole current Cloudera environment in an infrastructure as a service model, we offer something called Altus Director that allows you to do that. Just pick up the entire environment, move it up onto AWS, or Microsoft Azure, and off you go. If you want the convenience and the elasticity and the ease of use of a true platform as a service, just this past week we announced Altus Data Warehouse, which is a platform as a service kind of a model. For data warehousing, we have the data engineering module for Altus as well. Last, but not least, is everybody's not going to sign up for just one cloud vendor. So we're big believers in multi-cloud. And that's why we support the major cloud vendors that are out there. And, in addition to that, it's going to be a hybrid world for as far out as we can see it. People are going to have certain workloads that, either for economics or for security reasons, they're going to continue to want to run in-house. And they're going to have other workloads, certainly more transient workloads, and I think ML and data science will fall into this camp, that the public cloud's going to make a great deal of sense. And allowing companies to bridge that gap while maintaining one security, compliance, and management model, something we call a Shared Data Experience, is really our core differentiator as a business. That's at the very core of what we do. >> Classic cloud workload experience that you're bringing, whether it's on-prem or whatever cloud. >> That's right.
>> Cloud is an operating environment for you guys. You look at it just as >> The delivery mechanism. In effect. Awesome. All right, future for Cloudera. What can you share with us? I know you're a public company. Can't make any forward-looking statements. Got to do all those disclaimers. But for customers, what's the, what's the North Star for Cloudera? You mentioned going after a much more hardcore enterprise. >> Yes. >> That's clear. What's the North Star for you guys when you talk to customers? What's the big pitch? >> Yeah. I think there's a, there's a couple of really interesting things that we learned about our business over the course of the past six, nine months or so here. One was that the greatest need for our offerings is in very, very large and complex enterprises. They have the most data, not surprisingly. And they have the most business gain to be had from leveraging that data. So we narrowed our focus. We have now identified approximately five thousand global customers, so think of it as kind of Fortune or Forbes 5000. That is our sole focus. So, we are entirely focused on that end of the market. Within that market, there are certain industries that we play particularly well in. We're incredibly well-positioned in financial services. Very well-positioned in healthcare and telecommunications. Any regulated industry that really cares about how they govern and maintain their data is really the great target audience for us. And so, that continues to be the focus for the business. And we're really excited about that narrowing of focus and what opportunities that's going to build for us. To not just land new customers, but more to expand our existing ones into a broader and broader set of use cases. >> And data is coming down faster. There's more data growth than we've ever seen before. It's never stopping. It's only going to get worse. >> We love it. >> Bring it on. >> Any way you look at it, it's getting worse or better. Mick, thanks for spending the time. I know you're super busy with the event going on. Congratulations on the success, and the focus, and the positioning. Appreciate it. Thanks for coming on The Cube. >> Absolutely. Thank you gentlemen. It was a pleasure. >> We are Cube NYC. This is our ninth year doing all the action. Everything that's going on in the data world now is horizontally scaling across all aspects of the company, the society, as we know. It's super important, and this is what we're talking about here in New York. This is The Cube, with John Furrier and Dave Vellante. Be back with more after this short break. Stay with us for more coverage from New York City. (upbeat music)
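For a sense of what the usage-based insurance example Hollison gives might look like in code, here is a minimal sketch. Everything in it, the trip fields, the weights, and the pricing blend, is invented for illustration; it is not Cloudera's or any insurer's model, and a production system would learn such weights from claims data rather than hard-code them.

from dataclasses import dataclass

@dataclass
class Trip:
    miles: float
    hard_brakes: int       # sudden decelerations flagged by the sensor
    night_miles: float     # miles driven between midnight and 4 a.m.
    speeding_seconds: int  # time spent well above the posted limit

def risk_score(trips):
    """0 (safest) to 100; the weights below are made up for illustration."""
    miles = sum(t.miles for t in trips) or 1.0
    brakes_per_100 = 100 * sum(t.hard_brakes for t in trips) / miles
    night_share = sum(t.night_miles for t in trips) / miles
    # Rough driving-time proxy: assume about one minute per mile.
    speed_share = sum(t.speeding_seconds for t in trips) / (miles * 60)
    raw = 4.0 * brakes_per_100 + 30.0 * night_share + 50.0 * speed_share
    return min(100.0, raw)

def monthly_premium(base_rate, trips):
    # Behavior-blended rate: a perfect score pays 70% of base, worst 130%.
    return base_rate * (0.7 + 0.6 * risk_score(trips) / 100)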

Published Date : Sep 13 2018

SUMMARY :

Brought to you by SiliconANGLE Media and its ecosystem partners. Cloudera CMO Mick Hollison on the company's post-IPO positioning as the modern platform for machine learning and analytics optimized for the cloud: the Cloudera Enterprise 6 release, the grow/connect/protect framing for customer value, coopetition with the cloud providers, hybrid and multi-cloud offerings such as Altus Director and Altus Data Warehouse, and a sharpened focus on roughly five thousand large global enterprises, especially in regulated industries.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Komatsu | ORGANIZATION | 0.99+
Dave Vellante | PERSON | 0.99+
Mick Hollison | PERSON | 0.99+
Mike | PERSON | 0.99+
Australia | LOCATION | 0.99+
Amazon | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
2008 | DATE | 0.99+
Palo Alto | LOCATION | 0.99+
Tom | PERSON | 0.99+
New York | LOCATION | 0.99+
Mick | PERSON | 0.99+
John Furrier | PERSON | 0.99+
New York City | LOCATION | 0.99+
Tesla | ORGANIZATION | 0.99+
CNCF | ORGANIZATION | 0.99+
Hilary Mason | PERSON | 0.99+
Cloudera | ORGANIZATION | 0.99+
second piece | QUANTITY | 0.99+
three letter | QUANTITY | 0.99+
North Star | ORGANIZATION | 0.99+
Amr Awadallah | PERSON | 0.99+
zero customers | QUANTITY | 0.99+
five | QUANTITY | 0.99+
18 year | QUANTITY | 0.99+
ninth year | QUANTITY | 0.99+
One | QUANTITY | 0.99+
Dave | PERSON | 0.99+
this week | DATE | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
both | QUANTITY | 0.99+
ten years | QUANTITY | 0.98+
four months | QUANTITY | 0.98+
over two hundred and forty members | QUANTITY | 0.98+
Oracle | ORGANIZATION | 0.98+
NYC | LOCATION | 0.98+
first | QUANTITY | 0.98+
NoSQL | TITLE | 0.98+
The Cube | ORGANIZATION | 0.98+
over 30 open-source projects | QUANTITY | 0.98+
Amr | PERSON | 0.98+
today | DATE | 0.98+
SQL | TITLE | 0.98+
each | QUANTITY | 0.98+
GDPR | TITLE | 0.98+
tomorrow | DATE | 0.98+
Cube | ORGANIZATION | 0.97+
approximately five thousand global customers | QUANTITY | 0.97+
Strata | ORGANIZATION | 0.96+
about two years | QUANTITY | 0.96+
Altus | ORGANIZATION | 0.96+
earth | LOCATION | 0.96+
EDW | TITLE | 0.95+
18-year old | QUANTITY | 0.95+
Strata Conference | EVENT | 0.94+
few years ago | DATE | 0.94+
one | QUANTITY | 0.94+
AWUS | TITLE | 0.93+
Altus Data Warehouse | ORGANIZATION | 0.93+
first order | QUANTITY | 0.93+
single cloud | QUANTITY | 0.93+
Cloudera Enterprise Six | TITLE | 0.92+
about three | QUANTITY | 0.92+
Cloudera | TITLE | 0.84+
three broad categories | QUANTITY | 0.84+
past six | DATE | 0.82+

Ronen Schwartz, Informatica & John Macintyre, Microsoft | Informatica World 2018


 

>> Narrator: Live from Las Vegas, it's The Cube! Covering Informatica World 2018. Brought to you by Informatica. >> Welcome back, everyone. We're live here in Las Vegas at the Venetian. This is Informatica World 2018. This is The Cube's exclusive coverage. I'm John Furrier, cohost of The Cube, with Peter Burris, my cohost for the past two days. Wall-to-wall coverage. Our next two guests are Ronen Schwartz, Senior Vice President and General Manager, Big Data, Cloud, and Data Integration for Informatica; and John MacIntyre, who leads product management for Azure SQL Data Warehouse at Microsoft. Part of the big news this morning on the keynote is the relationship between Microsoft Azure Cloud and Informatica. Welcome back, welcome to The Cube! Thanks for coming! >> Yeah, it's good to be here. >> So great to have you guys on, we were looking forward to this interview all morning, all day. We heard about the rumor of the news. Let's jump into it. But I want you to highlight the relationship, how you guys got here, because it's not just news, it's not just an announcement. There's actually code, shipping, product integration, push button, console, it's cloud, it's real cloud. >> John: Yeah, yeah, absolutely. >> It's a real product. >> John M.: Absolutely. >> Yeah, definitely, this is correct, and I do want to encourage the audience to go directly to the Azure environment, try SQL Data Warehouse and try to load as much data as possible, leverage the Informatica intelligent cloud services. It is, as you said, available today. >> Okay, so explain the product. Let's say you got the Informatica intelligent cloud services on Azure. What is the specific product? Take us through specifically what's happening and what is the impact to customers? >> So if you are a customer and you're looking to get agility, you want to get scale, you want to enjoy the benefits of cloud data warehouse, one of the first barriers that you have is how do I get my data into these new amazing capabilities that I can achieve in the cloud. And I think with this announcement we're simplifying that process and making it really streamlined. From within the same place that you start your new data warehouse, in one click you're actually coming to the strongest iPaaS that exists in the market, and you are able to choose your data source and actually decide what data you want to move, and then in a very simple process move that data into Azure SQL Data Warehouse. >> John, talk about the ease of use, because one of the things that pops in my head when I think about data is, man, it's a pain in the butt. I got to do all this stuff, I got to get it off a storage drive, I got to upload it, I got to set it on a drive, FedEx the drive, whatever. Cloud has to be console based. Talk about that aspect of this deal.
Well I think, John, you know, one of the things that you'll hear from Microsoft is that we want to build the most productive cloud available for customers, and when we look at it, as Ronen was saying, excuse me, we move data, we get data connected into the Azure cloud, and how do we do that in a push-button way. And so what you'll see through the integration that we've done is that all the way through single sign-on, you can just push a button, build that pipeline, get that data flowing from your on-premises environment, and get that into the Azure SQL Data Warehouse with just pushing a few buttons. And so what we see is customers are able to really accelerate their migration and movement to the cloud through that productivity. >> And how long has it been in the works? You guys just didn't meet yesterday and did product integration. Talk about the relationship with Informatica. >> Yeah, we've been working with Informatica for years. Informatica's been a great partner, and so we started working on this integration, I think, probably over a year ago and really envisioning what we could do for customers. How do we take all of the really great capabilities that Informatica brings to customers and connect those to the Azure cloud. One of the things that we believe for customers is that customers will live in a hybrid world, at least for some foreseeable time, and so how do we enable customers to live in that world, to have their data spread across that world, and get all the lineage, governance, and data management capabilities that you need as an enterprise in this world. And that's one of the great things that Informatica brings to the table here. >> And Microsoft, your ethos too, it seems to be, and you can confirm this if it's true or not, is to be open for data portability. >> John M.: Yeah. >> Certainly, GDPR has certainly sent a huge signal to the market that look, no one's going to fool around with this. Data's at the center of the value proposition. It has to move around. >> That's right. And so when we think about data, data interoperability, data portability, recently we introduced Azure Databricks as a GA service on Azure, and so we've already done data interoperability across our relational data warehouse products as well as the Databricks products, so Spark and Spark runtimes can interoperate and have data access with the relational warehouse, and the relational warehouse can load into Spark clusters. And so we see this giving customers the freedom to move their data and have their data in the places that they need it as critical for them to be successful. >> Ronen, let me just get specific on the news here a second. The product is GA or preview, or? >> The product is in preview and it will be fully GA'd in the Q3 time frame, hopefully the middle toward the end of Q3. Customers can start experimenting with the product today, and they will actually see us adding more and more capabilities to this experience even before the GA. >> What are some of the things the customers have been asking for? I know you guys do a lot of work on the product side with the customers, so I want to ask about the requirements that you guys put together in defining this product. What were some of the things that were their pain points that you're solving? Was it the ease of use, was it part of the plan of enterprise cataloging? Where did you guys come down when you did your PRD, or your requirements and all this stuff?
So we've been working with customers and with partners for the last few years over their journey to adopt cloud, and I think what we've seen is part of the challenge of adopting cloud was where do I start? How do I figure out what data should I move to the cloud first? What is actually going to be impacted by me doing this? One impact you touched on, which is security and privacy. Am I putting something at risk? Am I following the company policies? But other things are like, what other systems are depending on this data to exist here, and so when I move to the cloud, am I actually changing my overall enterprise data architecture? Where Informatica has been focusing, especially with the new catalog capabilities, is in really giving the enterprise the full picture of the data. If data is the most important asset that you have, we're actually trying to map it for you, including impact analysis, including relationship dependencies. What we're trying to simplify is actually choosing the right data to move to the cloud and actually dealing with the rest of the impact that is happening when you're adopting cloud fast. I think cloud is bringing an amazing promise. We want to make it really, really easy. This latest announcement is actually touching the experience itself, how can a customer go from starting a new data warehouse to bringing the data to the data warehouse. I think we are now making it even simpler than ever before. >> So one of the challenges that enterprises have overall is that there are so few people who really understand how to build these pipelines, how to administer these pipelines. Data scientists are not, the numbers are not growing fast. Microsoft also is an enormously powerful ecosystem itself. Do you anticipate that by doing IICS in this relationship your developers can actually start incorporating higher, more complex, more higher-value data services in a simple way, so that they can start putting them into their applications and reduce the need for those really smart people at large and small companies?
What we're hearing from customers is the challenge isn't necessarily getting machine learning services up and running or doing advanced analytics or building models and training models. Yes, there is a narrow set of people that go and do that, but inordinately what we hear is that customers are spending the bulk of their time shaping, managing that data, wrangling that data, getting that data in a form that it can actually be consumed, and I think this partnership-- >> A lot of prep work. >> Yeah, a ton of prep work. >> Talk about the dynamic. We've been hearing on The Cube here, certainly, and also out in the industry, that 80% of the time is spent managing all this stuff. You guys have a value proposition of cataloging all the metadata so you can get a clear view, and customers, we had Toyota on earlier, who said we had all the data, we just actually made all these mistakes because we didn't connect it all. What you guys are doing, coming from Ronen, you're going to bring all of the Microsoft tools to the table now, so I'm a customer, the benefit to me is I get to leverage the Power BI stuff or whatever is coming down the pipe, whatever tools you have in your ecosystem, on-prem and also in the cloud, is that? >> Absolutely, and so things like PowerApps are going to be an ability, with no-code, low-code experiences, to actually go build intelligent applications, build things like sales-oriented applications, recruiting-oriented applications, and leverage that data. That is really what we want to unlock for enterprises and for data professionals. >> What do you think the time savings will be, just ballpark, order of magnitude, on the setup? If 80% is the industry benchmark people are throwing around, but say 80% is wrangling setup, 20% analysis. What do you guys see the impact with something like the intelligent cloud service with Azure? >> Ronen, you can speak to what you're seeing already from some of the customers, but I think even from what we saw this morning in the keynote, we're cutting down the time dramatically in terms of, from identifying what data has value and then actually getting that moving into Azure; what you saw in less than 10 minutes today would take days if not weeks to actually get done without these tools-- >> So significant number, big number? >> John M.: Yeah, absolutely. >> And I think there are actually two parts to people going through the adoption. One is the technology of moving the data, but the other one that is even, I think, a bigger barrier and sometimes even more important is can I actually just discover and identify the data, and can I actually get all the metadata needed so that I can get the approval, or I can get personally comfortable with the data that I'm choosing. I think this cost now is actually being eliminated, and that is actually going to allow more people to consume more data even faster, but I do agree that I think the demo speaks better than anything else, got a lot of good-- >> John F.: A few clicks and you're there, got some great props on Twitter, saw some great tweets. The question that begs next is, now that I got a pipeline and automating, all this stuff's going on, console based and cataloging all this great stuff, AI, machine learning involved, where, is there, did you guys put the secret sauce in some of the tech? I mean, can you share what's under the hood at all? (laughs) Or is that the secret sauce? >> So, I cannot steal some of the demos of tomorrow, but I think you will--
>> Yes you can. (laughs) >> Come on, tell us. >> But I think you will see an interesting AI-driven interface-- >> That's a yes. >> From Microsoft working very interestingly with the catalog to drive intelligence to the users, so we will definitely demo it tomorrow on stage. >> John F.: So that's a yes. >> Yes, the answer is yes. >> But I want to build on this, because I asked a question about whether or not developers are going to get access to this. If I have a platform that allows me to build, in a simple way, very complex but very rich pipelines to data, I have a catalog that allows me to discover data, sustain knowledge about that data as the data changes over time, and I have a very simple way of setting that up and running it through an Azure cloud experience, can I anticipate that over time certain conventions for how data gets established, gets set up, organized, formats, all that other stuff, start to emerge as a combination of this partnership, so that developers can go into an account and say, okay, so we're going to do this for you, oh, you have customer data, you have this data, I want to be able to grab that and make it part of my application? Isn't that where this goes over time? >> Yes, yes, in a very substantive way. I think we're also looking at it from, you'll have to stay tuned on the Microsoft side, but we're working towards looking at data entities, business entities, and how do we enrich those entities and, to your point, where do they get enriched in that data pipeline, and then how do they get consumed, and how do they get consumed in a way where we're expressing the data model, the schema, the lineage, and all of these things in a way that's very discoverable for those consuming that data, so they understand where it's coming from. So we look at this partnership in terms of getting that data, getting that data more enriched, and getting that data more consumable in a standard way for application developers. Again, it could be those building intelligent applications, it could be those building business applications, and there's a whole set of tools-- >> Or some as-yet-undefined class of applications that are made possible because it's easier to find the data, acknowledge the data, use the data. >> John M.: Yeah, absolutely. >> If we had more time, I'd love to drill down on the future with microservices, containers, Kubernetes, all the cool stuff that's going on around cloud native. I'm sure there's a lot of headroom there from a developer standpoint. Final question is, extending the partnership. Is there a go-to-market together? Are you guys taking it to the field? What's the relationship with Microsoft, your ecosystem, your developers, your customers, and Informatica? >> Yeah, we're doing a lot of joint go-to-market. Today already we've been doing a lot all the way up to this announcement, and I think you'll see that increase based on this announcement. I don't know if Ronen you want to talk about specific things we're doing. >> Yeah, I think the success with the customer is already there, and there is actually a really nice list of customers here that are mutual customers of ours doing exactly these scenarios. We'll make it easier for them to do it from now on. >> Yep. >> From a go-to-market perspective, we have a really nice go-to-market motion where the sales teams are actually getting aligned. The new visible integration will make it even easier for them.
>> Yeah, this really hits a lot of the sweet spots, multi-cloud, hybrid cloud, truly data-driven, ease of use, getting up and running. Congratulations, Ronen, great job. John, great to see you. Here inside The Cube, taking all the data, packing it, sharing it out over the airwaves and over the Internet. Just The Cube, I'm John Furrier, Peter Burris, thanks for watching. Back with more live coverage. Stay with us for more coverage here at Informatica World 2018, live in Las Vegas. We'll be right back. (soft electronic music)

Published Date : May 22 2018


Seth Dobrin & Jennifer Gibbs | IBM CDO Strategy Summit 2017


 

>> Live from Boston, Massachusetts. It's The Cube! Covering IBM Chief Data Officer's Summit. Brought to you by IBM. (techno music) >> Welcome back to The Cube's live coverage of the IBM CDO Strategy Summit here in Boston, Massachusetts. I'm your host Rebecca Knight along with my Co-host Dave Vellante. We're joined by Jennifer Gibbs, the VP Enterprise Data Management of TD Bank, and Seth Dobrin who is VP and Chief Data Officer of IBM Analytics. Thanks for joining us Seth and Jennifer. >> Thanks for having us. >> Thank you. >> So Jennifer, I want to start with you, can you tell our viewers a little about TD Bank, America's Most Convenient Bank. Based, of course, in Toronto. (laughs) >> Go figure. (laughs) >> So tell us a little bit about your business. >> So TD is a, um, very old bank, headquartered in Toronto. We do have, ah, a lot of business as well in the U.S. Through acquisition we've built quite a big business on the Eastern seaboard of the United States. We've got about 85 thousand employees and we're servicing 42 lines of business when it comes to our Data Management and our Analytics programs, bank wide. >> So talk about your Data Management and Analytics programs a little bit. Tell our viewers a little bit about those. >> So, we split up our office of the Chief Data Officer, about 3 to 4 years ago and so we've been maturing. >> That's relatively new. >> Relatively new, probably, not unlike peers of ours as well. We started off with a strong focus on Data Governance. Setting up roles and responsibilities, a data steward organization and councils from which we can drive consensus and discussion. And then we started rolling out some of our Data Management programs with a focus on Data Quality Management and Meta Data Management, across the business. So setting standards and policies and supporting business processes and tooling for those programs. >> Seth when we first met, now you're a long timer at IBM. (laughs) When we first met you were a newbie. But we heard today about, it used to be the Data Warehouse was king but now Process is king. Can you unpack that a little bit? What does that mean? >> So, you know, to make value of data, it's more than just having it in one place, right? It's what you do with the data, how you ingest the data, how you make it available for other uses. And so it's really, you know, data is not for the sake of data. Data is not a digital dropping of applications, right? The whole purpose of having and collecting data is to use it to generate new value for the company. And that new value could be cost savings, it could be a cost avoidance, or it could be net new revenue. Um, and so, to do that right, you need processes. And the processes are everything from business processes, to technical processes, to implementation processes. And so it's the whole, you need all of it. >> And so Jennifer, I don't know if you've seen kind of a similar evolution from data warehouse to data everywhere, I'm sure you have. >> Yeah. >> But the data quality problem was hard enough when you had this sort of central master data management approach. How are you dealing with it? Is there less of a single version of the truth now than there ever was, and how do you deal with the data quality challenge? >> I think it's important to scope out the work effort in a way that you can get the business moving in the right direction without overwhelming it, and focusing on the areas that are most important to the bank. So, we've identified and scoped out what we call critical data.
So each line of business has to identify what's critical to them. That relates very strongly to what Seth said around what are your core business processes and what data are you leveraging to provide value to that, to the bank. So, um, data quality for us is about a consistent approach, to ensure the most critical elements of data that are used for business processes are where they need to be from a quality perspective. >> You can go down a huge rabbit hole with data quality too, right? >> Yeah. >> Data quality is about what's good enough, and defining, you know. >> Right. >> Mm-hmm (affirmative) >> It's not, I liked your, someone, I think you said, it's not about data quality, it's about, you know it's, you got to understand what good enough is, and it's really about, you know, what is the state of the data and under, it's really about understanding the data, right? Than it is perfection. There are some cases, especially in banking, where you need perfection, but there's tons of cases where you don't. And you shouldn't spend a lot of resources on something that's not value added. And I think it's important to do, even things like, data quality, around a specific use case so that you do it right. >> And what you were saying too, is that it's good enough but then that, that standard is changing too, all the time. >> Yeah and that changes over time and it's, you know, if you drive it by use case and not just have this boil the ocean kind of approach where all data needs to be perfect. And all data will never be perfect. And back to your question about processes, usually, a data quality issue, is not a data issue, it's a process issue. You get bad data quality because a process is broken or it's not working for a business or it's changed and no one's documented it so there's a work around, right? And so that's really where your data quality issues come from. Um, and I think that's important to remember. >> Yeah, and I think also coming out of the data quality efforts that we're making, to your point, is it centralized or is it cross business? It's really driving important conversations around who's the producer of this data, who's the consumer of this data? What does data quality mean to you? So it's really generating a lot of conversation across lines of business so that we can start talking about data in more of a shared way versus more of a business by business point of view. So those conversations are important by-products I would say of the individual data quality efforts that we're doing across the bank. >> Well, and of course, you're in a regulated business so you can have the big hammer of hey, we've got regulations, so if somebody spins up a Hadoop Cluster in some line of business you can reel 'em in, presumably, more easily, maybe not always. Seth you operate in an unregulated business. You consult with clients that are in unregulated businesses, is that a bigger challenge for you to reel in? >> So, I think, um, I think that's changing. >> Mm-hmm (affirmative) >> You know, there's new regulations coming out in Europe that basically have global impact, right? This whole GDPR thing. It's not just if you're based in Europe. It's if you have a subject in Europe and that's an employee, a contractor, a customer. And so everyone is subject to regulations now, whether they like it or not. And, in fact, there was some level of regulation even in the U.S., which is kind of the wild, wild west when it comes to regulations.
But I think, um, you should, even doing it because of regulation is not the right answer. I mean it's a great stick to hold up. It's great to be able to go to your board and say, "Hey if we don't do this, we need to spend this money 'cause it's going to cost us, in the case of GDPR, four percent of our revenue per instance." Yikes, right? But really it's about what's the value and how do you use that information to drive value. A lot of these regulations are about lineage, right? Understanding where your data came from, how it's being processed, who's doing what with it. A lot of it is around quality, right? >> Yep. >> And so these are all good things, even if you're not in a regulated industry. And they help you build a better connection with your customer, right? I think lots of people are scared of GDPR. I think it's a really good thing because it forces companies to build a personal relationship with each of their clients. Because you need to get consent to do things with their data, very explicitly. No more of these 30 pages, two point font, you know ... >> Click a box. >> Click a box. >> Yeah. >> It's, I am going to use your data for X. Are you okay with that? Yes or no. >> So I'm interested to hear from both of you, what are you hearing from customers on this? Because this is such a sensitive topic and, in particular, financial data, which is so private. What are you hearing from customers on this? >> Um, I think customers are, um, are, especially us in our industry, and us as a bank. Our relationship with our customer is top priority and so maintaining that trust and confidence is always a top priority. So whenever we leverage data or look for use cases to leverage data, making sure that that trust will not be compromised is critically important. So finding that balance between innovating with data while also maintaining that trust and frankly being very transparent with customers around what we're using it for, why we're using it, and what value it brings to them, is something that we're focused on with, with all of our data initiatives. >> So, big part of your job is understanding how data can affect and contribute to the monetization, you know, of your businesses. Um, at the simplest level, two ways, cut costs, increase revenue. Where do you each see the emphasis? I'm sure both, but is there a greater emphasis on cutting costs 'cause you're both established, you know, businesses, with hundreds of thousands, well in your case, 85 thousand employees. Where do you see the emphasis? Is it greater on cutting costs or not necessarily? >> I think for us, I don't necessarily separate the two. Anything we can do to drive more efficiency within our business processes is going to help us focus our efforts on innovative use of data, innovative ways to interact with our customers, innovative ways to understand more about our customers. So, I see them both as, um, I don't see them mutually exclusive, I see them as contributing to each. >> Mm-hmm (affirmative) >> So our business cases tend to have an efficiency slant to them or a productivity slant to them and that helps us redirect effort to other, other things that provide extra value to our clients. So I'd say it's a mix. >> I mean I think, I think you have to do the cost savings and cost avoidance ones first. Um, you learn a lot about your data when you do that. You learn a lot about the gaps.
You learn about how would I even think about bringing external data in to generate that new revenue if I don't understand my own data? How am I going to tie 'em all together? Um, and there's a whole lot of cultural change that needs to happen before you can even start generating revenue from data. And you kind of cut your teeth on that by doing the really, simple cost savings, cost avoidance ones first, right? Inevitably, maybe not in the bank, but inevitably most companies' supply chain. Let's go find money we can take out of your supply chain. Most companies, if you take out one percent of the supply chain budget, you're talking a lot of money for the company, right? And so you can generate a lot of money to free up to spend on some of these other things. >> So it's a proof of concept to bring everyone along. >> Well it's a proof of concept but it's also, it's more of a cultural change, right? >> Mm-hmm (affirmative) It's not even, you don't even frame it up as a proof of concept for data or analytics, you just frame it up, we're going to save the company, you know, one percent of our supply chain, right? We're going to save the company a billion dollars. >> Yes. >> And then there's gain share there 'cause we're going to put that thing there. >> And then there's a gain share and then other people are like, "Well, how do I do that?". And how do I do that, and how do I do that? And it kind of picks up. >> Mm-hmm (affirmative) But I don't think you can jump just to making new revenue. You got to kind of get there iteratively. >> And it becomes a virtuous circle. >> It becomes a virtuous circle and you kind of change the culture as you do it. But you got to start with, I don't, I don't think they're mutually exclusive, but I think you got to start with the cost avoidance and cost savings. >> Mm-hmm (affirmative) >> Great. Well, Seth, Jennifer thanks so much for coming on The Cube. We've had a great conversation. >> Thanks for having us. >> Thanks. >> Thank you guys. >> We will have more from the IBM CDO Summit in Boston, Massachusetts, just after this. (techno music)
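The "good enough" framing from the data quality exchange above lends itself to a small sketch. The example below is hypothetical, the columns, thresholds, and data are invented, not TD Bank's, but it shows the pattern the guests describe: check only the critical data elements, each against the completeness level its use case actually needs, and treat a miss as a broken process to investigate rather than data to patch.

    # "Good enough" data quality: per-element thresholds, not perfection.
    import pandas as pd

    orders = pd.DataFrame({
        "order_id":    [1, 2, 3, 4, 5],
        "customer_id": [10, None, 12, 13, 14],
        "amount":      [99.0, 15.5, None, 42.0, 7.25],
    })

    # Critical data elements and the completeness each use case needs.
    thresholds = {"order_id": 1.00, "customer_id": 0.95, "amount": 0.80}

    for column, required in thresholds.items():
        completeness = orders[column].notna().mean()
        verdict = "OK" if completeness >= required else "investigate the process"
        print(f"{column}: {completeness:.0%} complete "
              f"(needs {required:.0%}) -> {verdict}")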

Published Date : Oct 25 2017


Mick Bass, 47Lining - Data Platforms 2017 - #DataPlatforms2017


 

>> Live, from The Wigwam, in Phoenix, Arizona, it's theCube, covering Data Platforms 2017. Brought to you by Cue Ball. Hey, welcome back everybody. Jeff Frick here with theCube. Welcome back to Data Platforms 2017, at the historic Wigwam Resort, just outside of Phoenix, Arizona. I'm here all day with George Gilbert from Wikibon, and we're excited to be joined by our next guest. He's Mick Bass, the CEO of 47Lining. Mick, welcome. >> Welcome, thanks for having me, yes. >> Absolutely. So, what is 47Lining, for people that aren't familiar? >> Well, you know every cloud has a silver lining, and if you look at the periodic table, 47 is the atomic number for silver. So, we are a consulting services company that helps customers build out data platforms and ongoing data processes and data machines in Amazon web services. And, one of the primary use cases that we help customers with is to establish data lakes in Amazon web services to help them answer some of their most valuable business questions. >> So, there's always this question about own vs buy, right, with Cloud and Amazon, specifically. >> Mm-hmm, mm-hmm. >> And, with a data lake, the perception right... That's huge, this giant cost. Clearly there are benefits that come with putting your data lake in AWS vs having it on-prem. What are some of the things you take customers through, and kind of the scenario planning and the value planning? >> Well, just a couple of the really important aspects, one, is this notion of elastic and on-demand pricing. In a Cloud based data lake, you can start out with actually a very small infrastructure footprint that's focused on maybe just one or two business use cases. You can pay only for the data that you need to get your data lake bootstrapped, and demonstrate the business benefit from one of those use cases. But, then it's very easy to scale that up, in a pay as you go kind of a way. The second, you know, really important benefit that customers experience in a platform that's built on AWS, is the breadth of the tools and capabilities that they can bring to bear for their predictive analytics and descriptive analytics, and streaming kinds of data problems. So, you need Spark, you can have it. You need Hive, you can have it. You need a high performance, close to the metal, data warehouse, on a cluster database, you can have it. So, analysts are really empowered through this approach because they can choose the right tool for the right job, and reduce the time to business benefit, based on what their business owners are asking them for. >> You touched on something really interesting, which was... So, when a customer is on-prem, and let's say is evaluating Cloudera, MapR, Hortonworks, there's a finite set of services or software components within that distro. Once they're on the Cloud, there's a thousand times more... As you were saying, you could have one of 27 different data warehouse products, you could have many different SQL products, some of which are really delivered as services. >> Mm-hmm >> How does the consideration of the customer's choice change when they go to the Cloud? >> Well, I think that what they find is that it's much more tenable to take an agile, iterative process, where they're trying to align the outgoing cost of the data lake build to keep that in alignment with the business benefits that come from it. And, so if you recognize the need for a particular kind of analytics approach, but you're not going to need that until down the road, two or three quarters from now.
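A minimal sketch of that "start small, pay as you go" bootstrap might look like the following. Everything here is assumed for illustration: the bucket, file, database, and query names are invented, and it presumes AWS credentials plus a table already defined over the raw files (via Glue, for example). It is the shape of a first step, not 47Lining's actual methodology.

    # Bootstrap one use case: land raw data in S3, query it in place.
    import boto3

    s3 = boto3.client("s3")
    s3.create_bucket(Bucket="acme-data-lake-bootstrap")   # us-east-1 assumed
    s3.upload_file("orders.csv", "acme-data-lake-bootstrap",
                   "raw/orders/orders.csv")

    # Pay per query instead of provisioning a warehouse up front.
    athena = boto3.client("athena")
    athena.start_query_execution(
        QueryString="SELECT customer_id, SUM(amount) AS total "
                    "FROM orders GROUP BY customer_id",
        QueryExecutionContext={"Database": "lake_db"},
        ResultConfiguration={
            "OutputLocation": "s3://acme-data-lake-bootstrap/results/"},
    )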
It's easy to get started with simple use cases, and then like add those incremental services, as the need manifests. One of the things that I mention in my talk, that I always encourage our customers to keep in mind, is that a data lake is more than just a technology construct. It's not just an analysis set of machinery, it's really a business construct. Your data lake has a profit and loss statement, and the way that you interact with your business owners to identify the specific value sources, that you're going to make pop for your company, can be made to align with the cost footprint, as you build your data lake out. >> So I'm curious, when you're taking customers through the journey to start kind of thinking of the data lake and AWS, are there any specific kind of application spaces, or vertical spaces where you have pretty high confidence that you can secure an early, and relatively easy, win to help them kind of move down the road? >> Absolutely. So, you know, many of our customers, in a very common, you know, business need, is to enhance the set of information that they have available for a 360 degree view of the customer. In many cases, this information and data, it's available in different parts of the enterprises, but it might be siloed. And, a data lake approach in AWS really helps you to pull it together in an agile fashion based on particular, quarter by quarter, objectives or capabilities that you're trying to respond to. Another very common example is predictive analytics for things like fraud detection, or mechanical failure. So, in eCommerce kinds of situations, being able to pull together semi-structured information that might be coming from web servers or logs, or like what cookies are associated with this particular user. It's very easy to pull together a fraud oriented predictive analytic. And, then the third area that is very common is internet of things use cases. Many enterprises are augmenting their existing data warehouse with sensor oriented time series data, and there's really no place in the enterprise for that data currently to land. >> So, when you say they are augmenting the data warehouse, are they putting it in the data warehouse, or are they putting it in a sort of adjunct, time series database, from which they can sort of curate aggregates, and things like that to put in the data warehouse? >> It's very much the latter, right. And, the time series data itself may come from multiple different vendors and the input formats, in which that information lands, can be pretty diverse. And so, it's not really a good fit for a typical kind of data warehouse ingest or intake process. >> So, if you were to look at, sort of, maturity models for the different use cases, where would we be, you know, like IOT, Customer 360, fraud, things like that? >> I think, you know, so many customers have pretty rich fraud analytics capabilities, but some of the pain points that we hear is that it's difficult for them to access the most recent technologies. In some cases the order management systems that those analytics are running on are quite old. We just finished some work with a customer where literally the order management system's running on a mainframe, even today. Those systems have the ability to accept steering from like a sidecar decision support predictive analytic system.
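The signal-joining pattern Bass describes, web plus mobile plus call center data unified per customer, can be sketched in a few lines. All of the fields and the toy risk rule below are invented for illustration; a real system would feed the joined features into a trained model rather than a hand-written flag.

    # Join per-customer touchpoints into features for fraud scoring.
    import pandas as pd

    web    = pd.DataFrame({"customer_id": [1, 2], "web_logins_24h": [14, 1]})
    mobile = pd.DataFrame({"customer_id": [1, 3], "new_device": [True, False]})
    calls  = pd.DataFrame({"customer_id": [1, 2], "password_resets_7d": [3, 0]})

    features = (
        web.merge(mobile, on="customer_id", how="outer")
           .merge(calls, on="customer_id", how="outer")
           .fillna({"web_logins_24h": 0, "new_device": False,
                    "password_resets_7d": 0})
           .astype({"new_device": bool})
    )

    # Crude stand-in for a model: many logins from a new device, plus
    # recent password resets, raises the combined fraud signal.
    features["elevated_risk"] = (
        (features["web_logins_24h"] > 10)
        & features["new_device"]
        & (features["password_resets_7d"] > 1)
    )
    print(features)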
And, one of the things that's really cool about the Cloud is you could build a custom API just for that fraud analytics use case so that you can inject exactly the right information that makes it super cheap and easy for the ops team, that's running that mainframe, to consume the fraud improvement decision signal that you're offering. >> Interesting. And so, this may be diving into the weeds a little bit, but if you've got an order management system that's decades old and you're going to plug-in something that has to meet some stringent performance requirements, how do you, sort of, test... It's not just the end to end performance once, but you know for the 99th percentile, that someone doesn't get locked out for five minutes while he's trying to finish his shopping cart. >> Exactly. And I mean, I think this is what is important about the concept of building data machines, in the Cloud. This is not like a once and done kind of process. You're not building an analytic that produces a print out that an executive is going to look at (laughing) and make a decision. (laughing) You're really creating a process that runs at consumer scale, and you're going to apply all of the same kinds of metrics of percentile performance that you would apply to any kind of large scale consumer delivery system. >> Do you custom-build a fraud prevention application for each customer? Or, is there a template and then some additional capabilities that you'll learn by running through their training data? >> Well, I think largely, there are business by business distinctions in the approach that these customers take to fraud detection. There's also business by business distinction in their current state. But, what we find is that there are commonalities in the kinds of patterns and approaches that you tend to apply. So, you know... We may have extra data about you based on your behavior on the web, and your behavior on a mobile app. The particulars of that data might be different for Enterprise A vs Enterprise B, but this pattern of joining up mobile data plus web data plus, maybe, phone-in call center data. Putting those all together, to increase the signal that can be made available to a fraud prevention algorithm, that's very common across all enterprises. And so, one of the roles that we play is to set up the platform, so that it's really easy to mobilize each of these data sources. So in many cases, it's the customer's data scientist that's saying, I think I know how to do a better job for my business. I just need to be unleashed to be able to access this data, and if I'm blocked, the answer that I get back is oh, you could have that, like, second quarter of 2019. Instead, you want to say, oh, we can onboard that data in an agile fashion, paying an incremental little bit of money, because you've identified a specific benefit that could be made available by having that data. >> Alright Mick, well thanks for stopping by. I'm going to send Andy Jassy a note that we found the silver lining to the Cloud (laughing) So, I'm excited for that, if nothing else, so that made the trip well worthwhile, so thanks for taking a few minutes. >> You bet, thanks so much, guys. >> Alright Mick Bass, George Gilbert, Jeff Frick, you're watching theCube, from Data Platforms 2017. We'll be right back after this short break. Thanks for watching. (computer techno beat)
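The "sidecar" decision API in that exchange is worth a quick sketch. The interview names no framework, endpoint, or fields, so Flask, the route, and the toy rule below are all our assumptions; the point is the shape: a tiny service that hands the mainframe ops team only the small decision signal they need, and stays simple enough to keep its 99th-percentile latency predictable.

    # Minimal sidecar scoring API the order management system can call.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/fraud-score", methods=["POST"])
    def fraud_score():
        order = request.get_json()
        # Stand-in for a real model: flag large orders from new devices.
        risky = (order.get("amount", 0) > 500
                 and order.get("new_device", False))
        # Return exactly the small signal the ops team needs to consume.
        return jsonify({
            "order_id": order.get("order_id"),
            "decision": "review" if risky else "approve",
        })

    if __name__ == "__main__":
        app.run(port=8080)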

Published Date : May 26 2017


Next-Generation Analytics Social Influencer Roundtable - #BigDataNYC 2016 #theCUBE


 

>> Narrator: Live from New York, it's the Cube, covering Big Data New York City 2016. Brought to you by headline sponsors, CISCO, IBM, NVIDIA, and our ecosystem sponsors, now here's your host, Dave Vellante. >> Welcome back to New York City, everybody, this is the Cube, the worldwide leader in live tech coverage, and this is a Cube first, we've got a nine person, actually eight person panel of experts, data scientists, all alike. I'm here with my co-host, James Kobielus, who has helped organize this panel of experts. James, welcome. >> Thank you very much, Dave, it's great to be here, and we have some really excellent brain power up there, so I'm going to let them talk. >> Okay, well thank you again-- >> And I'll interject my thoughts now and then, but I want to hear them. >> Okay, great, we know you well, Jim, we know you'll do that, so thank you for that, and appreciate you organizing this. Okay, so what I'm going to do with our panelists is ask you to introduce yourselves. I'll introduce you, but tell us a little bit about yourself, and talk a little bit about what data science means to you. A number of you started in the field a long time ago, perhaps data warehouse experts before the term data science was coined. Some of you started probably after Hal Varian said it was the sexiest job in the world. (laughs) So think about how data science has changed and/or what it means to you. We're going to start with Greg Piatetsky, who's from Boston. A Ph.D., KDnuggets, Greg, tell us about yourself and what data science means to you. >> Okay, well thank you Dave and thank you Jim for the invitation. Data science in a sense is the second oldest profession. I think people have this built-in need to find patterns and whatever we find we want to organize the data, but we do it well on a small scale, but we don't do it well on a large scale, so really, data science takes our need and helps us organize what we find, the patterns that we find that are really valid and useful and not just random, I think this is a big challenge of data science. I've actually started in this field before the term Data Science existed. I started as a researcher and organized the first few workshops on data mining and knowledge discovery, and the term data mining became less fashionable, became predictive analytics, now it's data science and it will be something else in a few years. >> Okay, thank you, Yves Mulkearns, Yves, I of course know you from Twitter. A lot of people know you as well. Tell us about your experiences and what data science means to you. >> Well, data science to me is if you take the two words, the data and the science, the science holds a lot of expertise and skills there, it's statistics, it's mathematics, it's understanding the business and putting that together with the digitization of what we have. It's not only the structured data or the unstructured data that you store in the database and try to get out and try to understand what is in there, but even video that is coming in, and then trying to find, like Greg already said, the patterns in there and bringing value to the business but looking from a technical perspective, but still linking that to the business insights and you can do that on a technical level, but then you don't know yet what you need to find, or what you're looking for. >> Okay great, thank you. Craig Brown, Cube alum. How many people have been on the Cube actually before? >> I have. >> Okay, good. I always like to ask that question.
So Craig, tell us a little bit about your background and, you know, data science, how has it changed, what's it all mean to you? >> Sure, so I'm Craig Brown, I've been in IT for almost 28 years, and that was obviously before the term data science, but I've evolved from, I started out as a developer. And evolved through the data ranks, as I called it, working with data structures, working with data systems, data technologies, and now we're working with data pure and simple. Data science to me is an individual or team of individuals that dissect the data, understand the data, help folks look at the data differently than just the information that, you know, we usually use in reports, and get more insights on how to utilize it and better leverage it as an asset within an organization. >> Great, thank you Craig, okay, Jennifer Shin? Math is obviously part of being a data scientist. You're good at math I understand. Tell us about yourself. >> Yeah, so I'm a senior principal data scientist at the Nielsen Company. I'm also the founder of 8 Path Solutions, which is a data science, analytics, and technology company, and I'm also on the faculty in the Master of Information and Data Science program at UC Berkeley. I'm teaching statistics for data science there this semester, and I think for me, I consider myself a scientist primarily, and data science is a nice day job to have, right? Something where there's industry need for people with my skill set in the sciences, and data gives us a great way of being able to communicate sort of what we know in science in a way that can be used out there in the real world. I think the best benefit for me is that now that I'm a data scientist, people know what my job is, whereas before, maybe five, ten years ago, no one understood what I did. Now, people don't necessarily understand what I do now, but at least they understand kind of what I do, so it's still an improvement. >> Excellent. Thank you Jennifer. Joe Caserta, you're somebody who started in the data warehouse business, and saw that snake swallow a basketball and grow into what we now know as big data, so tell us about yourself. >> So I've been doing data for 30 years now, and I wrote the Data Warehouse ETL Toolkit with Ralph Kimball, which is the best-selling book in the industry on preparing data for analytics, and with the big paradigm shift that's happened, you know for me the past seven years has been, instead of preparing data for people to analyze data to make decisions, now we're preparing data for machines to make the decisions, and I think that's the big shift from data analysis to data analytics and data science. >> Great, thank you. Miriam, Miriam Fridell, welcome. >> Thank you. I'm Miriam Fridell, I work for Elder Research, we are a data science consultancy, and I came to data science, sort of through a very circuitous route. I started off as a physicist, went to work as a consultant and software engineer, then became a research analyst, and finally came to data science. And I think one of the most interesting things to me about data science is that it's not simply about building an interesting model and doing some interesting mathematics, or maybe wrangling the data, all of which I love to do, but it's really the entire analytics lifecycle, and the value that you can actually extract from data at the end, and that's one of the things that I enjoy most is seeing a client's eyes light up or a wow, I didn't really know we could look at data that way, that's really interesting.
I can actually do something with that, so I think that, to me, is one of the most interesting things about it. >> Great, thank you. Justin Sadeen, welcome. >> Absolutely, thank you, thank you. So my name is Justin Sadeen, I work for Morph EDU, an artificial intelligence company in Atlanta, Georgia, and we develop learning platforms for non-profit and private educational institutions. So I'm a Marine Corps veteran turned data enthusiast, and so what I think about data science is the intersection of information, intelligence, and analysis, and I'm really excited about the transition from big data into smart data, and that's what I see data science as. >> Great, and last but not least, Dez Blanchfield, welcome mate. >> Good day. Yeah, I'm the one with the funny accent. So data science for me is probably the funniest job I've ever had to describe to my mom. I've had quite a few different jobs, and she's never understood any of them, and this one she understands the least. I think a fun way to describe what we're trying to do in the world of data science and analytics now is it's the equivalent of high altitude mountain climbing. It's like the extreme sport version of the computer science world, because we have to be this magical unicorn of a human that can understand plain English problems from C-suite down and then translate it into code, either as solos or as teams of developers. And so there's this black art that we're expected to be able to transmogrify from something that we just in plain English say I would like to know X, and we have to go and figure it out, so there's this neat extreme sport view I have of rushing down the side of a mountain on a mountain bike and just dodging rocks and trees and things occasionally, because invariably, we do have things that go wrong, and they don't quite give us the answers we want. But I think we're at an interesting point in time now with the explosion in the types of technology that are at our fingertips, and the scale at which we can do things now, once upon a time we would sit at a terminal and write code and just look at data and watch it in columns, and then we ended up with spreadsheet technologies at our fingertips. Nowadays it's quite normal to instantiate a small high performance distributed cluster of computers, effectively a super computer in a public cloud, and throw some data at it and see what comes back. And we can do that on a credit card. So I think we're at a really interesting tipping point now where this coinage of data science needs to be slightly better defined, so that we can help organizations who have weird and strange questions that they want to ask, tell them solutions to those questions, and deliver on them in, I guess, a commodity deliverable. I want to know xyz and I want to know it in this time frame and I want to spend this much amount of money to do it, and I don't really care how you're going to do it. And there's so many tools we can choose from and there's so many platforms we can choose from, it's this little black art of computing, if you'd like, we're effectively making it up as we go in many ways, so I think it's one of the most exciting challenges that I've had, and I think I'm pretty sure I speak for most of us in that we're lucky that we get paid to do this amazing job. That we get to make up on a daily basis in some cases. >> Excellent, well okay. So we'll just get right into it. I'm going to go off script-- >> Do they have unicorns down under? I think they have some strange species right?
>> Well we put the pointy bit on the back. You guys have it on the front. >> So I was at an IBM event on Friday. It was a chief data officer summit, and I attended what was called the Data Divas' breakfast. It was a women in tech thing, and one of the CDOs, she said that 25% of chief data officers are women, which is much higher than you would normally see in the profile of IT. We happen to have 25% of our panelists are women. Is that common? Miriam and Jennifer, is that common for the data science field? Or is this a higher percentage than you would normally see-- >> James: Or a lower percentage? >> I think certainly for us, we have hired a number of additional women in the last year, and they are phenomenal data scientists. I don't know that I would say, I mean I think it's certainly typical that this is still a male-dominated field, but I think like many male-dominated fields, physics, mathematics, computer science, I think that that is slowly changing and evolving, and I think certainly, that's something that we've noticed in our firm over the years at our consultancy, as we're hiring new people. So I don't know if I would say 25% is the right number, but hopefully we can get it closer to 50. Jennifer, I don't know if you have... >> Yeah, so I know at Nielsen we have actually more than 25% of our team is women, at least the team I work with, so there seems to be a lot of women who are going into the field. Which isn't too surprising, because with a lot of the issues that come up in STEM, one of the reasons why a lot of women drop out is because they want real world jobs and they feel like they want to be in the workforce, and so I think this is a great opportunity with data science being so popular for these women to actually have a job where they can still maintain that engineering and science view background that they learned in school. >> Great, well Hilary Mason, I think, was the first data scientist that I ever interviewed, and I asked her what are the sort of skills required and the first question that we wanted to ask, I just threw other women in tech in there, 'cause we love women in tech, is about this notion of the unicorn data scientist, right? It's been put forth that the skill sets required to be a data scientist are so numerous that it's virtually impossible to have a data scientist with all those skills. >> And I love Dez's extreme sports analogy, because that plays into the whole notion of data science, we like to talk about the theme now of data science as a team sport. Must it be an extreme sport is what I'm wondering, you know. The unicorns of the world seem to be... Is that realistic now in this new era? >> I mean when automobiles first came out, they were concerned that there wouldn't be enough chauffeurs to drive all the people around. Is there an analogy with data, to be a data-driven company? Do I need a data scientist, and does that data scientist, you know, need to have this unbelievable mixture of skills? Or are we doomed to always have a skill shortage? Open it up. >> I'd like to have a crack at that, so it's interesting, when automobiles were a thing, when they first brought cars out, and before they, sort of, were modernized by the likes of Ford's Model T, when we got away from the horse and carriage, they actually had human beings walking down the street with a flag warning the public that the horseless carriage was coming, and I think data scientists are very much like that.
That we're kind of expected to go ahead of the organization and try and take the challenges we're faced with today and see what's going to come around the corner. And so we're like the little flag-bearers, if you'd like, in many ways of this is where we're at today, tell me where I'm going to be tomorrow, and try and predict the day after as well. It is very much becoming a team sport though. But I think the concept of data science being a unicorn has come about because the coinage hasn't been very well defined, you know, if you were to ask 10 people what a data scientist was, you'd get 11 answers, and I think this is a really challenging issue for hiring managers and C-suites when they go out and say I want data science, I want big data, I want an analyst. They don't actually really know what they're asking for. Generally, if you ask for a database administrator, it's a well-described job spec, and you can just advertise it and some 20 people will turn up and you interview to decide whether you like the look and feel and smell of 'em. When you ask for a data scientist, there's 20 different definitions of what that one data science role could be. So we don't initially know what the job is, we don't know what the deliverable is, and we're still trying to figure that out, so yeah. >> Craig what about you? >> So from my experience, when we talk about data science, we're really talking about a collection of experiences with multiple people. I've yet to find, at least from my experience, a data science effort with a lone wolf. So you're talking about a combination of skills, and so you don't have, no one individual needs to have all that makes a data scientist a data scientist, but you definitely have to have the right combination of skills amongst a team in order to accomplish the goals of the data science team. So from my experiences and from the clients that I've worked with, we refer to the data science effort as a data science team. And I believe that's very appropriate to the team sport analogy. >> For us, we look at a data scientist as a full stack web developer, a jack of all trades, I mean they need to have a multitude of backgrounds, coming from a programmer to an analyst. You can't find one subject matter expert, it's very difficult. And if you're able to find a subject matter expert, you know, through the lifecycle of product development, you're going to require that individual to interact with a number of other members from your team who are analysts and then you just end up, well, training this person to be, again, a jack of all trades, so it comes full circle. >> I own a business that does nothing but data solutions, and we've been in business 15 years, and it's been, the transition over time has been going from being a conventional wisdom run company with a bunch of experts at the top to becoming more of a data-driven company using data warehousing and BI, but now the trend is absolutely analytics driven. So if you're not becoming an analytics-driven company, you are going to be behind the curve very very soon, and it's interesting that IBM is now coining the phrase of a cognitive business. I think that is absolutely the future. If you're not a cognitive business from a technology perspective, and an analytics-driven perspective, you're going to be left behind, that's for sure.
So in order to stay competitive, you know, you need to really think about data science, think about how you're using your data, and I also see that what's considered the data expert has evolved over time too where it used to be just someone really good at writing SQL, or someone really good at writing queries in any language, but now it's becoming more of an interdisciplinary action where you need soft skills and you also need the hard skills, and that's why I think there's more females in the industry now than ever. Because you really need to have a really broad breadth of experiences that really wasn't required in the past. >> Greg Piatetsky, you have a comment? >> So there are not too many unicorns in nature or as data scientists, so I think organizations that want to hire data scientists have to look for teams, and there are a few unicorns like Hilary Mason or maybe Usama Fayyad, but they generally tend to start companies and it's very hard to retain them as data scientists. What I see is another evolution, automation, and, you know, steps like the IBM Watson Data Platform are eventually a great advance for data scientists in the short term, but probably what's likely to happen in the longer term is kind of more and more of those skills becoming subsumed by a machine learning layer within the software. How long will it take, I don't know, but I have a feeling that the paradise for data scientists may not be very long lived. >> Greg, I have a follow up question to what I just heard you say. When a data scientist, let's say a unicorn data scientist starts a company, as you've phrased it, and the company's product is built on data science, do they give up becoming a data scientist in the process? It would seem that they become a data scientist of a higher order if they've built a product based on that knowledge. What are your thoughts on that? >> Well, I know a few people like that, so I think maybe they remain data scientists at heart, but they don't really have the time to do the analysis and they really have to focus more on strategic things. For example, today actually is the birthday of Google, 18 years ago, so Larry Page and Sergey Brin wrote a very influential paper back in the '90s about PageRank. Have they remained data scientists? Perhaps a very very small part, but that's not really what they do, so I think those unicorn data scientists could quickly evolve, and you really have to look for teams to capture those skills. >> Clearly they come to a point in their career where they build a company based on teams of data scientists and data engineers and so forth, which relates to the topic of team data science. What is the right division of roles and responsibilities for team data science? >> Before we go, Jennifer, did you have a comment on that? >> Yeah, so I guess I would say for me, when data science came out and there was, you know, the Venn Diagram that came out about all the skills you were supposed to have? I took a very different approach than all of the people who I knew who were going into data science. Most people started interviewing immediately, they were like this is great, I'm going to get a job. I went and learned how to develop applications, and learned computer science, 'cause I had never taken a computer science course in college, and made sure I trued up that one part where I didn't know these things or had the skills from school, so I went headfirst and just learned it, and then now I have actually a lot of technology patents as a result of that.
So to answer Jim's question, actually. I started my company about five years ago. And originally started out as a consulting firm slash data science company, then it evolved, and one of the reasons I went back in the industry and now I'm at Nielsen is because you really can't do the same sort of data science work when you're actually doing product development. It's a very very different sort of world. You know, when you're developing a product you're developing a core feature or functionality that you're going to offer clients and customers, so I think definitely you really don't get to have that wide range of sort of looking at 8 million models and testing things out. That flexibility really isn't there as your product starts getting developed. >> Before we go into the team sport, the hard skills that you have, are you all good at math? Are you all computer science types? How about math? Are you all math? >> What were your GPAs? (laughs) >> David: Anybody not math oriented? Anybody not love math? You don't love math? >> I love math, I think it's required. >> David: So math yes, check. >> You dream in equations, right? You dream. >> Computer science? Do I have to have computer science skills? At least the basic knowledge? >> I don't know that you need to have formal classes in any of these things, but I think certainly as Jennifer was saying, if you have no skills in programming whatsoever and you have no interest in learning how to write SQL queries or R or Python, you're probably going to struggle a little bit. >> James: It would be a challenge. >> So I think yes, I have a Ph.D. in physics, I did a lot of math, it's my love language, but I think you don't necessarily need to have formal training in all of these things, but I think you need to have a curiosity and a love of learning, and so if you don't have that, you still want to learn and however you gain that knowledge I think, but yeah, if you have no technical interests whatsoever, and don't want to write a line of code, maybe data science is not the field for you. Even if you don't do it every day. >> And statistics as well? You would put that in that same general category? How about data hacking? You got to love data hacking, is that fair? Yves, you have a comment? >> Yeah, I think so, while we've been discussing that for me, the most important part is that you have a logical mind and you have the capability to absorb new things and the curiosity you need to dive into that. While I don't have an education in IT or whatever, I have a background in chemistry and those things that I learned there, I apply to information technology as well, and from a part that you say, okay, I'm a tech-savvy guy, I'm interested in the tech part of it, you need to speak that business language and if you can do that crossover and understand what other skill sets or parts of the roles are telling you I think the communication in that aspect is very important. >> I'd like to throw in just something really quickly, and I think there's an interesting thing that happens in IT, particularly around technology. We tend to forget that we've actually solved a lot of these problems in the past. If we look in history, if we look around the Second World War, and Bletchley Park in the UK, where you had a very similar experience as humans that we're having currently around the whole issue of data science, so there was an interesting challenge with the Enigma and the Shark code, right?
And there was a bunch of men put in a room and told, you're mathematicians and you come from universities, and you can crack codes, but they couldn't. And so what they ended up doing was running these ads, and putting challenges, they actually put, I think it was crossword puzzles in the newspaper, and this deluge of women came out of all kinds of different roles without math degrees, without science degrees, but could solve problems, and they were thrown at the challenge of cracking codes, and invariably, they did the heavy lifting. On a daily basis for converting messages from one format to another, so that this very small team at the end could actually get to play with the sexy piece of it. And I think we're going through a similar shift now with what we refer to as data science in the technology and business world. Where the people who are doing the heavy lifting aren't necessarily what we'd think of as the traditional data scientists, and so, there have been some unicorns and we've championed them, and they're great. But I think the shift's going to be to accountants, actuaries, and statisticians who understand the business, and come from an MBA-style background that can learn the relevant pieces of math and models that we need to apply to get the data science outcome. I think we've already been here, we've solved this problem, we've just got to learn not to try and reinvent the wheel, 'cause the media hypes this whole thing of data science is exciting and new, but we've been here a couple times before, and there's a lot to be learned from that, my view. >> I think we had Joe next. >> Yeah, so I was going to say that, data science is a funny thing. To use the word science is kind of a misnomer, because there is definitely a level of art to it, and I like to use the analogy, when Michelangelo would look at a block of marble, everyone else looked at the block of marble to see a block of marble. He looks at a block of marble and he sees a finished sculpture, and then he figures out what tools do I need to actually make my vision? And I think data science is a lot like that. We hear a problem, we see the solution, and then we just need the right tools to do it, and I think that's part of consulting and data science in particular. It's not so much what we know out of the gate, but it's how quickly we learn. And I think everyone here, what makes them brilliant, is how quickly they could learn any tool that they need to see their vision get accomplished. >> David: Justin? >> Yeah, I think you make a really great point, for me, I'm a Marine Corps veteran, and the reason I mentioned that is 'cause I work with two veterans who are problem solvers. And I think that's what data scientists really are, in the long run are problem solvers, and you mentioned a great point that, yeah, I think just problem solving is the key. You don't have to be a subject matter expert, just be able to take the tools and intelligently use them. >> Now when you look at the whole notion of team data science, what is the right mix of roles, like role definitions within a high-quality or a high-performing data science team? Now IBM, with, of course, our announcement of Project DataWorks and so forth, we're splitting the role division in terms of data scientist versus data engineer versus application developer versus business analyst, is that the right breakdown of roles?
Or what would the panelists recommend in terms of understanding what kind of roles make sense within, like I said, a high-performing team that's looking to develop applications that depend on data, machine learning, and so forth? Anybody want to? >> I'll tackle that. So the teams that I have created over the years, the data science teams that I brought into customer sites, have a combination of developer capabilities, and some of them are IT developers, but some of them were developers of things other than applications. They designed buildings, they did other things with their technical expertise besides building technology. The other piece besides the developer is the analytics, and analytics can be taught as long as they understand how algorithms work and the code behind the analytics, in other words, how are we analyzing things, and from a data science perspective, we are leveraging technology to do the analyzing through the tool sets, so ultimately as long as they understand how tool sets work, then we can train them on the tools. Having that analytic background is an important piece. >> Craig, is it easier to, I'll go to you in a moment Joe, is it easier to cross-train a data scientist to be an app developer, than to cross-train an app developer to be a data scientist, or does it not matter? >> Yes. (laughs) And not the other way around. It depends on the-- >> It's easier to cross-train a data scientist to be an app developer than-- >> Yes. >> The other way around. Why is that? >> Developing code can be as difficult as the tool set one uses to develop code. Today's tool sets are very user friendly, whereas developing code is very difficult; it's hard to teach a person to think along the lines of developing code when they don't have any idea of the aspects of code, of building something. >> I think it was Joe, or you next, or Jennifer, who was it? >> I would say that one of the reasons for that is a data scientist will probably know if the answer's right after you process data, whereas a data engineer might be able to manipulate the data but may not know if the answer's correct. So I think that is one of the reasons why having a data scientist learn the application development skills might be an easier time than the other way around. >> I think Miriam had a comment? Sorry. >> I think that what we're advising our clients to do is to not think that way anymore. Before data science and before analytics became so required by companies to stay competitive, it was more of a waterfall: you'd have a data engineer build a solution, you know, then you'd throw it over the fence and the business analyst would have at it, where now, it must be agile, and you must have a scrum team where you have the data scientist and the data engineer and the project manager and the product owner and someone from the chief data office all at the table at the same time, all accomplishing the same goal. Because all of these skills are required, collectively, in order to solve this problem, and it can't be done daisy-chained anymore; it has to be a collaboration. And that's why I think Spark is so awesome, because you know, Spark is a single interface that a data engineer can use, a data analyst can use, and a data scientist can use. And now with what we've learned today, having a data catalog on top so that the chief data office can actually manage it, I think is really going to take Spark to the next level.
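That single-interface point can be made concrete with a minimal sketch, assuming PySpark is installed and a hypothetical sales.csv with "region" and "amount" columns; it is an illustration of the idea, not the panelists' actual pipeline:

```python
# A minimal sketch: one Spark DataFrame serving the data engineer,
# the business analyst, and the data scientist.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("shared-interface").getOrCreate()

# Data engineer: ingest and clean once.
sales = (spark.read.option("header", True).option("inferSchema", True)
         .csv("sales.csv")
         .dropna(subset=["region", "amount"]))
sales.createOrReplaceTempView("sales")

# Business analyst: plain SQL over the same data.
spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()

# Data scientist: the same DataFrame feeds feature engineering and MLlib.
features = sales.withColumn("log_amount", F.log1p("amount"))
features.show(5)
```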
>> James: Miriam? >> I wanted to comment on your question to Craig about whether it's harder to teach a data scientist to build an application or vice versa, and one of the things that we have worked on a lot in our data science team is incorporating a lot of best practices from software development: agile, scrum, that sort of thing, and I think particularly with a focus on deploying models, that we don't just want to build an interesting data science model, we want to deploy it and get some value. You need to really incorporate these processes from someone who might know how to build applications, and that, I think, for some data scientists can be a challenge, because one of the fun things about data science is you get to get into the data, and you get your hands dirty, and you build a model, and you get to try all these cool things, but then when the time comes for you to actually deploy something, you need deployment-grade code in order to make sure it can go into production at your client site and be useful, for instance. So I think that there's an interesting challenge on both ends, but one of the things I've definitely noticed with some of our data scientists is it's very hard to get them to think in that mindset, which is why you have a team of people, because everyone has different skills and you can mitigate that. >> Dev-ops for data science? >> Yeah, exactly. We call it insight ops, but yeah, I hear what you're saying. Data science is becoming increasingly an operational function as opposed to strictly exploratory or developmental. Did someone else have a, Dez? >> One of the things I like to do when someone gives me a new problem is take all the laptops and phones away, and we just end up in a room with a whiteboard. And developers find that challenging sometimes, so I had this one line where I said to them, don't write the first line of code until you actually understand the problem you're trying to solve, right? And I think where the data science focus has changed the game for organizations who are trying to get some systematic, repeatable process that they can throw data at and just keep getting answers, no matter what the industry might be, is that developers will come with a particular mindset on how they're going to codify something without necessarily getting the full spectrum and understanding the problem in the first place. What I'm finding is the people that come at data science tend to have more of a hacker ethic. They want to hack the problem, they want to understand the challenge, and they want to be able to get it down to plain English, simple phrases, and then apply some algorithms and then build models, and then codify it, and so most of the time we sit in a room with whiteboard markers just trying to build a model in a graphical sense and make sure it's going to work and that it's going to flow, and once we can do that, we can codify it. I think when you come at it from the other angle, from the developer ethic, and you're like, I'm just going to codify this from day one, I'm going to write code, I'm going to hack this thing out and it's just going to run and compile, often you don't truly understand what you're trying to get to at the end point, and you can just spend days writing code, and I think someone made the comment that sometimes you don't actually know whether the output is accurate in the first place. So I think there's a lot of value being provided from the data science practice
in understanding the problem in plain English at a team level: what am I trying to do from the business consulting point of view? What are the requirements? How do I build this model? How do I test the model? How do I run a sample set through it? Train the thing and then make sure what I'm going to codify actually makes sense in the first place, because otherwise, what are you trying to solve in the first place? >> Wasn't it Einstein who said, if I had an hour to solve a problem, I'd spend 55 minutes understanding the problem and five minutes on the solution, right? It's exactly what you're talking about. >> Well I think, I will say, getting back to the question, the thing with building these teams that I think a lot of times people don't talk about is that engineers are actually very, very important for data science projects and data science problems. For instance, if you were just trying to prototype something or just come up with a model, then data science teams are great; however, if you need to actually put that into production, the code that the data scientist has written may not be optimal, so as we scale out, it may actually be very inefficient. At that point, you kind of want an engineer to step in and actually optimize that code, so I think it depends on what you're building, and that kind of dictates what kind of division you want among your teammates, but I do think that a lot of times the engineering component is really undervalued out there. >> Jennifer, it seems that the data engineering function, data discovery and preparation and so forth, is becoming automated to a greater degree, but if I'm listening to you, I don't hear that data engineering as a discipline is becoming extinct in terms of a role that people can be hired into. You're saying that there's a strong ongoing need for data engineers to optimize the entire pipeline to deliver the fruits of data science in production applications, is that correct? So they play that very much operational role as the backbone for... >> So I think a lot of times businesses will go to data scientists to build a better model, to build a predictive model, but that model may not be something that you really want to implement out there when there's like a million users coming to your website, 'cause it may not be efficient, it may take a very long time, so I think in that sense, it is important to have good engineers, and your whole product may fail; you may build the best model, it may have the best output, but if you can't actually implement it, then really what good is it? >> What about calibrating these models? How do you go about doing that and sort of testing that in the real world? Has that changed over time? Or is it... >> So one of the things that I think can happen, and we found with one of our clients, is when you build a model, you do it with the data that you have, and you try to use a very robust cross-validation process to make sure that it's robust and it's sturdy, but one thing that can sometimes happen is after you put your model into production, there can be external factors, societal or whatever, things that have nothing to do with the data that you have or the quality of the data or the quality of the model, which can actually erode the model's performance over time. So as an example, we think about cell phone contracts, right? Those have changed a lot over the years, so maybe five years ago, the type of data plan you had might not be the same as it is today, because a totally different type of plan is offered, so if you're building a model on that to, say, predict who's going to leave and go to a different cell phone carrier, the validity of your model over time is going to completely degrade based on nothing that you put into the model or the data that was available, so I think you need to have this sort of model management and monitoring process to take these factors into account and then know when it's time to do a refresh. >> Cross-validation, even at one point in time, for example: there was an article in the New York Times recently where they gave the same data set to five different data scientists, this is survey data for the presidential election that's upcoming, and the five different data scientists came to five different predictions. They were all high-quality data scientists, but the cross-validation showed a wide variation about who was on top, whether it was Hillary or whether it was Trump, so that shows you that even at any point in time, cross-validation is essential to understand how robust the predictions might be.
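A minimal sketch of that cross-validate-then-monitor idea follows; the make_classification data and the 0.05 tolerance are illustrative stand-ins, not the panelists' actual pipeline or drift policy:

```python
# Cross-validate for robustness, then watch for erosion after deployment.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

# k-fold cross-validation: how sturdy is the model on held-out slices?
scores = cross_val_score(model, X, y, cv=5)
baseline = scores.mean()
print(f"CV accuracy: {baseline:.3f} +/- {scores.std():.3f}")

# After deployment, compare live accuracy against the baseline and flag a
# refresh when performance erodes (e.g., the market itself has changed).
def needs_refresh(live_accuracy: float, baseline: float, tolerance: float = 0.05) -> bool:
    return live_accuracy < baseline - tolerance

print(needs_refresh(live_accuracy=0.78, baseline=baseline))
```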
Does somebody else have a comment? Joe? >> I just want to say that this even drives home the case for having the scrum team for each project, and having the engineer and the data scientist, data engineer and data scientist, working side by side, because it is important that whatever we're building, we assume will eventually go into production. In the data warehousing world, we used to get the data out of the systems, out of your applications, do analysis on the data, and the nirvana was maybe that data would go back to the system, but typically it didn't. Nowadays, the applications are dependent on the insight coming from the data science team; the behavior of the application and the personalization and individual experience for a customer are highly dependent on it. So you asked, is data science part of the dev-ops team? Absolutely, now it has to be. >> Whose job is it to figure out the way in which the data is presented to the business? Where's the sort of presentation, the visualization plan? Is that the data scientist's role? Does that depend on whether or not you have that gene? Do you need a UI person on your team? Where does that fit? >> Wow, good question. >> Well usually that's the output. I mean, once you get to the point where you're visualizing the data, you've created an algorithm or some sort of code that produces that to be visualized, so at the end of the day the customers can see what all the fuss is about from a data science perspective. But it's usually post the data science component. >> So do you run into situations where you can see it and it's blatantly obvious, but it doesn't necessarily translate to the business? >> Well there's an interesting challenge with data, and we throw the word data around a lot, and I've got this fun line I like throwing out there: if you torture data long enough, it will talk. So the challenge then is to figure out when to stop torturing it, right? And it's the same with models, and so I think in many other parts of organizations, we'll take something, if someone's doing a financial report on performance of the organization and they're doing it in a spreadsheet, they'll get two or three peers to review it, and validate that they've come up with a working model and the answer actually makes sense.
And I think we're rushing so quickly at doing analysis on data that comes to us in various formats and at high velocity that I think it's very important for us to actually stop and do peer reviews of the models and the data and the output as well, because otherwise we start making decisions very quickly about things that may or may not be true. It's very easy to get the data to paint any picture you want, and you gave the example of the five different attempts at that thing, and I've had this shoot-out thing as well, where I'll take in a team, I'll get two different people to do exactly the same thing in completely different rooms, and come back and challenge each other, and it's quite amazing to see the looks on their faces when they're like, oh, I didn't see that, and then they go back and do it again, and just keep iterating until we get to the point where they both get the same outcome. In fact, there's a really interesting anecdote about when the UNIX operating system was being written, and a couple of the authors went away and wrote the same program without realizing the other was doing it, and when they came back, they actually had, line for line, the same piece of C code, 'cause they'd actually gotten to a truth, a perfect version of that program. And I think we need to often look at, when we're building models and playing with data, if we can't come at it from different angles and get the same answer, then maybe the answer isn't quite true yet, so there's a lot of risk in that. And it's the same with presentation, you know, you can paint any picture you want with the dashboard, but who's actually validating whether the dashboard's painting the correct picture? >> James: Go ahead, please. >> There is a science, actually, behind data visualization: you know, if you're doing trending, it's a line graph; if you're doing comparative analysis, it's a bar graph; if you're doing percentages, it's a pie chart. Like, there is a certain science to it, it's not as much of a mystery as the novice thinks. But what makes it challenging is that, just like any presentation, you also have to consider your audience. And your audience, whenever we're delivering a solution, either insight, or just data in a grid, we really have to consider who is the consumer of this data, and actually cater the visual to that person or to that particular audience. And that is part of the art, and that is what makes a great data scientist.
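That chart-selection rule of thumb (trending: line graph; comparison: bar graph; percentages: pie chart) can be sketched in a few lines of matplotlib; the numbers here are purely illustrative:

```python
# One dataset, three audiences: match the chart to the question asked.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]
regions = ["North", "South", "East", "West"]
by_region = [40, 25, 50, 35]

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))
ax1.plot(months, sales)               # trending over time: line graph
ax1.set_title("Trend: sales by month")
ax2.bar(regions, by_region)           # comparative analysis: bar graph
ax2.set_title("Comparison: sales by region")
ax3.pie(by_region, labels=regions)    # percentages of a whole: pie chart
ax3.set_title("Share: sales mix")
plt.tight_layout()
plt.show()
```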
>> The consumer may in fact be the source of the data itself, like in a mobile app, so you're tuning their visualization and then their behavior is changing as a result, and then the data on their changed behavior comes back, so it can be a circular process. >> So Jim, at a recent conference, you were tweeting about the citizen data scientist, and you got emasculated by-- >> I spoke there too. >> Okay. >> TWI on that same topic, I got-- >> Kirk Borne I hear came after you. >> Kirk meant-- >> Called foul, flag on the play. >> Kirk meant well. I love Claudia Emahoff too, but yeah, it's a controversial topic. >> So I wonder what our panel thinks of that notion, citizen data scientist. >> Can I respond about citizen data scientists? >> David: Yeah, please. >> I think this term was introduced by a Gartner analyst in 2015, and I think it's a very dangerous and misleading term. I think definitely we want to democratize the data and give access to more people, not just data scientists, but managers, BI analysts, but there is already a term for such people: we can call them business analysts, because it implies some training, some understanding of the data. If you use the term citizen data scientist, it implies that without any training you can take some data and find something there, and as Dez mentioned, we've seen many examples, it's very easy to find completely spurious, random correlations in data. So we don't want citizen dentists to treat our teeth or citizen pilots to fly planes, and if data's important, having citizen data scientists is equally dangerous, so I'm hoping, I think actually Gartner did not use the term citizen data scientist in their 2016 hype cycle, so hopefully they will put this term to rest. >> So Gregory, you apparently are defining citizen to mean incompetent as opposed to simply self-starting. >> Well, self-starting is very different, but that's not what I think was the intention. I think what we see in terms of data democratization, there is a big trend toward automation. There are many tools; for example, there are many companies like Data Robot, and probably IBM, with interesting machine learning capabilities toward automation, so I recently started a page on KDnuggets for automated data science solutions, and there are already 20 different firms that provide different levels of automation. So full automation can maybe deliver some expertise, but it's very dangerous to have a partly automated tool and at some point ask citizen data scientists to try to take the wheel. >> I want to chime in on that. >> David: Yeah, pile on. >> I totally agree with all of that. I think the comment I just want to quickly put out there is that the space we're in is a very young and rapidly changing world, and so what we haven't had yet is time to stop and take a deep breath and actually define ourselves. So if you look at computer science in general, a lot of the traditional roles have had 10 or 20 years of history, and so through the hiring process and the development of those spaces, we've actually had time to breathe and define what those jobs are, so we know what a systems programmer is, and we know what a database administrator is, but we haven't yet had a chance as a community to stop and breathe and say, well, what do we think these roles are? And so to fill that void, the media creates coinages, and I think this is the risk we've got now, that the concept of a data scientist was just a term that was coined to fill a void, because no one quite knew what to call somebody who didn't come from a data science background if they were tinkering around data science, and I think that's something that we need to sort of sit up and pay attention to, because if we don't own that and drive it ourselves, then somebody else is going to fill the void, and they'll create these very frustrating concepts like data scientist, which drive us all crazy. >> James: Miriam's next. >> So I wanted to comment. I agree with both of the previous comments, but in terms of a citizen data scientist, and I think whether you're a citizen data scientist or an actual data scientist, whatever that means, I think one of the most important things you can have is a sense of skepticism, right? Because you can get spurious correlations, and it's like, wow, my predictive model is so excellent, you know? And being aware of things like leaks from the future, right? This actually isn't predictive at all, it's a result of the thing I'm trying to predict, and so I think one thing I know that we try to do is, if something really looks too good, we need to go back in and make sure: did we not look at the data correctly? Is something missing? Did we have a problem with the ETL? And so I think that a healthy sense of skepticism is important to make sure that you're not taking a spurious correlation and trying to derive some significant meaning from it.
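How easily spurious correlations appear can be demonstrated with an illustrative sketch: test 1,000 random features against a random target, and one of them will always look like a strong predictor despite zero real signal:

```python
# With enough random features, one will "predict" pure noise convincingly.
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(size=50)           # 50 observations of pure noise
X = rng.normal(size=(50, 1000))   # 1,000 random, meaningless features

corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
best = int(np.argmax(corrs))
print(f"Best 'predictor': feature {best}, |r| = {max(corrs):.2f}")
# The top |r| here is typically above 0.4 despite zero real signal.
# Healthy skepticism (a holdout, a fresh sample) is what catches this.
```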
>> I think there's a Dilbert cartoon that I saw that described that very well. Joe, did you have a comment? >> I think that in order for citizen data scientists to really exist, we do need to have more maturity in the tools that they would use. My vision is that the BI tools of today are all going to be replaced with natural language processing and searching, you know, just being able to open up a search bar and say, give me sales by region, and to take that one step into the future even further, it should actually say what are my sales going to be next year, and it should trigger a simple linear regression, or be able to say which features of the televisions are actually affecting sales, and do a clustering algorithm. You know, I think hopefully that will be the future, but I don't see anything of that today, and I think in order to have a true citizen data scientist, you would need to have that, and that is pretty sophisticated stuff. >> I think for me, the idea of a citizen data scientist, I can relate to that. For instance, when I was in graduate school, I started doing some research on FDA data. It was an open source data set of about 4.2 million data points. Technically, when I graduated, the paper was still not published, and so in some sense, you could think of me as a citizen data scientist, right? I wasn't getting funding, I wasn't doing it for school, but I was still continuing my research, so I'd like to hope that with all the new data sources out there, there might be scientists, or people who were maybe kept out of a field, people who wanted to be in STEM and for whatever life circumstance couldn't be in it, that they might be encouraged to actually go and look into the data and maybe build better models or validate information that's out there. >> So Justin, I'm sorry, you had one comment? >> It seems data science was termed before academia adopted formalized training for data science. But yeah, like Dez said, you can make data work for whatever problem you're trying to solve; whatever answer you see, you want data to work around it, you can make it happen. And I kind of consider that like scope creep in project management, data creep: you're so hyper-focused on a solution, so intent on finding the answer, that you create an answer that works for that solution, but it may not be the correct answer, and I think the crossover discussion works well for that case. >> So but the term comes up 'cause there's a frustration, I guess, right? That data science skills are not plentiful, and it's potentially a bottleneck in an organization. Supposedly 80% of your time is spent on cleaning data, is that right? Is that fair? So there's a problem. How much of that can be automated, and when? >> I'll have a shot at that.
So I think there's a shift that's going to come about where we're going to move from centralized data sets to data at the edge of the network, and this is something that's happening very quickly now, where we can't just haul everything back to a central spot. When the internet of things actually wakes up, things like the Boeing 787 Dreamliner, that thing's got 6,000 sensors in it and produces half a terabyte of data per flight. There are 87,400 flights per day in domestic airspace in the U.S. That's 43.5 petabytes of raw data; now that's about three years' worth of disk manufacturing in total, right? We're never going to copy that across to one place, we can't process it, so I think the challenge we've got ahead of us is looking at how we're going to move the intelligence and the analytics to the edge of the network and pre-cook the data in different tiers: have a look at the raw material we get, boil it down to a slightly smaller data set, bring a metadata version of that back, and eventually get to the point where we've only got the very minimum data set and data points we need to make key decisions. Without that, we're already at the point where we have too much data, and we can't munch it fast enough, and we can't spin up enough tin even if we switch the cloud on, and there's just this never-ending deluge of noise, right? And you've got that signal versus noise problem, so we're now seeing a shift where people are looking at how we move the intelligence back to the edge of the network, which we actually solved some time ago in the security space. You know, spam filtering: if an email hits Google on the west coast of the U.S. and they create a checksum for that spam email, it immediately goes into a database, and nothing gets through on the opposite coast, because they already know it's spam. They recognize that email coming in: that's evil, stop it. So we've already fixed this in security with intrusion detection, we've fixed it in spam, so we now need to take that learning and bring it into business analytics, if you like, and see where we're finding patterns and behavior, and bring that out to the edge of the network, so if I'm seeing a demand over here for tickets on a new show sale, I need to be able to see where else I'm going to see that demand and start responding to that before the demand comes about. I think that's a shift that we're going to see quickly, because we'll never keep up with the data munching challenge and the volume's just going to explode.
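The arithmetic roughly checks out: 87,400 flights at half a terabyte each is about 43.7 petabytes a day. And the spam pattern described here, fingerprint a bad message once at one edge and share only the fingerprint, can be sketched as follows; the normalization scheme and messages are hypothetical, not Google's actual mechanism:

```python
# One node flags a bad message; every other node can reject it by
# fingerprint alone, without the payload ever traveling.
import hashlib

known_spam_fingerprints = set()

def fingerprint(message: str) -> str:
    # Light normalization so trivial whitespace changes don't evade the check.
    return hashlib.sha256(" ".join(message.split()).encode()).hexdigest()

def flag_spam(message: str) -> None:
    known_spam_fingerprints.add(fingerprint(message))   # west-coast node learns it

def is_spam(message: str) -> bool:
    return fingerprint(message) in known_spam_fingerprints  # east-coast node checks

flag_spam("WIN a FREE cruise now!!!")
print(is_spam("WIN a  FREE   cruise now!!!"))  # True: only the checksum traveled
```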
>> David: We just have a couple minutes. >> That does sound like a great topic for a future Cube panel, which is data science on the edge of the fog. >> I've got a hundred questions around that. So we're wrapping up here, just got a couple minutes. Final thoughts on this conversation, or any other pieces that you want to punctuate? >> I think one thing that's been really interesting for me, being on this panel, is hearing all of my co-panelists talking about common themes and things that we are also experiencing, which isn't a surprise, but it's interesting to hear how ubiquitous some of the challenges are, and also, at the announcement earlier today, some of the things that they're talking about and thinking about, we're also talking about and thinking about. So I think it's great to hear; we're all in different countries and different places, but we're experiencing a lot of the same challenges, and I think that's been really interesting for me to hear about. >> David: Great, anybody else, final thoughts? >> To echo Dez's thoughts, we're never going to catch up with the amount of data that's produced, so it's about transforming big data into smart data. >> I would just say that with the shift from normal data, small data, to big data, the answer is automate, automate, automate, and we've been talking about advanced algorithms and machine learning for the science, for changing the business, but there also needs to be machine learning and advanced algorithms for the backroom, where we're actually getting smarter about how we ingest data and how we fix data as it comes in, because we can actually train the machines to understand data anomalies and what we want to do with them over time. And I think the further upstream we get with data correction, the less work there will be downstream. And I also think that the concept of being able to fix data at the source is gone, that's behind us. Right now the data that we're using to analyze to change the business, typically we have no control over; like Dez said, it's coming from sensors and machines and the internet of things, and if it's wrong, it's always going to be wrong, so we have to figure out how to do that in our laboratory. >> Eaves, final thoughts? >> I think it's a mind shift, being a data scientist. If you look back, why did you start developing or writing code? Because you liked to code, whatever, just for the sake of building a nice algorithm or a piece of software. And now, I think, with the spirit of a data scientist, you're looking at a problem and saying, this is where I want to go, so you have more of a top-down approach than a bottom-up approach. And you have the big picture, and that is what you really need as a data scientist: look across technologies, look across departments, look across everything. And then, on top of that, try to apply as many skills as you have available, and that's the kind of unicorn they're trying to look for, because it's pretty hard to find people with that wide a vision on everything that is happening within a company. You need to be aware of technology, you need to be aware of how a business is run and how it fits within a cultural environment, you have to work with people, and all those things together, to my belief, make it very difficult to find those good data scientists. >> Jim? Your final thought? >> My final thought is this is an awesome panel, and I'm so glad that you've come to New York, and I'm hoping that you all stay, of course, for the IBM Data First launch event that will take place this evening, about a block over at Hudson Mercantile. So that's pretty much it. Thank you, I really learned a lot. >> I want to second Jim's thanks, really, great panel, awesome expertise. Really appreciate you taking the time, and thanks to the folks at IBM for putting this together. >> And I'm a big fan of most of you, all of you, on this session here, so it's great just to meet you in person. Thank you. >> Okay, and I want to thank Jeff Frick for being a human curtain there with the sun setting here in New York City. Well, thanks very much for watching. We are going to be across the street at the IBM announcement, we're going to be on the ground. We open up again tomorrow at 9:30 at Big Data NYC, Big Data Week, Strata plus Hadoop World. Thanks for watching everybody, that's a wrap from here. This is the Cube, we're out. (techno music)
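As a coda to the point above about training machines to catch data anomalies on ingest, a minimal sketch, with illustrative sensor readings and a conventional three-sigma threshold, might look like this:

```python
# Learn what normal looks like, then flag anomalies at ingest time,
# before they pollute downstream analytics.
import numpy as np

history = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.3])  # past readings
mean, std = history.mean(), history.std()

def looks_normal(value: float, z_threshold: float = 3.0) -> bool:
    """True if the reading is consistent with history; otherwise quarantine it."""
    return abs(value - mean) / std <= z_threshold

print(looks_normal(10.0))  # True: ingest as-is
print(looks_normal(97.0))  # False: route to the 'laboratory' for correction
```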

Published Date : Sep 28 2016
