Paul Sonderegger, Oracle - In The Studio - #Wikibon Boston

>> Announcer: From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE! Now, here's your host, Dave Vellante. >> Hi, everybody, welcome to a special SiliconANGLE theCUBE on the ground. We're going to be talking about data capital with Paul Sonderegger, who is a big data strategist at Oracle, and he leads Oracle's data capital initiative. Paul, thanks for coming in, welcome to theCUBE. >> Thank you, Dave, it's good to be here. >> So data capital, it's a topic that's gaining a lot of momentum. People have talked about data value for years, but what is data capital? >> Well, what we're saying with data capital is that data fulfills the literal economic textbook definition of capital. Capital is a produced good, as opposed to a natural resource: you have to invest to create it, and it is then a necessary input into some other good or service. So when we define data capital, we say that data capital is the recorded information necessary to produce a good or service. Which is really boring, so let me give you an example. Picture a retailer. A retailer wants to go into a new market. To do that, the retailer has to expand its inventory, it has to extend its supply chain, it has to buy property, all of these kinds of investments. If it lacks the financial capital to make all of those investments, it cannot go into that new region. By the same token, if this retailer wants to create a new dynamic pricing algorithm, or a new recommendation engine, but lacks the data to feed those algorithms, it cannot create that ability. It cannot provide that service. Data is now a kind of capital. >> And for years, data was viewed by a lot of organizations, particularly general counsel, as a liability, and then the big data meme sort of took off and all of a sudden, data becomes an asset. Are organizations viewing data as an asset? 
>> A lot of organizations are starting to view data as an asset, even though they can't account for it that way. By current accounting standards, companies are not allowed to treat the money that they spend on developing information, on capturing data, as an asset. However, what you see with these online consumer services, the ones that we know, Uber, Airbnb, Netflix, LinkedIn, is that these companies absolutely treat data as an asset. They treat it not just as a record of what happened, but as a raw material for creating new digital products and services. >> You tweeted out an article recently on Uber, and Uber lost about, what is it? 1.2 billion- >> At least. >> Over six months, at least. >> At least. >> And then the article calculated how much it actually paid. Basically, the conclusion was it paid 1.2 billion for data. >> Yeah. >> It was about $1.20 per ride record, which actually is not a bad deal, when you think about it that way. >> Well, that's the thing, it's not a bad deal when you consider that the big picture they have in view is the global market for personal transportation, which The Economist estimates is about 10 trillion dollars annually. Well, to go after a 10 trillion dollar market, if you can build up a unique stock of data capital of a billion records at about a dollar per record, that's probably a pretty good deal, yeah. >> So, money obviously is fungible, it's currency. Data is not a currency, but digital data is fungible, right? I mean, you can use data in a lot of different ways, can't you? >> No, and this actually is a really important point. Data is actually not fungible. This is part of data's curious economic identity. Contrary to popular wisdom, data is not abundant. Data consists of countless unique observations, and one of the issues here is that two pieces of data are usually not fungible. 
You can't replace one with the other because they carry different information. They carry different semantics. So just to make it very concrete, a huge use of data capital that we see now is in fraud detection. One of our customers handles the fraud detection for person-to-person mobile payments. So say you go away for a weekend with a friend, you come back, you want to split the tab, and you just want to make a payment directly to the other person. You do this through your phone. Those transactions, those account-to-account transfers, get checked for possible fraudulent activity in the moment, as they happen, and there is a scoring algorithm that sniffs each transaction and gives it a score to indicate whether it may be fraudulent or is legitimate. Well, this company uses the information it captures about whether its algorithm caught all of the fraudulent transactions or missed some, and whether that algorithm mistakenly flagged legitimate transactions as fraudulent. They capture all of those false positives and false negatives, feed them back into the system, and improve the performance of the algorithm for the next go-around. Here's why this matters: the data created by that algorithm about its own performance is a proprietary asset. It is unique. And no other data will substitute for it. And in that way, it becomes the basis for a sustainable competitive advantage. >> It's a great example. So the algorithm maybe is free, you can grab an algorithm, it's how you apply it that is proprietary. And now, okay, so we've established that the data is not fungible. But digital data doesn't necessarily have high asset specificity. Do you agree with that? In other words, I can use data in different ways, if it's digital. >> Yeah, absolutely, as a matter of fact, this is one of the other characteristics of data. It is non-rivalrous, is what economists would call it. 
And this means that two parties can use the same piece of data at the same time. Which is not the case with, say, a tractor. One guy on a tractor means that none of the other people can ride that tractor. Data's not like that. So data can be put to multiple uses simultaneously. And what becomes very interesting is that different uses of data can command different prices. There's actually a project going on right now where Harvard Law School is scanning and digitizing the entire collection of US case law. Now this is The Law, the law that we all as Americans are bound to. Yet, it is locked up in a way, in just, in all of these 43,000 books. Well, Harvard and a startup called Ravel Law, they are working on scanning and digitizing this data, which can then be searched, for free, all of these, you can search this entire body of case law, for free, so you can go in and search "privacy," for example, and see all of the judgements that mention privacy over the entire history of US case law. But, if you want, for example, to analyze how different judges, current sitting judges, rule on cases related to privacy, well, that's a service that you would pay for from Ravel. The exact same data, their algorithms are working on the same body of data. You can search it for free, but the analysis that you might want on that same data, you can only get for a fee. So different uses of data can command different prices. >> So, some excellent examples there. What are the implications of all of this for competitive strategies, what should companies, how should they apply this for competitive strategies? >> Well, when we think about competitive strategy with data capital, we think in terms of three principles of data capital, is what we call them. The first one is that data comes from activity. The second one is, data tends to make more data, and the third is that platforms tend to win. 
So these three principles, even if we just run through them in their turn, the first one, data comes from activity, this means that, in order to capture data, your company has to be part of the activity that produces it at the time that activity happens. And the competitive strategy implication here is that, if your company is not part of that activity when it happens, your chance to capture its data is lost, forever. And so this means that interactions with customers are critical targets to digitize and datify before the competition gets in there and shuts you out. The second principle, data tends to make more data, this is what we were talking about with algorithms. Analytics are great, they're very important, analytics provide information to people so that they can make better choices, but the real action is in algorithms. And here is where you're feeding your unique stock of data capital to algorithms, that not only act on that data, but create data about their own performance, that then improve their future performance, and that data capital flywheel becomes a competitive advantage that's very hard to catch. The third principle is that platforms tend to win. So platforms are common in information-intensive industries, we see them with a credit card, for example, we see them in financial services. A credit card is a payment platform between consumers on the one side, merchants on the other. A video game console is a platform between developers on the one side and gamers on the other. The thing about platform competition is that it tends to lead toward a winner-take-all outcome. Not always, but that's how it tends to go. And with the digitization and datification of more activities, platform competition is coming for industries that have never seen it before. >> So platform beats product, but it's winner-take-all, or number two maybe breaks even, right? >> That tends to be the way it goes. >> And number three loses money, okay. 
The first point you were making, you've got to be there when the transaction occurs, you've got to show up. The second one's interesting, data tends to make more data. So, you talked about algorithms and improving and fine-tuning in that feedback loop. I would imagine customers are challenged in terms of investments: do they spend money on acquiring more data, or do they spend money on improving their algorithms? The answer is you've got to do both, but budgets are limited. How are customers dealing with that challenge? >> Well, prioritization becomes really critical here. Not all data is created equal, but it's very difficult to know which data will be more valuable in the future. However, there are ways to improve your guess. And one of the best ways is to go after data that your competition could get as well. So this is data that comes from activities with customers. Data from activities with suppliers, with partners. Those are all places where the competition could also try to digitize and datify those activities. So companies should really look outside their own four walls. But the next part, you know, figuring out, what do you do with it? This is where companies really need to take a page out of actual science as they approach data science, and science is all about argument. It's all about experimentation, testing, and keeping the hypotheses that are proven and discarding the ones that are disproven. What this means is that companies need a data lab environment, where they can cut the time, the cost, the effort, of forming and testing new hypotheses, getting new answers to new questions from their data. >> Okay, so, data has value, you've got to prioritize. How do you actually value the data so that I can prioritize and figure out what I should be focusing on in the lab and in production? >> Yeah, well, the basic answer is to go where the money is. So there are a couple things you can do with data. 
One is that you can improve your operational effectiveness, and so here, you should go look at your big cost areas, and focus your limited data science and managerial resources on trying to figure out, hey, can we become more efficient in whatever our big cost driver is? It could be shipping and logistics, inventory management, customer acquisition, or marketing and advertising. So that's one way to go. The next big thing that you can do with data is try to create a new product or service, to create new value in a way that generates revenue. Here, there is a little caveat, which is that companies may also want to consider creating new capabilities, maybe enriching the customer experience, making connections across multiple channels, that they can't actually charge for, not today. But what they get is data that no one else has. Let's say they make an investment in bringing together the in-store shopping experience with the targeted emails and with communication through social feeds and through Twitter. Let's say that they invest in trying to tie that data together, to get a richer picture of their consumers' behavior. They might not be able to charge for that today. But they may get insight into the way that shopping experience works that no one else can see, which then leads to a value-added service tomorrow. And I know it all sounds very speculative, but this is basically the nature of prototyping, of new product creation. >> Well, Uber's overused as an example, but this is a good application of Uber because essentially they pay for driver acquisition, which doesn't scale well. >> Yeah. >> But they get data. >> That's right. >> Because they're there at the point of the transaction and the activity and they've got data that nobody else has. 
>> Yeah, yeah, that's exactly right, and, you know, one of the ways to think about that is that, you're like a blackjack player, counting cards, and every time you play a hand as a company, you get data, information that may help you improve your future bets. This is why Vegas kicks out card counters, because it's an advantage for the future. But what we're talking about here, in digitizing activity with customers, every time you capture data about your interaction with those customers, you gain something simply for having carried out that activity. >> And so, thinking about, back to value for a minute, I mean I can envision some kind of value flow methodology where you assess the data intensity of the activity, and then assign some kind of, I don't know, score or a value to that activity, and then you can then look at that in relation to other activities. Is that a viable approach? >> It absolutely is. What companies need here is a new way to measure how much data they've got, how much they use, and then ascribe ... value created, you know, by that data. So the, how much they've got, you know, we can think about this, we always talk in terms of gigabytes and petabytes. But really we need some finer measurements. Data is an observation about something in the real world. And so, companies should start to think about measuring their data in terms of observations, in terms of attribute-value pairs. So even thinking about the record captured per activity, that's not enough. Companies should start thinking in terms of, how many columns are in that record? How many attributes are captured in these observations we make from that activity? The next issue, you know, how much do they use? Well, now, companies need to look at, how many of these observations are being touched, are being tapped by queries? Whether they're automatically generated, whether they are generated ad hoc by some data scientist, rooting around for some new understanding. 
So there's a set of questions there about what percentage of these observations we possess we are actually using in queries of some kind. And then the third piece, how much value do we create from it? This is a tough one, and it's really an estimation. Most likely what we need here is a new method for attributing the profitability of a particular business unit to its use of that data. And I realize this is an estimation, but there's a precedent for this in brand valuation. This is the coin of the realm when you're talking about putting a value to intangible assets. >> Well, as long as you're consistently applying that methodology across your portfolio, then at least you've got a relative measure and you can get back to prioritization, which is a key factor here. Is there an underlying technical architecture that has to be in place to take advantage of all this data capital momentum? >> There is. Companies are moving toward a hybrid cloud, big data architecture. >> What does that mean? >> It means that almost all the buzzwords are used, and we're going to need new ones. No, what it means is that companies are going to find themselves in a situation where some of their computing activities, storage, processing, application execution, analytics, will take place in a public cloud environment, and some will take place within their own data centers, reconfigured to act as private clouds. And there are lots of potential reasons for this. Companies have to deal not only with existing regulations, which sometimes will prevent them from putting data up into a cloud, but also with regulatory arbitrage, where maybe the regulations will change. Or maybe they've got agreements with partners, embodied in service level agreements, that again require them to keep the data under their own observation. 
Even in that case, the business still wants to consume all of those computing resources inside the data center as if they were services. The business doesn't care where they come from. And so this is one of the things that Oracle is providing: an architecture for Oracle public cloud, and private cloud in the data center, that is the same on both sides of the wire. And in fact, it can even be purchased in the same way, so that even these Oracle Cloud at Customer machines are purchased on a subscription basis, just as public cloud capabilities are. And the reason this is good is because it allows IT leaders to provide to the business computing capabilities, storage capabilities, as needed, that can be consumed as services, regardless of where they come from. >> Yeah, so you've got the data locality issue, which is speed of light problems, you don't want to move data, then you've got compliance and governance, and you're saying that hybrid approach allows you to have the cake and eat it, too. >> Yeah. >> Essentially. Are there other sort of benefits to taking this approach? >> Well, one of the other pieces that we should talk about here is the big data aspect, and really what that means is that relational, Hadoop, NoSQL, and graph database repositories are all peers now. This is Oracle's perspective, and as I'm sure you know, Oracle makes a relational database, it's very popular. Yeah, we've been doing it for a while, we're pretty good at it. Oracle's perspective on the future of data management is that Hadoop, NoSQL, graph, relational, all of these methods of data management will be peers and act together in a single high-performance enterprise system. And here's why. 
The reason is that, as our customers digitize and datify more of their activities, more of the world, they're creating data that's born in shapes and formats that don't necessarily lend themselves to a relational representation. It's more convenient to hold them in a Hadoop file system, or it's more convenient to hold them in just a great big key-value store like NoSQL. And yet, they would like to use these data sources as if they were in the same system and not really have to worry about where they are. We see this with telecom providers who want to combine call data records with customer data in the data warehouse. We see it with financial services companies who want to do a similar thing, combining research, records of what their high net worth customers have invested, and transaction data from the equities markets. So we see this polyglot future, the future of all of these different data management technologies, and their applications and the analytics built on top, working together, and existing in this hybrid cloud environment. >> So that's different than the historical Oracle, at least perceived messaging, where a lot of people believe that Oracle sees its Oracle database as a hammer, and every opportunity is a nail. You're telling a completely different story now. >> Well, it turns out there are many nails. So, you know, the hammer's still a good thing, but it turns out that there are also brads and tacks and Phillips and flathead screwdrivers too. And this is just one of the consequences of our customers creating more kinds of data. Images, audio, JSON, XML, you know, spectrographic images from drones that are analyzing how much green is in a photograph because that indicates the chlorophyll content. We know that our customers' ability to compete is based on how they create value from data capital. 
And so Oracle is in the business of making the things that make data more valuable, and we want to reinvent enterprise computing as a set of services that are easier to buy and use. >> And SQL is the lowest common denominator there, because of the skill sets that are available, is that right? >> Well, it's funny, it's not necessarily a lowest common denominator, it turns out it's just incredibly useful. (laughs) SQL is not just a technology standard, it's actually, in a manner of speaking, sort of a thinking standard. SQL is based on literally hundreds of years of hard thinking about how to think straight. You can trace SQL back to predicate logic, which was one of the critical ideas in the renaissance of mathematics and logic in the 1800s. So SQL embodies this way to think logically, to think about the attributes of things and their values and to reason about them in an automated fashion. And that is not going away, that in fact is going to become more powerful, more useful. >> Business processes are wired to that way of thinking, is what you're saying. >> That's exactly right. If you want to improve your operational effectiveness as a company, you're going to have to standardize some of your procedures and automate them, and that means you're going to standardize the information component of those activities. You can automate them better. And you're going to want to ask questions about, how's it going? And SQL is incredibly useful for doing that. >> So we went way over our time, this is a very interesting discussion, but I have to ask you, what is it you do at Oracle? Do you work with customers to help them understand data strategies and catalyze new thinking? What's your day-to-day like? >> Yeah, I do a lot of this, a lot of telling the story, because we're in a huge time of change. 
Every 20 years or so, the IT industry goes through an architectural shift, and that changes not just the technologies used to create value from data, but the very value created from data itself. It changes what you can do with information. So, I spend a lot of time explaining these ideas of data capital, and sitting down with executives at our customers, helping them understand how to look out at the world and see the data that is not there yet, and what that means for the way that they compete, and then we talk through the competitive strategies that follow from that, and the technical architecture required to execute those strategies. >> Excellent. Well, Paul, thanks very much for sharing your knowledge with our CUBE audience and coming into the SiliconANGLE Media studios here at Marlborough. >> Well, it's my pleasure. Thanks for having me. >> All right, you're welcome. Okay, thanks for watching, everybody. This is theCUBE, SiliconANGLE Media's special on the ground production. We'll see you next time. (peppy synth music)
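
Editor's sketch: the three measurements Paul describes in the valuation discussion (how much data a company has, how much of it it uses, and how much value it creates) can be illustrated for the first two. This is only a minimal sketch in Python under stated assumptions, not an Oracle method: the `Dataset` class, its field names, and the sample numbers are all hypothetical, and "observations" are counted simply as records times attributes.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of one stock of data (all names are illustrative)."""
    name: str
    records: int               # records captured from an activity
    attributes: int            # columns (attributes) captured per record
    observations_queried: int  # attribute-value pairs touched by any query


def observations(ds: Dataset) -> int:
    """How much data: count attribute-value pairs rather than gigabytes."""
    return ds.records * ds.attributes


def usage_share(ds: Dataset) -> float:
    """How much is used: fraction of observations touched by queries."""
    total = observations(ds)
    return ds.observations_queried / total if total else 0.0


# Made-up numbers, purely for illustration.
rides = Dataset("ride_records", records=1_000, attributes=20,
                observations_queried=5_000)
print(observations(rides))  # 20000
print(usage_share(rides))   # 0.25
```

The third measurement, value created, is left out on purpose: in the interview it is described as an estimation that would need an attribution method similar to brand valuation.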

Published Date : Sep 21 2016



Conquering Big Data Part 1: Data as Capital

>> Narrator: From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now here is your host, Dave Vellante. >> Hi, everybody. This is Dave Vellante. Welcome to a special presentation, Conquering Big Data. This is part one: Data as Capital, and this is sponsored by Oracle. With me is Paul Sonderegger, a big data strategist from Oracle. Paul, it's good to see you in theCUBE again. >> It's good to be here. >> Okay, so we were talking earlier. This whole thing for us at SiliconANGLE Media started around 2010 when we started to pay attention to the Hadoop trend, and data is the new source of competitive advantage, data is the new oil, and in six or seven short years, we've come quite a long way. Everybody says that they want to be data-driven. Where are we today from your perspective? >> I think the cover article of The Economist just a couple of weeks ago captured it pretty well where it said that data is the world's most valuable resource, and part of the evidence for that is that the top five most valuable publicly listed firms worldwide are all data-heavy technology companies. So we're at the point now where the effect of accumulating stocks of data capital is obvious and using it is obvious, but nonetheless, we are still at the beginning of the changes that the rise of data capital is going to bring. >> As I said, most executives would say they want their companies to be data-driven. Many actually say, "Oh yes, our company is data-driven," but when you start to peel the onion, do you agree that most companies aren't really as data-centric as they may claim to be? >> A lot of companies just struggle with the philosophy of what data is and what effect it has on the way they compete. Don't get me wrong. All executives understand that more data helps you make better decisions. That's evergreen. That's a good idea. But a lot of companies fail to appreciate that data, contrary to popular wisdom, is not abundant. 
There's a lot of it but it consists of countless unique observations, and so really, the way that executives need to think about data is that it is scarce. Data really consists of observations of things that are going on in the world, and if you are not there when those activities happen, when these events take place, your opportunity to capture those observations is lost. It doesn't come back. >> Okay, so let's get into this. You've written about and talked about the three principles of data capital, so let's start there and go through them. Principle one is data comes from activity. Okay. I guess that sounds obvious but what does it mean? >> This is the issue that we were just talking about. This is the first principle of data capital, that data comes from activity, and a lot of executives will say, "Yes, obviously. We put in this big ERP application back in the '90s, and it captured all of this data about our own processes, so then we reported on it so we can see what's going on." All of that is true but what a lot of executives miss is that they're in competition for data. So the data that ERP apps and CRM apps and all of these enterprise applications produce, those are all data from the company's own activities, but what's happening now is the digitization and datafication of activities outside the company, activities that customers carry on. It could be in everyday consumer life, it could be in B2B environments as well, it could be the movement of trucks, the movement of inventory done through supply chains run by partners. Executives have to get in the habit of looking out at the world and seeing the data that is not there yet, information coming from these activities that is lost. It's either captured on paper or it's not captured at all, and they have to put sensors and mobile apps into those activities before their rivals do, because when an activity happens, if you are not part of it, your opportunity to capture its data is lost. It doesn't come back. 
>> So data, raw data is abundant but the data that is actually valuable to organizations you're saying is scarce and takes a lot of refinement, to use the oil analogy. >> Think about it this way. Remember Sir Edmund Halley, the guy who predicted the comet? >> Dave: Right. >> Sir Edmund Halley predicted when you will die. This is actually one of his signal achievements that a lot of people have forgotten about. Halley was the first one to work out mortality tables, what life expectancy is. The reason that could be valuable is that he showed that life insurance policies that the British government was offering were mispriced depending on how old you were and how much longer you expected to live. The data that he used to make those calculations was not his. It came from Breslau. It came from another city, and it came from a particular church, which had kept really rigorous records during that time. Before the priests of Breslau said, "Hey, you could use this data," Halley had no ability to make this prediction. He had no ability to identify the mispricing of life insurance policies. That data, those observations, were a scarce resource concentrated in another city that he needed in order to figure all this out. We have exactly the same situation now, where companies taking observations of activities that they conduct with their partners, activities that they conduct with their customers, build up these concentrations of observations that are unique, they're proprietary, and they are the necessary fuel for creating new digital products and services. >> And many of those observations come from data outside of the organization. Okay, let's look at the second principle. Data makes more data. What are you talking about here? Are you talking about metadata? Can you explain? >> Sure. Providing data to people so they can make better decisions is always a good thing. It has been a good thing for a long time. 
It will continue to be a good thing. But the real money is in algorithms. The real money is in using these stocks of data capital to feed algorithms, for two reasons. One is that algorithms can take decisions beyond human scale, either in more situations per unit time or simply faster than human beings can. The second reason is that algorithms produce data about their own performance, which can be fed back into the model to improve their future performance. This is true of dynamic pricing algorithms, which capture data about what effect a price change had on conversion rates, for example. It applies in fraud detection. We have customers who are banks who look at how many legitimate transactions their current fraud detection algorithm wrongly flagged, because they get complaints about it, and how many fraudulent transactions their current algorithm actually missed, because investigations get kicked off through other processes. Those observations about the performance of the algorithm go back into the model, improving its future performance. This applies to algorithms for inventory detection and fleet movement. So the second principle is that data tends to make more data, and this virtuous cycle with algorithms creates a competitive advantage that is very, very hard to catch. >> And I'm hearing you have to act on that data and continue to iterate. It's obviously not a one-shot, static deal. We kind of all know that, but it's this constant improvement that's going to give you that competitive edge. >> That's really the key, and this is at the very heart of machine learning. With all the talk about AI and all the talk about machine learning, one of the tactics of machine learning algorithms is that they learn from their own behaviors and improve their behaviors over time, so really, this particular kind of competitive advantage is baked into the practice of machine learning and AI. >> Okay, great.
Now your third principle is that platforms tend to win. You've written that this is where the real money is, so what do you mean by platforms? Are you talking about platforms versus products? What do you mean? >> Here, we're talking about platforms not as technologists often think about them, where there is a foundational technology and then you build on top. We're talking about platforms as economists see them, so through the eyes of an economist, a platform is an intermediary that serves a two-sided market, and usually it makes it easier, cheaper, faster for the two sides to do business with each other. So just to use a very familiar example, credit cards are a payment platform, and they serve a two-sided market. On one side, you have merchants. On the other side, you have consumers. And of course, we as consumers want to carry the card that more merchants will take. Merchants want to take the card that more consumers have in their pocket. And so growth on one side of the market tends to encourage growth on the other side of the market. They kind of ladder up like that, and that means that platform competition tends toward a winner-take-all outcome, and so we have seen this in, say, the competition for the desktop operating system. That was a platform competition. We see it in the competition for the mobile operating system, but it's also something that you see in gaming platforms, for example. More game developers want to develop for the platforms where there are more gamers. Gamers want to have the platform where there are more games. The reason that this matters now is that the digitization and datafication of more daily activities brings platform competition to industries that have never seen it before. So just to use a simple example, look at farming. You can now have a drone.
It will go out and take pictures of a field, and the drone will do spectrographic analysis of the images, and it's looking for green, which is a proxy for the degree of chlorophyll in the plants. It uses that information to inform the fertilizer spreader about how to tailor the fertilizer to the plants, not to the field but to the individual plants. The tractor in the middle is in competition to be the platform for digital agricultural services, and that is not how makers of large agricultural equipment typically think about competition. >> Okay, so let's move on. If data is so important, it's the new source of competitive advantage, we're talking today about data as capital, but the accounting field doesn't look at data in the same way in which it does a financial asset. You don't see companies recognizing the value of data on their balance sheets, yet at the same time, you said the top five firms worldwide in terms of market value are data-oriented. So I'm sure that's much greater than the capital assets that they have on their books. So what's going on there? Should the accounting world be coming into the 21st century? Should companies wait until they do? What are your thoughts on that? >> I won't presume to give the accounting industry any advice on what they ought to do, but I will say that regardless of how the accounting standards look at data, the most successful data-driven companies already recognize that data is a true asset, despite the fact that they cannot put it on the balance sheet as an asset with a certain dollar value. These firms already recognize that data is not just a record of what happened, it is a raw material for creating new digital products and services. In that way, it is capital like capital equipment, like financial capital: if you do not have this input, you cannot create the service that you have in mind. And so that's why these data-heavy companies are not satisfied with the stocks of data capital they've got.
These platform businesses are constantly on the lookout for new activities they can go digitize and datafy, adjacent activities that are next to the ones that they have already captured, in order to further build out this stock of data capital, in order to create more raw material for new products and services. I will presume to give corporations in general advice, and the advice is that you've got to get this idea that data is not just a record of what happened, it is a raw material for new digital products and services. Digital products and services are the competitive field for providing value to your customers. >> So don't wait for the accounting industry to catch up is really your advice there. >> Not at all. >> So you said digitize, datafy, and that leads us to what you've talked about in the past, data trade, the monetization question, so let's talk about monetization. How should organizations think about monetizing data? Should they be selling data? Should they be thinking about it differently? Why should they be monetizing data? >> The first thing to remember is that data trade is a decades-old practice. Credit bureaus were one of the first kinds of companies to build an entire business on the trade of data, and so they're accumulating information about consumers and then providing it to banks so the banks can more easily, quickly, effectively make lending decisions, and that increases access to credit, which is a good thing overall. It's a very, very useful thing. But what's happening now is that the data trade is massively expanding, with the buying and selling of data about different aspects of consumer buying and shopping behavior, for example, but we're also starting to see the buying and selling of data in the world of the Internet of Things.
As you may know, Oracle has a very large data marketplace, the largest online marketplace, a data marketplace of consumer shopping and browsing behavior, so we have five billion consumer profiles, 400 million business profiles, $3 trillion in transactions. One of the things to note about this whole business is that the data in our marketplace is created by a whole set of other firms. Just to give you one example, there are 15,000 websites that are the sources for online browsing behavior. Those websites have no idea what value that data will provide to the companies who use it. They don't know. Instead, they are originating this data, and they are selling it on for these secondary purposes, and those secondary purposes really are discovered by the companies who buy the data and use it, and that data then goes into targeting marketing campaigns. It goes into refining product launch plans. It goes into redesigning social media publishing calendars and activities. The reason all this matters is that data consists of observations. The value from those observations only happens when they get used. There is this curious issue. Just like Edmund Halley needed data from Breslau in order to figure out life expectancy and figure out the proper pricing of these insurance policies, we have the same issue today, where data originates in one set of activities but the firms that create it may not create the greatest value from it, and so we need these data marketplaces in order to grow the overall value created from this digitization and datafication. >> Paul, are there pitfalls that people should, I'm sure there are many, but maybe a couple you could point to that people need to think about when they enter this data monetization journey? >> Sure. One of the ones that comes out right away is personally identifiable information and invasions of privacy.
So one of the ways to deal with that is to anonymize these records, strip out all the personally identifiable information, and then the next step that you can take is to aggregate them. So on that first piece about stripping out personally identifiable information, there are obvious pieces like first name, last name, social security number, and taxpayer ID number, but new regulations in Europe, the General Data Protection Regulation, the GDPR, have expanded the notion of personally identifiable information to any piece of data that could be uniquely tied back to a specific individual, so for example, something like an IMEI number, that unique code for your phone as it connects to the cellular network, and in some cases perhaps even an IP address. So this notion of personally identifiable information is expanding, so that's one thing for companies to be aware of. This notion of aggregation is an interesting one because even the GDPR says that if you aggregate a whole bunch of records together, and reidentification of those individual records is no longer possible, the GDPR doesn't even apply to those data products, so one of the things companies should be thinking about is whether they can create data products that provide observations about a part of the world that other firms are interested in, and yet at a large enough level of aggregation that the issues around personally identifiable information are all resolved. >> And this becomes really important. GDPR goes into effect next May, May '18. >> Next May. >> So things to think about. All right. Last question before we summarize this. Metrics: even though the accounting industry isn't counting data as an asset, are there new metrics that organizations are using or should be using to quantify the value of their data? >> There are. McKinsey writes about this occasionally.
They have taken just a really simple, back-of-the-envelope calculation, looking at revenue per employee for companies in a given industry, and then calling out the radical differences in revenue per employee for firms known to be highly data-centric versus others who perhaps are older or have been in the business longer or who have greater traditional capital assets, so something even that simple can be a useful tool, but I suspect that we're going to need a new family of metrics. There has been talk for a while about data productivity, about measuring that. It's often been difficult to do, but we've entered into a new world now where making observations about how data gets used within a company, looking at the queries going against the data management infrastructure, is now not only possible but cost-effective. I suspect that we're actually going to see a new metric of data productivity that is related to traditional measures of labor productivity and capital productivity, which economists have known about for a long time, but I think we'll see a way of measuring the work done, the value-creating work done by a company's digital data infrastructure, which can then be related to what their return on invested capital is as well as what their labor productivity is. I think we'll start to see a new set of metrics like that. >> And it maybe is implicit in even the McKinsey example of revenue per employee, something as simple as that. Maybe if you could isolate that and identify the input of labor and capital, maybe you can get to that. >> And then if you could isolate the input of work done by queries acting on data, then yeah, you ought to be able to establish that relationship. >> Okay, good. Let's summarize. Before I do, I just want to remind people to think about some questions. We're going to have a Q&A session right after this in the chat area right below. Okay, so we kind of introduced the notion of data capital and talked about why it's important.
You mentioned the top five firms worldwide in terms of value are data-oriented companies, and then we talked about your three principles around data capital. Why don't you summarize the three for us? >> Sure. Data comes from activity, so digitize and datafy activities outside your firm before your rivals do. Data tends to make more data, so feed the data you've got into algorithms so that they can create data about their own performance, creating a virtuous cycle. And then the third is platforms tend to win, and here, companies really need an active imagination to look at their industries and their business models and either imagine their own business model reinvented as a platform, an intermediary between two sides of the market where the digitization and datafication helps them create a new kind of value, or imagine another firm like that that comes to attack them. >> Okay, and then we talked about the accounting industry, how it has not begun to recognize data as value, put it on a balance sheet, et cetera. You chose not to suggest that they should or should not. Rather, you chose to focus on the companies, the organizations, and said that they should not wait for the accounting industry to catch up, that they should really dive in and begin thinking about how to digitize, you call it datafy, and that led to a conversation on monetization, and then you talked about data markets as a critical emerging, re-emerging entity and dynamic that's occurring there. Maybe some comments? >> Sure. For decades now, we've had businesses with traditional business models working as data sellers. Again, credit bureaus are a good example, market research firms are another good one, LexisNexis, Bloomberg, but I think what we're going to see is a rise in data marketplaces where you've got a new kind of business model. It's an exchange.
And you've got data originators providing data into the marketplace for sale, and you've got buyers on the other side, probably mostly companies, but there could be nonprofits, there could be governments as well actually, and those are actually really exciting because exchanges like that, increases in data trade, help to spread the wealth of data capital to more parties. It makes it possible for companies who need data but have not datafied the activities that they just discovered they care about to go and source that data. It also helps firms who have managed to create these data capital assets but are not sure what to do with them themselves to make them available to places where they can create value. >> Excellent. Then you talked about ways to avoid some of the pitfalls, particularly those associated with personal information and the upcoming GDPR, and then we wrapped with a conversation around metrics. Some simple metrics have been posed, like revenue per employee, and you noted a McKinsey study showing that those data-oriented companies have a higher revenue per employee, but then you suggested that we're going to start peeling back those metrics and looking at the contribution of labor plus capital in terms of a new metric you call data productivity, so we're going to follow that and hopefully talk to you down the road and learn more about that. Paul, thanks so much for spending some time with us. I really appreciate it. >> Thank you. >> You're welcome. Okay, now as I say, think about your questions. Go down below. Paul and I will be here for a Q&A in the chat below. Thanks for watching, everybody. We'll see you next time. (light music)

Published Date : Jun 2 2017
