Conquering Big Data Part 1: Data as Capital
>> Narrator: From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now here is your host, Dave Vellante. >> Hi, everybody. This is Dave Vellante. Welcome to a special presentation, Conquering Big Data. This is part one: Data as Capital, and this is sponsored by Oracle. With me is Paul Sonderegger, a big data strategist from Oracle. Paul, it's good to see you in theCUBE again. >> It's good to be here. >> Okay, so we were talking earlier. This whole thing for us at SiliconANGLE Media started around 2010 when we started to pay attention to the Hadoop trend, and data is the new source of competitive advantage, data is the new oil, and in six or seven short years, we've come quite a long way. Everybody says that they want to be data-driven. Where are we today from your perspective? >> I think the cover article of the Economist just a couple of weeks ago captured it pretty well where it said the data is the world's most valuable resource, and part of the evidence for that is that the top five most valuable listed firms or publicly listed firms worldwide are all data-heavy technology companies, so we're at the point now where the effect of accumulating data, stocks of data capital is obvious and using it is obvious but nonetheless, we are still at the beginning of the changes that the rise of data capital is going to bring. >> As I said, most executives would say they want their companies to be data-driven. Many actually say, "Oh yes, our company is data-driven," but when you start to peel the onion, do you agree that most companies aren't really as data-centric as they may claim to be? >> A lot of companies, they just struggle with the philosophy of what data is and what effect it has on the way they compete. Don't get me wrong. All executives understand that more data helps you make better decisions. That's evergreen. That's a good idea. But a lot of companies fail to appreciate that data, contrary to popular wisdom, is not abundant. 
There's a lot of it but it consists of countless unique observations, and so really, the way that executives need to think about data is that it is scarce. Data really consists of observations of things that are going on in the world, and if you are not there when those activities happen, when these events take place, your opportunity to capture those observations is lost. It doesn't come back. >> Okay, so let's get into this. You've written about and talked about the three principles of data capital, so let's start there and go through them. Principle one is data comes from activity. Okay. I guess that sounds obvious but what does it mean? >> This is the issue that we were just talking about. This is the first principle of data capital, that data comes from activity and a lot of executives will say, "Yes, obviously. We put in this big ERP application back in the '90s, and it captured all of this data about our own processes, so then we reported on it so we can see what's going on." All of that is true but what a lot of executives miss is that they're in competition for data. So the data that ERP apps and CRM apps and all of these enterprise applications produce, those are all data from the company's own activities but what's happening now is the digitization and datafication of activities outside the company, activities that customers carry on. It could be in everyday consumer life, it could be in B2B environments as well, it could be the movement of trucks, the movement of inventory done through supply chains run by partners. Executives have to get in the habit of looking out at the world and seeing the data that is not there yet, information coming from these activities that is lost. It's either captured on paper or it's not captured at all, and putting sensors and mobile apps into those activities before their rivals do because when an activity happens, if you are not part of it, your opportunity to capture its data is lost. It doesn't come back. 
>> So data, raw data is abundant but the data that is actually valuable to organizations you're saying is scarce and takes a lot of refinement to use the oil analogy. >> Think about it this way. Remember Sir Edmund Halley, the guy who predicted the comet? >> Dave: Right. >> Sir Edmund Halley predicted when you will die. This is actually one of his signal achievements a lot of people have forgotten about. Halley was the first one to work out mortality tables, what is expected, what is life expectancy. The reason that that could be valuable is that he showed that life insurance policies that the British government was offering were mispriced depending on how old you were and how much longer you expected to live. The data that he used to make those calculations was not his. It came from Breslau. It came from another city, and it came from a particular church, which had kept really rigorous records during that time. Before the priests of Breslau said, "Hey, you could use this data," Halley had no ability to make this prediction. He had no ability to identify the mispricing of life insurance policies. That data, those observations was a scarce resource concentrated in another city that he needed in order to figure all this out. We have exactly the same situation now. Exactly the same situation now where companies taking observations of activities that they conduct with their partners, activities that they conduct with their customers build up into these concentrations of observations that are unique, they're proprietary, and they are the necessary fuel for creating new digital products and services. >> And many of those observations come from data outside of the organization. Okay, let's look at the second principle. Data makes more data. What are you talking about here? Are you talking about metadata? Can you explain? >> Sure. Providing data to people so they can make better decisions is always a good thing. It has been a good thing for a long time. 
It will continue to be a good thing. But the real money is in algorithms. The real money is in using these stocks of data capital to feed algorithms for two reasons. One is that algorithms can take decisions beyond human scale either in more situations per unit time or simply faster than human beings can. The second reason it's important is because algorithms produce data about their own performance, which can be fed back into the model to improve their future performance. This is true of dynamic pricing algorithms, which capture data about what change did this price switch have on conversion rates, for example. It applies in fraud detection. We have customers who are banks who look at how many legitimate transactions did our current fraud detection algorithm wrongly flag because they get complaints about it, how many fraudulent transactions did our current algorithm actually miss because investigations get kicked off through other processes. Those observations about the performance of the algorithm go back into the model improving its future performance. This applies to algorithms for inventory detection and fleet movement. So the second principle is that data tends to make more data, and this virtuous cycle with algorithms creates a competitive advantage that is very, very hard to catch. >> And I'm hearing you have to act on that data and continue to iterate. It's not obviously a one-shot static deal. We kind of all know that but it's this constant improvement that's going to give you that competitive edge. >> That's really the key, and this is at the very heart of machine learning, so all the talk about AI and all the talk about machine learning, one of the tactics of machine learning algorithms is that they learn from their own behaviors and improve their behaviors over time, so really, this particular kind of competitive advantage is baked into the practice of machine learning and AI. >> Okay, great. 
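The feedback loop Paul describes for fraud detection, where wrongly flagged transactions and missed fraud flow back into the model, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any bank's or Oracle's actual system: the class name `ThresholdFraudModel`, its `step` parameter, and the rule of nudging a single score threshold are hypothetical stand-ins for a real retraining process.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdFraudModel:
    """Toy detector (hypothetical): flags a transaction when its risk score exceeds a threshold."""
    threshold: float = 0.5
    step: float = 0.05  # assumed adjustment size, chosen only for illustration
    feedback: list = field(default_factory=list)  # (was_flagged, was_fraud) pairs

    def flag(self, risk_score: float) -> bool:
        return risk_score > self.threshold

    def record_outcome(self, risk_score: float, was_fraud: bool) -> None:
        # This is the data the algorithm produces about its own performance:
        # what it decided, paired with what later turned out to be true.
        self.feedback.append((self.flag(risk_score), was_fraud))

    def retrain(self) -> None:
        # Complaints reveal false positives; investigations reveal false negatives.
        false_positives = sum(1 for flagged, fraud in self.feedback if flagged and not fraud)
        false_negatives = sum(1 for flagged, fraud in self.feedback if not flagged and fraud)
        if false_positives > false_negatives:
            # Too many legitimate transactions flagged: be less aggressive.
            self.threshold = min(1.0, self.threshold + self.step)
        elif false_negatives > false_positives:
            # Too much fraud missed: be more aggressive.
            self.threshold = max(0.0, self.threshold - self.step)
        self.feedback.clear()


if __name__ == "__main__":
    model = ThresholdFraudModel()
    model.record_outcome(0.6, was_fraud=False)  # flagged, but legitimate
    model.record_outcome(0.7, was_fraud=False)  # flagged, but legitimate
    model.record_outcome(0.4, was_fraud=True)   # missed fraud
    model.retrain()
    print(round(model.threshold, 2))
```

A production system would retrain a full model rather than move one number, but the shape is the same: decisions generate outcome data, and that data changes the next round of decisions.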
Now your third principle is that platforms tend to win. You've written that this is where the real money is, so what do you mean by platforms? Are you talking about platforms versus products? What do you mean? >> Here, we're talking about platforms not as technologists often think about it where there is a foundational technology and then you build on top. We're talking about platforms as economists see them, so through the eyes of an economist, a platform is an intermediary that serves a two-sided market, and usually it makes it easier, cheaper, faster for the two sides to do business with each other. So just to use a very familiar example, credit cards are a payment platform, and they serve a two-sided market. On one side, you have merchants. On the other side, you have consumers. And of course, we as consumers, we want to carry the card more merchants will take. Merchants want to take the card more consumers have in their pocket. And so growth on one side of the market tends to encourage growth on the other side of the market. They kind of ladder up like that, and that means that platform competition tends toward a winner-take-all outcome, and so we have seen this in, say, the competition for the desktop operating system. That was a platform competition. We see it in the competition for the mobile operating system but it's also something that you see in gaming platforms, for example. More game developers want to develop for the platforms where there are more gamers. Gamers want to have the platform where there are more games. The reason that this matters now is because the digitization and datafication of more daily activities brings platform competition to industries that have never seen it before. So just to use a simple example, look at farming. You can now have a drone. 
It will go out and take pictures of a field, and the drone will do spectrographic analysis of the images, and it's looking for green, which is a proxy for the degree of chlorophyll in the plants. It uses that information to inform the fertilizer spreader about how to tailor the fertilizer to the plants, not to the field but to the individual plants. The tractor in the middle is in competition to be the platform for digital agricultural services, and that is not how makers of large agricultural equipment typically think about competition. >> Okay, so let's move on. If data is so important, it's the new source of competitive advantage, we're talking today about data as capital, but the accounting field doesn't look at data in the same way in which they do a financial asset. You don't see companies recognizing the value of data on their balance sheets yet at the same time, you said the top five firms worldwide in terms of market value are data-oriented. So I'm sure that's much greater than the capital assets that they have on their books. So what's going on there? Should the accounting world be coming into the 21st century? Should companies wait until they do? What are your thoughts on that? >> I won't presume to give the accounting industry any advice on what they ought to do, but I will say that regardless of how the accounting standards look at data, the most successful data-driven companies, they already recognize that data is a true asset despite the fact that they cannot put it on the balance sheet as an asset with a certain dollar value. These firms, they already recognize that data is not just a record of what happened, it is a raw material for creating new digital products and services. In that way, it is capital like capital equipment, like financial capital, like if you do not have this input, you cannot create the service that you have in mind. And so that's why these data-heavy companies are not satisfied with the stocks of data capital they've got. 
These platform businesses are constantly on the lookout for new activities they can go digitize and datafy, adjacent activities that are next to the ones that they have already captured in order to further build out this stock of data capital, in order to create more raw material for new products and services. I will presume to give corporations in general advice, and the advice is that you've got to get this idea that data is not just a record of what happened, it is a raw material for new digital products and services. Digital products and services are the competitive field for providing value to your customers. >> So don't wait for the accounting industry to catch up is really your advice there. >> Not at all. >> So you said digitize, datafy, and that leads us to what you've talked about in the past, data trade, the monetization question, so let's talk about monetization. How should organizations think about monetizing data? Should they be selling data? Should they be thinking about it differently? Why should they be monetizing data? >> The first thing to remember is that data trade is a decades-old practice. Credit bureaus were one of the first kinds of companies to build an entire business on the trade of data, and so they're accumulating information about consumers and then providing them to banks so the banks can more easily, quickly, effectively make lending decisions, and that increases access to credit, which is a good thing overall. It's a very, very useful thing. But what's happening now is that the data trade is massively expanding, buying and selling of data about different kinds of aspects of consumer buying and shopping behavior, for example, but we're also starting to see the buying and selling of data in the world of the Internet of Things. 
As you may know, Oracle has a very large data marketplace, the largest online marketplace, a data marketplace of consumer shopping and browsing behavior, so we have five billion consumer profiles, 400 million business profiles, $3 trillion in transactions. One of the things to note about this whole business is that the data in our marketplace is created by a whole set of other firms. Just to give you one example, there's 15,000 websites which are the sources for online browsing behavior, those websites have no idea what value that data will provide to the companies who use it. They don't know. Instead, they are originating this data, and they are selling it on for these secondary purposes, and those secondary purposes really are discovered by the companies who buy the data and use it, and that data then goes into targeting marketing campaigns. It goes into refining product launch plans. It goes into redesigning social media publishing calendars and activities. The reason all this matters is because data consists of observations. The value from those observations only happens when it gets used. There is this curious issue. Just like Edmund Halley needed data from Breslau in order to figure out life expectancy and figure out the proper pricing of these insurance policies, we have the same issue today where data originates in one set of activities but the firms that create it may not create the greatest value from it, and so we need these data marketplaces in order to grow the overall value created from this digitization and datafication. >> Paul, are there pitfalls that people should, I'm sure there are many but maybe a couple you could point to that people need to think about when they enter this data monetization journey? >> Sure. One of the ones that comes out right away is personally identifiable information and invasions of privacy. 
So one of the ways to deal with that is to anonymize these records, strip out all the personally identifiable information, and then the next step that you can take is to aggregate them. So on that first piece about stripping out personally identifiable information, there are obvious pieces like name, first name, last name, and social security number, taxpayer ID number but new regulations in Europe, the General Data Protection Regulation, the GDPR has expanded the notion of personally identifiable information to any piece of data that could be uniquely tied back to a specific individual, so for example, something like an IMEI number, that unique code for your phone as it connects to the cellular network, in some cases perhaps even IP address. So this notion of personally identifiable information is expanding, so that's one thing for companies to be aware of. This notion of aggregation is an interesting one because even the GDPR says that if you aggregate a whole bunch of records together, and reidentification of those individual records is no longer possible, the GDPR doesn't even apply to those data products, so one of the things companies should be thinking about is can they create data products that provide observations about a part of the world that other firms are interested in and yet at a high enough, at a large enough level of aggregation that the issues around personally identifiable information are all resolved. >> And this becomes really important. GDPR goes into effect next May, May '18. >> Next May. >> So things to think about. All right. Last question before we summarize this. Metrics, even though the accounting industry isn't counting data as an asset, are there new metrics that organizations are using or should be using to quantify the value of their data? >> There are. McKinsey writes about this occasionally. 
They have taken just a really simple, back of the envelope calculation for looking at revenue per employee for companies in a given industry, and then calling out the radical differences in revenue per employee for firms known to be highly data-centric versus others who perhaps are older or have been in the business longer or who have greater traditional capital assets, so something even that simple can be a useful tool but I suspect that we're going to need a new family of metrics. There has been talk for a while about data productivity, about measuring that. It's often been difficult to do but we've entered into a new world now where observations about how data gets used within a company, looking at the queries going against the data management infrastructure is now not only possible but cost-effective. I suspect that we're actually going to see a new metric of data productivity that is related to traditional measures of labor productivity and capital productivity, which economists have known about for a long time, but I think we'll see a way of measuring the work done, the value-creating work done by a company's digital data infrastructure which can then be related to what's their return on invested capital as well as what is their labor productivity. I think we'll start to see a new set of metrics like that. >> And it maybe is implicit in even the McKinsey example of revenue per employee, something as simple as that. Maybe if you could isolate that and identify the input of labor and capital, maybe you can get to that. >> And then if you could isolate the input of work done by queries acting on data, then yeah, you ought to be able to establish that relationship. >> Okay, good. Let's summarize. Before I do, I just want to remind people to think about some questions. We're going to have a Q&A session right after this in the chat area right below. Okay, so we kind of introduced the notion of data capital and talked about why it's important. 
You mentioned the top five firms worldwide in terms of value are data-oriented companies, and then we talked about your three principles around data capital. Why don't you summarize the three for us? >> Sure. Data comes from activity, so digitize and datafy activities outside your firm before your rivals do. Data tends to make more data, so feed the data you've got into algorithms so that they can create data about their own performance creating a virtuous cycle. And then the third is platforms tend to win, and here, companies really need an active imagination to look at their industries and their business models and imagine them, either imagine their own business model reinvented as a platform, an intermediary between two sides of the market where the digitization and datafication helps them create a new kind of value, or imagine another firm like that that comes to attack them. >> Okay, and then we talked about the accounting industry, how it has not begun to recognize data as value, put in a balance sheet, et cetera. You chose not to suggest that they should or should not. Rather, you chose to focus on the companies, the organizations that they should not wait for the accounting industry to catch up, that they should really dive in and begin thinking about how to digitize, you call it datafy, and that led to a conversation on monetization, and then you talked about data markets as a critical emerging, re-emerging entity and dynamic that's occurring there. Maybe some comments? >> Sure. For decades now, we've had businesses with traditional business models working as data sellers. Again, credit bureaus are a good example, market research firms are another good one, LexisNexis, Bloomberg but I think what we're going to see is a rise in data marketplaces where you've got a new kind of business model. It's an exchange. 
And you've got data originators providing data into the marketplace for sale, and you've got buyers on the other side, probably mostly companies but there could be nonprofits, there could be governments as well actually, and those, those are actually really exciting because exchanges like that, increases in data trade help to spread the wealth of data capital to more parties. It makes it possible for companies who need data but have not datafied the activities that they just discovered they care about go and source that data. It also helps firms who have managed to create these data capital assets but they're not sure what to do with them themselves make them available to places where they can create value. >> Excellent. Then you talked about ways to avoid some of the pitfalls, particularly those associated with personal information and the upcoming GDPR, and then we wrapped with a conversation around metrics, some simple metrics have been posed like revenue per employee, and you noted a McKinsey study that those data-oriented companies have a higher revenue per employee but then you suggested that we're going to start peeling back those metrics and looking at the contribution of labor plus capital in terms of what you call, a new metric called data productivity, so we're going to follow that and hopefully talk to you down the road and learn more about that. Paul, thanks so much for spending some time with us. I really appreciate it. >> Thank you. >> You're welcome. Okay, now as I say, think about your questions. Go down below. Paul and I will be here for a Q&A in the chat below. Thanks for watching, everybody. We'll see you next time. (light music)
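The two-step approach Paul outlines in the pitfalls discussion, first stripping identifying fields and then aggregating records, might look roughly like the Python sketch below. Everything specific here is an assumption made for illustration: the field names in `IDENTIFYING_FIELDS`, the `region` and `spend` keys, and the `MIN_GROUP_SIZE` suppression threshold are hypothetical, and passing a check like this is in no way a statement of GDPR compliance.

```python
from collections import defaultdict

# Hypothetical direct identifiers; as noted in the interview, the GDPR's
# notion of personal data is broader than any fixed list like this one.
IDENTIFYING_FIELDS = {"name", "ssn", "imei", "ip_address"}

# Assumed minimum number of records per published group (illustrative only).
MIN_GROUP_SIZE = 3


def strip_identifiers(record):
    """Return a copy of the record without the listed identifying fields."""
    return {key: value for key, value in record.items() if key not in IDENTIFYING_FIELDS}


def aggregate_spend(records, group_key="region"):
    """Average spend per group, dropping groups too small to publish."""
    totals = defaultdict(lambda: {"count": 0, "total_spend": 0.0})
    for record in map(strip_identifiers, records):
        bucket = totals[record[group_key]]
        bucket["count"] += 1
        bucket["total_spend"] += record["spend"]
    return {
        group: {"count": b["count"], "avg_spend": b["total_spend"] / b["count"]}
        for group, b in totals.items()
        if b["count"] >= MIN_GROUP_SIZE  # suppress small, easily re-identified groups
    }


if __name__ == "__main__":
    sample = [
        {"name": "A", "imei": "x1", "region": "north", "spend": 10.0},
        {"name": "B", "imei": "x2", "region": "north", "spend": 20.0},
        {"name": "C", "imei": "x3", "region": "north", "spend": 30.0},
        {"name": "D", "imei": "x4", "region": "south", "spend": 99.0},
    ]
    print(aggregate_spend(sample))
```

Suppressing small groups matters because a "group" of one record is still an individual observation; the aggregate only becomes a different kind of data product once single records can no longer be picked out of it.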