Paul Sonderegger, Oracle - In The Studio - #Wikibon Boston
>> Announcer: From the Silicon Valley Media Office in Boston, Massachusetts, it's The Cube! Now, here's your host, Dave Vellante. >> Hi, everybody, welcome to a special Silicon Angle, The Cube on the ground. We're going to be talking about data capital with Paul Sonderegger, who is a big data strategist at Oracle, and he leads Oracle's data capital initiative. Paul, thanks for coming in, welcome to The Cube. >> Thank you, Dave, it's good to be here. >> So data capital, it's a topic that's gaining a lot of momentum, people talking about data value, they've talked about that for years, but what is data capital? >> Well, what we're saying with data capital is that data fulfills the literal economic textbook definition of capital. Capital is a produced good, as opposed to a natural resource: you have to invest to create it, and it is then a necessary input into some other good or service. So when we define data capital, we say that data capital is the recorded information necessary to produce a good or service. Which is really boring, so let me give you an example. So imagine, picture a retailer. A retailer wants to go into a new market. To do that, the retailer has to expand its inventory, it has to extend its supply chain, it has to buy property, all of these kinds of investments. If it lacks the financial capital to make all of those investments, it can't go, cannot go into that new region. By the same token, if this retailer wants to create a new dynamic pricing algorithm, or a new recommendation engine, but lacks the data to feed those algorithms, it cannot create that ability. It cannot provide that service. Data is now a kind of capital. >> And for years, data was viewed by a lot of organizations, particularly general counsel, as a liability, and then the big data meme sort of took off and all of a sudden, data becomes an asset. Are organizations viewing data as an asset? 
>> A lot of organizations are starting to view data as an asset, even though they can't account for it that way. So by current accounting standards, companies are not allowed to treat the money that they spend on developing information, on capturing data, as an asset. However, what you see with these online consumer services, the ones that we know, Uber, Airbnb, Netflix, LinkedIn, these companies absolutely treat data as an asset. They treat it, not just as a record of what happened, but as a raw material for creating new digital products and services. >> You too, you tweeted out an article recently on Uber, and Uber lost about, what is it? 1.2 billion- >> At least. >> Over six months, at least. >> At least. >> And then the article calculated how much it actually paid, I mean basically, the conclusion was it paid 1.2 billion for data. >> Yeah. >> It was about $1.20 per ride record, which actually is not a bad deal, when you think about it that way. >> Well, that's the thing, it's not a bad deal when you consider that the big picture they have in view is the global market for personal transportation, which The Economist estimates is about 10 trillion dollars annually. Well, to go after a 10 trillion dollar market, if you can build up a unique stock of data capital, of a billion records at about a dollar per record, that's probably a pretty good deal, yeah. >> So, money obviously is fungible, it's currency. Data is not a currency, but digital data is fungible, right, I mean, you can use data in a lot of different ways, can't you? >> No, no, it's, and this actually is a really important point, it's a really important point. Data is actually not fungible. This is part of data's curious economic identity. So data, contrary to popular wisdom, data is not abundant. Data consists of countless unique observations, and one of the issues here is that two pieces of data are usually not fungible. 
You can't replace one with the other because they carry different information. They carry different semantics. So just to make it very, very concrete, one of the things that we see now, a huge use of data capital is in fraud detection. And one of our customers handles the fraud detection for person-to-person mobile payments. So say you go away for a weekend with a friend, you come back, you want to split the tab, and you just want to make a payment directly to the other person. You do this through your phone. Those transactions, that account to account transfer, gets checked for possible fraudulent activity in the moment, as it happens, and there is a scoring algorithm that sniffs those transactions and gives each one a score to indicate whether it may be fraudulent or if it's legitimate. Well, this company, they use the information they capture about whether their algorithm caught all of the fraudulent transactions or missed some, and whether that algorithm mistakenly flagged legitimate transactions as fraudulent. They capture all of those false positives and false negatives, feed it back into the system, and improve the performance of the algorithm for the next go around. Here's why this matters: the data created by that algorithm about its own performance is a proprietary asset. It is unique. And no other data will substitute for it. And in that way, it becomes the basis for a sustainable competitive advantage. >> It's a great example. So the algorithm maybe is free, you can grab an algorithm, it's how you apply it that is proprietary, and now, okay, so we've established that the data is not fungible. But digital data doesn't necessarily have high asset specificity. Do you agree with that? In other words, I can use data in different ways, if it's digital. >> Yeah, absolutely, as a matter of fact, this is one of the other characteristics of data. It is non-rivalrous, is what economists would call it. 
And this means that two parties can use the same piece of data at the same time. Which is not the case with, say, a tractor. One guy on a tractor means that none of the other people can ride that tractor. Data's not like that. So data can be put to multiple uses simultaneously. And what becomes very interesting is that different uses of data can command different prices. There's actually a project going on right now where Harvard Law School is scanning and digitizing the entire collection of US case law. Now this is The Law, the law that we all as Americans are bound to. Yet, it is locked up in a way, in just, in all of these 43,000 books. Well, Harvard and a startup called Ravel Law, they are working on scanning and digitizing this data, which can then be searched, for free, all of these, you can search this entire body of case law, for free, so you can go in and search "privacy," for example, and see all of the judgements that mention privacy over the entire history of US case law. But, if you want, for example, to analyze how different judges, current sitting judges, rule on cases related to privacy, well, that's a service that you would pay for from Ravel. The exact same data, their algorithms are working on the same body of data. You can search it for free, but the analysis that you might want on that same data, you can only get for a fee. So different uses of data can command different prices. >> So, some excellent examples there. What are the implications of all of this for competitive strategies, what should companies, how should they apply this for competitive strategies? >> Well, when we think about competitive strategy with data capital, we think in terms of three principles of data capital, is what we call them. The first one is that data comes from activity. The second one is, data tends to make more data, and the third is that platforms tend to win. 
So these three principles, even if we just run through them in their turn, the first one, data comes from activity, this means that, in order to capture data, your company has to be part of the activity that produces it at the time that activity happens. And the competitive strategy implication here is that, if your company is not part of that activity when it happens, your chance to capture its data is lost, forever. And so this means that interactions with customers are critical targets to digitize and datify before the competition gets in there and shuts you out. The second principle, data tends to make more data, this is what we were talking about with algorithms. Analytics are great, they're very important, analytics provide information to people so that they can make better choices, but the real action is in algorithms. And here is where you're feeding your unique stock of data capital to algorithms, that not only act on that data, but create data about their own performance, that then improve their future performance, and that data capital flywheel becomes a competitive advantage that's very hard to catch. The third principle is that platforms tend to win. So platforms are common in information-intensive industries, we see them with a credit card, for example, we see them in financial services. A credit card is a payment platform between consumers on the one side, merchants on the other. A video game console is a platform between developers on the one side and gamers on the other. The thing about platform competition is that it tends to lead toward a winner-take-all outcome. Not always, but that's how it tends to go. And with the digitization and datification of more activities, platform competition is coming for industries that have never seen it before. >> So platform beats product, but it's winner-take-all, or number two maybe breaks even, right? >> That tends to be the way it goes. >> And number three loses money, okay. 
The first point you were making about, you've got to be there when the transaction occurs, you've got to show up. The second one's interesting, data tends to make more data. So, and you talked about algorithms and improving and fine-tuning in that feedback loop. I would imagine customers are challenged in terms of investments, do they spend money on acquiring more data, or do they spend money on improving their algorithms, and then the answer is got to do both, but budgets are limited. How are customers dealing with that challenge? >> Well, prioritization becomes really critical here. So not all data is created equal, but it's very difficult to know which data will be more valuable in the future. However, there are ways to improve your guess. And one of the best ways is to, go after data that your competition could get as well. So this is data that comes from activities with customers. Data from activities with suppliers, with partners. Those are all places where the competition could also try to digitize and datify those activities. So companies should really look outside their own four walls. But the next part, you know, figuring out, what do you do with it? This is where companies really need to take a page out of actual science as they approach data science, and science is all about argument. It's all about experimentation, testing, and keeping the hypotheses that are proven and discarding the ones that are disproven. What this means is that companies need a data lab environment, where they can cut the time, the cost, the effort, of forming and testing new hypotheses, getting new answers to new questions from their data. >> Okay, so, data has value, you've got to prioritize. How do you actually value the data so that I can prioritize and figure out what I should be focusing on in the lab and in production? >> Yeah, well, the basic answer is to go where the money is. So there are a couple things you can do with data. 
One is that you can improve your operational effectiveness, and so here, you should go look at your big cost areas, and focus your limited data science and managerial resources on trying to figure out, hey, can we become more efficient in whatever your big cost driver is? If it's shipping and logistics, if it's inventory management, if it's customer acquisition, if it's marketing and advertising, so that's one way to go. The next big thing that you can do with data is try to create a new product or service, a new ... create new value in a way that generates revenue. Here, there is a little caveat, which is that companies may also want to consider creating new capabilities, maybe enriching the customer experience, making connections across multiple channels, that they can't actually charge for, not today. But what they get is data that no one else has. What they get from, let's say, making an investment in bringing together the in-store shopping experience with the targeted emails, with communication through social feeds and through Twitter. Let's say that they invest in trying to tie that data together, to get a richer picture of their consumers' behavior. They might not be able to charge for that today. But they may get insight into the way that shopping experience works that no one else can see, which then leads to a value-added service tomorrow. And I know it all sounds very speculative, but this is basically the nature of prototyping, of new product creation. >> Well, Uber's overused as an example, but this is a good application of Uber because they, essentially they pay for driver acquisition, which doesn't scale well. >> Yeah. >> But they get data. >> That's right. >> Because they're there at the point of the transaction and the activity and they've got data that nobody else has. 
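(Editor's sketch, not part of the conversation.) The Uber example reduces to a one-line calculation. Assuming the rounded figures used in the interview, a loss of about $1.2 billion over six months and a stock of about one billion ride records, the implied price paid per record is:

```latex
\[
\text{cost per record}
  = \frac{\$1.2 \times 10^{9}}{1 \times 10^{9}\ \text{records}}
  \approx \$1.20 \text{ per ride record}
\]
```

Both inputs are round numbers quoted in the discussion, not audited figures, so the result is only an order-of-magnitude illustration.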
>> Yeah, yeah, that's exactly right, and, you know, one of the ways to think about that is that, you're like a blackjack player, counting cards, and every time you play a hand as a company, you get data, information that may help you improve your future bets. This is why Vegas kicks out card counters, because it's an advantage for the future. But what we're talking about here, in digitizing activity with customers, every time you capture data about your interaction with those customers, you gain something simply for having carried out that activity. >> And so, thinking about, back to value for a minute, I mean I can envision some kind of value flow methodology where you assess the data intensity of the activity, and then assign some kind of, I don't know, score or a value to that activity, and then you can then look at that in relation to other activities. Is that a viable approach? >> It absolutely is. What companies need here is a new way to measure how much data they've got, how much they use, and then ascribe ... value created, you know, by that data. So the, how much they've got, you know, we can think about this, we always talk in terms of gigabytes and petabytes. But really we need some finer measurements. Data is an observation about something in the real world. And so, companies should start to think about measuring their data in terms of observations, in terms of attribute-value pairs. So even thinking about the record captured per activity, that's not enough. Companies should start thinking in terms of, how many columns are in that record? How many attributes are captured in these observations we make from that activity? The next issue, you know, how much do they use? Well, now, companies need to look at, how many of these observations are being touched, are being tapped by queries? Whether they're automatically generated, whether they are generated ad hoc by some data scientist, rooting around for some new understanding. 
So there's a set of questions there about, what percentage of these observations we possess are we actually using in queries of some kind? And then the third piece, how much value do we create from it? This is where ... This is a tough one, and it's really an estimation. Most likely what we need here is a new method for attributing the profitability of a particular business unit to its use of that data. And I realize this is an estimation, but there's a precedent for this in brand valuation, this is the coin of the realm when you're talking about putting a value to intangible assets. >> Well, as long as you're consistently applying that methodology across your portfolio, then, then at least you've got a relative measure and you can get back to prioritization, which is a key factor here. Is there an underlying technical architecture that has to be in place to take advantage of all this data capital momentum? >> There is, there is, companies are moving toward a hybrid cloud, big data architecture. >> What does that mean? >> It means that almost all the buzzwords are used, and we're going to need new ones. No, what it means is that companies are going to find themselves in a situation where some of their computing activities, storage, processing, application execution, analytics, some of those activities will take place in a public cloud environment, some of it will take place within their own data centers, reconfigured to act as private clouds. And there are lots of potential reasons for this. There could be, companies have to deal with, not only existing regulations, which sometimes will prevent them from putting data up into a cloud, but they are also going to have to deal with regulatory arbitrage, maybe the regulations will change, or maybe they've got agreements with partners that are embodied in service level agreements that again require them to keep the data under their own observation. 
Even in that case, even in that case, the business still wants to consume all of those computing resources inside the data center as if they were services. The business doesn't care where they come from. And so this is one of the things that Oracle is providing, is an architecture for Oracle public cloud, and private cloud in the data center. It is the same on both sides of the wire. And in fact, can even be purchased in the same way so that even these, this Oracle cloud at customer, these machines, they are purchased on a subscription basis, just as public cloud capabilities are. And the reason this is good is because it allows IT leaders to provide to the business, computing capabilities, storage capabilities, you know, as needed, that can be consumed as services, regardless of where they come from. >> Yeah, so you've got the data locality issue, which is speed of light problems, you don't want to move data, then you've got compliance and governance, and you're saying, that hybrid approach allows you to have the cake and eat it, too. >> Yeah. >> Essentially. Are there other sort of benefits to taking this approach? >> Well, one of the, you know, the, one of the other pieces that we should talk about here is the big data aspect, and really what that means is, that, relational, Hadoop, NoSQL, graph database, repositories, they're all going to, they're all peers. They're all peers now, and, you know, this is Oracle's perspective, and as I'm sure you know, Oracle makes a relational database, it's very popular. Yeah, we've been doing it for a while, we're pretty good at it. Oracle's perspective on the future of data management is that Hadoop, NoSQL, graph, relational, all of these methods of data management will be peers and act together in a single high-performance enterprise system. And here's why. 
The reason is that, as our customers digitize and datify more of their activities, more of the world, they're creating data that's born in shapes and formats that don't necessarily lend themselves to a relational representation. It's more convenient to hold them in a Hadoop file system, or it's more convenient to hold them in just a great big key value store like NoSQL. And yet, they would like to use these data sources as if they were in the same system and not really have to worry about where they are. And we see this with telecom providers who want to combine call data records with customer, warehouse, you know, customer data in the data warehouse. We see it with financial services companies who want to do a similar thing of combining research with portfolio investments records of what their high net worth customers have invested, with transaction data from the equities markets. So we see this polyglot future, the future of all of these different data management technologies, and their applications in the analytics built on top, working together, and existing in this hybrid cloud environment. >> So that's different than the historical Oracle, at least perceived messaging, where a lot of people believe that Oracle sees its Oracle database as a hammer, and every opportunity is a nail. You're telling a completely different story now. >> Well, it turns out there are many nails. So, you know, the hammer's still a good thing, but it turns out that, you know, there are also brads and tacks and Phillips and flathead screwdrivers too. And this is just one of the consequences of our customers creating more kinds of data. Images, audio, JSON, XML, you know, spectrographic images from drones that are analyzing how much green is in a photograph because that indicates the chlorophyll content. We know, we know that our customers' ability to compete is based on how they create value from data capital. 
And so Oracle is in the business of making the things that make data more valuable, and we want to reinvent enterprise computing as a set of services that are easier to buy and use. >> And SQL is the lowest common denominator there, because of the skill sets that are available, is that right or? >> Well, it's funny, it's not necessarily a lowest common denominator, it turns out it's just incredibly useful. (laughs) SQL is not just a technology standard, it's actually, in a manner of speaking, it's sort of a thinking standard. SQL is based on literally hundreds of years of hard thinking about how to think straight. You can trace SQL back to predicate logic, which was one of the critical ideas in the renaissance of mathematics and logic in the 1800s. So SQL embodies this way to think about, to think logically, to think about the attributes of things and their values and to reason about them in an automated fashion. And that is not going away, that in fact is going to become more powerful, more useful. >> Business processes are wired to that way of thinking, is what you're saying. >> That's exactly right. If you want to improve your operational effectiveness as a company, you're going to have to standardize some of your procedures and automate them, and that means you're going to standardize the information component of those activities. You can automate them better. And you're going to want to ask questions about, how's it going? And SQL is incredibly useful for doing that. >> So we went way over our time, this is very interesting discussion, but I have to ask you, what is it you do at Oracle? Do you work with customers to help them understand data strategies and catalyze new thinking? What's your day-to-day like? >> Yeah, I do a lot of this, a lot of telling the story, because we're in a huge time of change. 
Every 20 years or so, the IT industry goes through an architectural shift, and that changes, not just the technologies used to create value from data, but it changes the very value created from data itself. It changes what you can do with information. So, I spend a lot of time explaining these ideas of data capital, and sitting down with executives at our customers, helping them understand how to look out at the world and see the data that is not there yet, and what that means for the way that they compete, and then we talk through the competitive strategies that follow from that, and the technical architecture required to execute those strategies. >> Excellent. Well, Paul, thanks very much for sharing your knowledge with our Cube audience and coming into the Silicon Angle Media Studios here at Marlborough. >> Well, it's my pleasure. Thanks for having me. >> All right, you're welcome. Okay, thanks for watching, everybody. This is The Cube, Silicon Angle Media's special on the ground production. We'll see you next time. (peppy synth music)
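(Editor's sketch, not part of the conversation.) The measurement idea discussed in the interview, counting data as observations (attribute-value pairs) rather than gigabytes and then asking what share of those observations queries actually touch, can be illustrated with a few lines of Python. The toy ride records, the attribute names, and the two helper functions below are all hypothetical and chosen only for illustration; they are not an Oracle tool or method.

```python
# Hypothetical toy records: each dict is one record captured from an activity,
# and each non-empty key/value entry is one observation (attribute-value pair).
records = [
    {"ride_id": 1, "pickup": "A", "dropoff": "B", "fare": 12.5},
    {"ride_id": 2, "pickup": "C", "dropoff": "A", "fare": 8.0},
    {"ride_id": 3, "pickup": "B", "dropoff": "C", "fare": None},  # fare not captured
]


def count_observations(rows):
    """How much data: number of attribute-value pairs that hold a value."""
    return sum(1 for row in rows for value in row.values() if value is not None)


def utilization(rows, queried_attributes):
    """How much is used: share of observations in attributes touched by queries."""
    total = count_observations(rows)
    if total == 0:
        return 0.0
    used = sum(
        1
        for row in rows
        for attribute, value in row.items()
        if value is not None and attribute in queried_attributes
    )
    return used / total


if __name__ == "__main__":
    # Assume, for illustration, that queries only ever read "pickup" and "fare".
    print("observations:", count_observations(records))
    print("share used:", round(utilization(records, {"pickup", "fare"}), 2))
```

The third question raised in the interview, how much value the data creates, is an estimation problem and is deliberately left out of this sketch.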
ENTITIES
Entity | Category | Confidence |
---|---|---|
Paul Sonderegger | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Uber | ORGANIZATION | 0.99+ |
Harvard | ORGANIZATION | 0.99+ |
Paul | PERSON | 0.99+ |
1.2 billion | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Harvard Law School | ORGANIZATION | 0.99+ |
two parties | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
Netflix | ORGANIZATION | 0.99+ |
Philips | ORGANIZATION | 0.99+ |
Airbnb | ORGANIZATION | 0.99+ |
10 trillion dollar | QUANTITY | 0.99+ |
Silicon Angle Media | ORGANIZATION | 0.99+ |
third piece | QUANTITY | 0.99+ |
SQL | TITLE | 0.99+ |
43,000 books | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
Vegas | ORGANIZATION | 0.99+ |
both | QUANTITY | 0.99+ |
Silicon Angle Media Studios | ORGANIZATION | 0.99+ |
two pieces | QUANTITY | 0.99+ |
third | QUANTITY | 0.99+ |
US | LOCATION | 0.99+ |
1800s | DATE | 0.99+ |
One guy | QUANTITY | 0.99+ |
Boston, Massachusetts | LOCATION | 0.99+ |
today | DATE | 0.99+ |
tomorrow | DATE | 0.99+ |
hundreds of years | QUANTITY | 0.99+ |
One | QUANTITY | 0.98+ |
Over six months | QUANTITY | 0.98+ |
first point | QUANTITY | 0.98+ |
Ravel | ORGANIZATION | 0.98+ |
both sides | QUANTITY | 0.98+ |
three principles | QUANTITY | 0.98+ |
The Cube | ORGANIZATION | 0.98+ |
first one | QUANTITY | 0.98+ |
third principle | QUANTITY | 0.98+ |
one way | QUANTITY | 0.98+ |
NoSQL | TITLE | 0.96+ |
about 10 trillion dollars | QUANTITY | 0.96+ |
ORGANIZATION | 0.96+ | |
second principle | QUANTITY | 0.96+ |
Marlborough | LOCATION | 0.96+ |
second one | QUANTITY | 0.95+ |
about a billion dollars | QUANTITY | 0.95+ |
one side | QUANTITY | 0.95+ |
Silicon Angle | ORGANIZATION | 0.94+ |
single | QUANTITY | 0.94+ |
Silicon Valley Media Office | ORGANIZATION | 0.93+ |
#Wikibon | ORGANIZATION | 0.89+ |
Americans | PERSON | 0.85+ |
a billion records | QUANTITY | 0.84+ |
about $1.20 per | QUANTITY | 0.83+ |
years | QUANTITY | 0.81+ |
two | QUANTITY | 0.81+ |