
Search Results for IRI:

Action Item | How to get more value out of your data, April 06, 2018


 

>> Hi, I'm Peter Burris and welcome to another Wikibon Action Item. (electronic music) One of the most pressing strategic issues that businesses face is how to get more value out of their data. In our opinion, that's the essence of a digital business transformation: using data as an asset to improve your operations and take better advantage of market opportunities. The problem with data, though, is that it's shareable, it's copyable, it's reusable. It's easy to create derivative value out of it. One of the biggest misnomers in the digital business world is the notion that data is the new fuel or the new oil. It's not. You can only use oil once. You can apply it to a purpose and not multiple purposes. Data you can apply to a lot of purposes, which is why you are able to get such interesting and increasing returns on that asset if you use it appropriately. Now, this becomes especially important for technology companies that are attempting to provide digital business technologies or services or other capabilities to their customers. In the consumer world, it has started to come to a head. Questions about Facebook's reuse of a person's data through an ad-based business model are now starting to lead people to question the degree to which the information asymmetry, about what I'm giving and how they're using it, is really worth the value that I get out of Facebook, and that is something that consumers and certainly governments are starting to talk about. It's also one of the bases for GDPR, which is going to start enforcing significant fines in the next month or so. In the B2B world that question is going to become especially acute. Why? Because as we try to add intelligence to the services and the products that we are utilizing within digital business, some of that requires a degree of, or some sort of, relationship where some amount of data is passed to improve the models and machine learning and AI that are associated with that intelligence. Now, some companies have come out and said flat out they're not going to reuse a customer's data, IBM being a good example of that, when Ginni Rometty at IBM Think said, we're not going to reuse our customers' data. The question for the panel here is, is that going to be a part of a differentiating value proposition in the marketplace? Are we going to see circumstances in which some companies keep the cost of products and services low by reusing a client's data, while others, sustaining their experience and sustaining a trust model, say they won't? How is that going to play out in front of customers? So joining me today here in the studio, David Floyer. >> Hi there. >> And on the remote lines we have Neil Raden, Jim Kobielus, George Gilbert, and Ralph Finos. Hey, guys. >> All: Hey. >> All right so... Neil, let me start with you. You've been in the BI world as a user, as a consultant, for many, many years. Help us understand the relationship between data, assets, ownership, and strategy. >> Oh, God. Well, I don't know that I've been in the BI world. Anyway, as a consultant, when we would do a project for a company, there were very clear lines of what belonged to us and what belonged to the client. They were paying us generously. They would allow us to come into their company and do things that they needed, and in return we treated them with respect. We wouldn't take their data. We wouldn't take their data models that we built, for example, and sell them to another company. That's just, as far as I'm concerned, that's just theft.
So if I'm housing another company's data because I'm a cloud provider or some sort of application provider, and I say, well, you know, I can use this data too, to me the analogy is: I'm a warehousing company, and independently I go into the warehouse and I say, you know, these guys aren't moving their inventory fast enough, I think I'll sell some of it. It just isn't right. >> I think it's a great point. Jim Kobielus. As we think about the role that data and machine learning play in training models and delivering new classes of services, we don't have a clean answer right now. So what's your thought on how this is likely to play out? >> I agree totally with Neil, first of all. If it's somebody else's data, you don't own it, therefore you can't sell it and you can't monetize it, clearly. But where you have derivative assets, like machine learning models that are derivative from data, it's the same phenomenon, it's the same issue at a higher level. You can build and train, or should, your machine learning models only from data that you have legal access to: you own it or you have a license to it, and so forth. So as you're building these derivative assets, first and foremost, make sure, as you're populating your data lake to build and to do the training, that you have clear ownership over the data. So with GDPR and so forth, we have to be doubly, triply vigilant to make sure that we're not using data that we don't have authorized ownership of or access to. That is critically important. And so, I get kind of queasy when I hear some people say we use blockchain to make... the sharing of training data more distributed and federated or whatever. It's like, wait a second. That doesn't solve the issues of ownership. That makes it even more problematic. If you get this massive blockchain of data coming from hither and yon, who owns what? How do you know? Do you dare build any models whatsoever from any of that data? That's a huge gray area that nobody's really addressed yet. >> Yeah well, it might mean that the blockchain has been poorly designed. I think that we talked in one of the previous Action Items about the role that blockchain design is going to play. But moving aside from the blockchain, it seems as though we generally agree that data is typically owned by somebody, and that the ownership of it, as Neil said, means that you can't intercept it at some point in time, just because it is easily copied, and then generate rents on it yourself. David Floyer, what does that mean from an ongoing systems design and development standpoint? How are we going to assure, as Jim said, not only that we know what data is ours, but that we have the right protection strategies, in a sense, in place to make sure that as the data moves, we have some influence and control over it? >> Well, my starting point is that AI and AI infused products are fueled by data. You need that data, and Jim and Neil have already talked about that. In my opinion, the most effective way of improving a company's products, whatever the products are, from manufacturing to agriculture to financial services, is to use AI infused capabilities. That is likely to give you the best return on your money, and businesses need to focus on their own products. That's the first place you are trying to protect from anybody coming in. Businesses own that data, the data about their products in use by their customers. Use that data to improve your products with AI infused function, and use it before your competition eats your lunch.  >> But let's build on that.
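As a concrete aside on Jim's point about populating a data lake only with data you have clear rights to train on, here is a minimal, hypothetical sketch of the kind of gate a pipeline might apply before records reach a training set. The field names (owner, consent_scope) and the policy are illustrative assumptions, not a reference to any particular product or a statement of GDPR's legal requirements.

```python
# Hypothetical sketch: filter a candidate training set down to records the
# business has clear rights to use. Field names and policy are illustrative.

ALLOWED_SCOPES = {"product_improvement", "model_training"}  # assumed consent scopes

def has_training_rights(record, our_org):
    """Return True only if we own the record or hold consent covering training."""
    if record.get("owner") == our_org:
        return True
    consent = record.get("consent_scope", [])
    return bool(ALLOWED_SCOPES & set(consent))

def build_training_set(candidate_records, our_org):
    usable, excluded = [], []
    for rec in candidate_records:
        (usable if has_training_rights(rec, our_org) else excluded).append(rec)
    # Keep an audit trail of what was excluded -- the "doubly, triply vigilant"
    # bookkeeping Jim describes.
    print(f"usable: {len(usable)}, excluded for lack of rights: {len(excluded)}")
    return usable

if __name__ == "__main__":
    records = [
        {"id": 1, "owner": "acme", "payload": "..."},
        {"id": 2, "owner": "customer_x", "consent_scope": ["model_training"], "payload": "..."},
        {"id": 3, "owner": "customer_y", "consent_scope": [], "payload": "..."},
    ]
    training_set = build_training_set(records, our_org="acme")
```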
So, for example, say you're a storage system supplier, since that's a relatively easy one. You've got very, very fast SSDs. Very, very fast NVMe over Fabric. Great technology. You can collect data about how that system is working, but that doesn't give you rights to then also collect data about how the customer's using the system. >> There is a line which you need to make sure that you are covering. For example, Call Home on a product, any product: whose data is that? You need to make sure that you can use that data. You have some sort of agreement with the customer, and that's a win-win because you're using that data to improve the product, to prove things about it. But it's very, very clear that you should have a contractual relationship, as Jim and Neil were pointing out. You need the right to use that data. It can't just be assumed. But you must get it, because if you don't get it, you won't be able to improve your products. >> Now, we're talking here about technology products, which often have very concrete and obvious ownership and people who are specifically responsible for administering them. But when we start getting into the IoT domain, or other places where the device is infused with intelligence and might be collecting data that's not directly associated with its purpose, just by virtue of the nature of the sensors that are out there, the whole concept of the digital twin introduces some tension in all this. George Gilbert. Take us through what's been happening with the overall suppliers of technology that are related to digital twin building, designing, etc. How are they securing, or making promises and commitments to their customers, that they will not cross this data boundary as they improve the quality of their twins? >> Well, as you quoted Ginni Rometty starting out, she's saying IBM, unlike its competitors, will not take advantage of and leverage and monetize your data. But it's a little more subtle than that, and digital twins are just sort of another manifestation of the industry-specific sort of solution development that we've done for decades. The difference, as Jim and David have pointed out, is that with machine learning, it's not so much code that's at the heart of these digital twins, it's the machine learning models, and the data is what informs those models. Now... So you don't want all your secret sauce to go from Mercedes-Benz to BMW, but at the same time the economics of industry solutions mean that you do want some of the repeatability that we've always gotten from industry solutions. You might have parts that are just company specific. And so in IBM's case, if you really parse what they're saying, they take what they learn in terms of the models from the data when they're working with BMW, and some of that is going to go into the industry specific models that they're going to use when they're working with Mercedes-Benz. If you really, really sort of peel the onion back and ask them, it's not the models, it's not the features of the models, but it's the coefficients that weight the features or variables in the models that they will keep segregated by customer. So in other words, you get some of the benefits, the economic benefits, of reuse across customers with similar expertise, but you don't actually get all of the secret sauce. >> Now, Ralph Finos-- >> And I agree with George here. I think that's an interesting topic. That's one of the important points.
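To make the distinction George just drew a bit more concrete, the sketch below separates an industry-level model structure (the shared features) from customer-level weights (the coefficients learned from each customer's own data). The linear model, feature names, and numbers are invented for illustration; this is not a description of how IBM or any other vendor actually partitions its models.

```python
# Hypothetical sketch: an industry model shares its feature list (structure),
# while the coefficients learned from each customer's data stay segregated.

INDUSTRY_FEATURES = ["vibration", "temperature", "duty_cycle"]  # shared, reusable

# Per-customer coefficients: derived from each customer's own data, never pooled.
CUSTOMER_WEIGHTS = {
    "customer_a": {"vibration": 0.8, "temperature": 0.1, "duty_cycle": 0.3, "_bias": -0.2},
    "customer_b": {"vibration": 0.2, "temperature": 0.7, "duty_cycle": 0.5, "_bias": 0.1},
}

def score(customer, observation):
    """Linear score using the shared feature structure and that customer's own weights."""
    w = CUSTOMER_WEIGHTS[customer]
    return w["_bias"] + sum(w[f] * observation[f] for f in INDUSTRY_FEATURES)

if __name__ == "__main__":
    obs = {"vibration": 0.9, "temperature": 0.4, "duty_cycle": 0.6}
    for c in CUSTOMER_WEIGHTS:
        print(c, round(score(c, obs), 3))  # same structure, different secret sauce
```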
It's not kosher to monetize data that you don't own, but conceivably, if you can abstract from that data at some higher level, like George is describing, in terms of weights and coefficients and so forth in a neural network that's derivative from the model, then at some point in the abstraction you should be able to monetize it. I mean, it's like a paraphrase of some copyrighted material. I'm not a lawyer, but you can sell a paraphrase, because it's your own original work that's based, obviously, on your reading of Moby Dick or whatever it is you're paraphrasing. >> Yeah, I think-- >> Jim I-- >> Peter: Go ahead, Neil. >> I agree with that, but there's a line. There was a guy who worked at Capital One, this was about ten years ago, and he was their chief statistician or whatever. This was before we had words like machine learning and data science; it was called statistics and predictive analytics. He left the company and formed his own company and rewrote and recoded all of the algorithms he had for about 20 different predictive models. Formed a company and then licensed that stuff to Sybase and Teradata and whatnot. Now, the question I have is, did that cross the line or didn't it? These were algorithms actually developed inside Capital One. Did he have the right to use those, even if he wrote new computer code to make them run in databases? So it's more than just data, I think. It's, well, it's a marketplace, and I think that if you own something, someone should not be able to take it and make money on it. But that doesn't mean you can't make an agreement with them to do that, and I think we're going to see a lot of that. IMSN gets data on prescription drugs, and IRI and Nielsen get scanner data, and they pay for it and then they add value to it and they resell it. So I think that's really the issue: the use has to be understood by all the parties and the compensation has to be appropriate to the use. >> All right, so Ralph Finos. As a guy who looks at market models and handles a lot of the fundamentals for how we do our forecasting, look at this from the standpoint of how people are going to make money, because clearly what we're talking about sounds like the idea that any derivative use is embedded in algorithms. We'll see how those contracts get set up, and I've got a comment on that in a second, but the promise from a number of years ago, that people were going to start selling data willy-nilly as a way of capturing value out of their economic activities or their business activities, hasn't matured yet generally. Do we see this brand new data economy, where everybody's selling data to each other, being the way that this all plays out? >> Yeah, I'm having a hard time imagining this as a marketplace. I think we pointed at the manufacturing industries, technology industries, where some of this makes some sense. But I think from a practitioner perspective, you're looking for variables that are meaningful, that are in a form you can actually use to make predictions, and where you understand what the history and the validity of that data is. And in a lot of cases there's a lot of garbage out there that you can't use. And the notion of paying for something that ultimately you look at and say, oh crap, this isn't really helping me, is going to be... maybe not an insurmountable barrier, but it's going to create some obstacles in the market for adoption of this kind of thought process.
We have to think about the utility of the data that feeds your models. >> Yeah, I think there are going to be a lot of legal questions raised, and I recommend that people go look at a recent SiliconANGLE article written by Mike Wheatley and edited by our Editor in Chief Robert Hof about Microsoft letting technology partners own the rights to joint innovations. This is quite a difference, quite a change for Microsoft, where it used to be that if you sent an email with an idea to them, you'd often get an email back saying, oh, just to let you know, any correspondence we have here is the property of Microsoft. So there clearly is tension in the model about how we're going to utilize data and enable derivative use, and how we're going to share, how we're going to appropriate value and share in the returns of that. I think this is going to be an absolutely central feature of business models, certainly in the digital business world, for quite some time. The last thing I'll note, and then I'll get to the Action Items, the last thing I'll mention here, is that one of the biggest challenges whenever we start talking about how we set up businesses and institutionalize the work that's done is to look at the nature of the assets and the scope of the assets. In circumstances where an asset is used by two parties and it's generating a high degree of value, as measured by the transactions against that asset, there's always going to be a tendency for one party to try to take ownership of it. One party that's able to generate greater returns than the other almost always makes moves to try to take more control of that asset, and that's the basis of governance. And so everybody talks about data governance as though it's something that you worry about with your backup and restore. Well, that's important, but this notion of data governance increasingly is going to become a feature of strategy and boardroom conversations about what it really means to create data assets, sustain those data assets, get value out of them, and how we determine whether or not the right balance is being struck between the value that we're getting out of our data and the value that third parties are getting out of our data, including customers. So with that, let's do a quick Action Item. David Floyer, I'm looking at you. Why don't we start here. David Floyer, Action Item. >> So my Action Item is for businesses: you should focus. Focus on data about your products in use by your customers to help improve the quality of your products and fuse AI into those products, as one of the most efficient ways of adding value to them. And do that before your competition has a chance to come in and get data that will stop you from doing that. >> George Gilbert, Action Item. >> I guess mine would be that... in most cases you want to embrace some amount of reuse, because of the economics involved in your joint development with a solution provider. But if others are going to get some benefit from sort of reusing some of the intellectual property that informs models that you build, make sure you negotiate with your vendor that for any upgrades to those models, whether they're digital twins or in other forms, there's a canonical version that can come back and be an upgrade path for you as well. >> Jim Kobielus, Action Item. >> My Action Item is for businesses to regard your data as a product that you monetize yourself.
Or if you are unable to monetize it yourself, and there is a partner, like a supplier or a customer, who can monetize that data, then negotiate the terms of that monetization in your relationship and be vigilant about it, so you get a piece of that stream, even if the bulk of the work is done by your partner. >> Neil Raden, Action Item. >> It's all based on transparency. Your data is your data. No one else can take it without your consent. That doesn't mean that you can't get involved in relationships where there's an agreement to do that. But the problem is that most agreements, especially when you look at business-to-consumer agreements, are so onerous that nobody reads them and nobody understands them. So the person providing the data has to have an unequivocal right to sell it to you, and the person buying it has to really understand what the limits are on what they can do with it. >> Ralph Finos, Action Item. You're muted, Ralph. But it was brilliant, whatever it was. >> Well it was, and I really can't say much more than that. (Peter laughs) But I think from a practitioner perspective, and I understand from a manufacturing perspective how the value could be there, as a practitioner, if you're fishing for data out there that someone has that might look like something you can use, chances are it's not. And you need to be real careful about spending money to get data that you're not really clear is going to help you. >> Great. All right, thanks very much, team. So here's our Action Item conclusion for today. The whole concept of digital business is predicated on the idea of using data assets in a differential way to better serve your markets and improve your operations. It's your data. Increasingly, that is going to be the basis for differentiation. And any weakness in the undertakings that allow that data to get out creates the potential that someone else can, through their data science and their capabilities, re-engineer much of what you regard as your differentiation. We've had conversations with leading data scientists who say that if someone were to sell customer data into an open marketplace, it would take about four days for a great data scientist to re-engineer almost everything about your customer base. So as a consequence, we have to tread lightly here as we think about what it means to release data into the wild. Ultimately, the challenge for any business will be: how do I establish the appropriate governance and protections, not just looking at the technology, but rather looking at the overall notion of the data assets? If you don't understand how to monetize your data and nonetheless enter into a partnership with somebody else, by definition that partner is going to generate greater value out of your data than you are. There are significant information asymmetries here. So every company must undertake an understanding of how to generate value out of its data. We don't think that there's going to be a general-purpose marketplace for sharing data, in a lot of ways.
This is going to be a heavily contracted arrangement, but that doesn't mean we should not take great steps, or important steps, right now to start doing a better job of instrumenting our products and services, so that we can start collecting data about our products and services, because the path forward is going to demonstrate that we're going to be able to dramatically improve the quality of the goods and services we sell by reducing the asset specificities for our customers, by making those goods and services more intelligent and more programmable. Finally, is this going to be a feature of a differentiated business relationship through trust? We're open to that. Personally, I'll speak for myself, I think it will. I think that there is going to be an important element, ultimately, of being able to demonstrate to a customer base, to a marketplace, that you take privacy, data ownership, and intellectual property control of data assets seriously, and that you are very, very specific, very transparent, in how you're going to use those in derivative business transactions. All right. So once again, David Floyer, thank you very much here in the studio. On the phone: Neil Raden, Ralph Finos, Jim Kobielus, and George Gilbert. This has been another Wikibon Action Item. (electronic music)

Published Date : Apr 6 2018



Wikibon Research Meeting | October 20, 2017


 

(electronic music) >> Hi, I'm Peter Burris and welcome once again to Wikibon's weekly research meeting from the CUBE studios in Palo Alto, California. This week we're going to build upon a conversation we had last week about the idea of different data shapes or data tiers. For those of you who watched last week's meeting, we discussed the idea that data across very complex distributed systems, featuring significant amounts of work associated with the edge, is going to fall into three classifications or tiers. The primary tier is where the sensor data provides direct and specific experience about the things that the sensors are indicating; that data will then signal work or expectations or decisions to a secondary tier that aggregates it. So, what is the sensor saying? And then the gateways will provide a modeling capacity, a decision making capacity, but also a signal to tertiary tiers that increasingly look across a system wide perspective on how the overall aggregate system's performing. So: very, very local to the edge; gateway at the level of multiple edge devices inside a single business event; and then up to a system wide perspective on how all those business events aggregate and come together. Now what we want to do this week is translate that into what it means for some of the new technologies, new analytics technologies, that are going to provide much of the intelligence against each of these tiers of data. As you can imagine, the characteristics of the data are going to have an impact on the characteristics of the machine intelligence that we can expect to employ. So that's what we want to talk about this week. So Jim Kobielus, with that as a backdrop, why don't you start us off? What are we actually thinking about when we think about machine intelligence at the edge? >> Yeah, Peter. At the edge, with the edge device being the primary tier that acquires fresh environmental data through its sensors, what happens at the edge? In the extreme model, we think about autonomous engines, let me just go there very briefly; basically, there are a number of workloads that take place at the edge, the data workloads. The data is (mumbles) or ingested, it may be persisted locally, and that data then drives local inferences that might be using deep learning chipsets that are embedded in that device. It might also trigger what are called actuations: things, actions are taken at the edge. If it's a self-driving vehicle, for example, an action may be to steer the car or brake the car or turn on the air conditioning or whatever it might be. And then last but not least, there might be some degree of adaptive learning or training of those algorithms at the edge, or the training might be handled more often up at the secondary or tertiary tier. The tertiary tier at the cloud level, which has visibility usually across a broad range of edge devices and is ingesting data that originated from all of the many different edge devices, is the focus of modeling, of training, of the whole DevOps process, where teams of skilled professionals make sure that the models are trained to a point where they are highly effective for their intended purposes. Then those models are sent right back down to the secondary and the primary tiers, where inferences are made, you know, 24 by seven, based on those latest and greatest models. That's the broad framework in terms of the workloads that take place in this fabric.
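The loop below is a minimal sketch of the edge workloads Jim lists: ingest a reading, persist it locally, run a local inference, actuate, and periodically pull a retrained model down from the upper tiers. The sensor, model, and update functions are stand-ins invented for illustration, not any particular device SDK or product.

```python
# Hypothetical sketch of the edge-device loop Jim describes:
# ingest -> persist locally -> infer locally -> actuate -> periodically refresh model.
import random

local_buffer = []                          # local persistence stand-in
model = {"threshold": 0.7, "version": 1}   # model pushed down from the cloud tier

def read_sensor():
    return {"temperature": random.random()}              # stand-in for a real sensor

def infer(reading, model):
    return reading["temperature"] > model["threshold"]   # trivial local inference

def actuate(alarm):
    if alarm:
        print("actuation: open cooling valve")            # stand-in for a real actuator

def fetch_updated_model(current):
    # Stand-in for pulling a retrained model from the secondary/tertiary tier.
    return {"threshold": 0.75, "version": current["version"] + 1}

for step in range(10):
    reading = read_sensor()
    local_buffer.append(reading)           # persist locally (e.g., for later upload)
    actuate(infer(reading, model))
    if step % 5 == 4:                      # occasional check-in with the upper tiers
        model = fetch_updated_model(model)
        print("model refreshed to version", model["version"])
```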
So Neil, let me talk to you, because we want to make sure that we don't confuse the nature of the data and the nature of the devices, which may be driven by economics or physics or even preferences inside the business. There is a distinction that we have to always keep track of: some of this may go up to the Cloud, some of it may stay local. What are some of the elements that are going to indicate what types of actual physical architectures or physical infrastructures will be built out as we start to find ways to take advantage of this very worthwhile and valuable data that's going to be created across all of these different tiers? >> Well, first of all, we have a long way to go with sensor technology and capability. So when we talk about sensors, we really have to define classes of sensors and what they do. However, I really believe that we'll begin to think in a way that approximates human intelligence about the same time as airplanes start to flap their wings. (Peter laughs) So, I think, let's have our expectations and our models reflect that, so that they're useful, instead of being, you know, hypothetical. >> That's a great point, Neil. In fact, I'm glad you said that, because I strongly agree with you. But having said that, the sensors are going to go a long way, when we... but there is a distinction that needs to be made. I mean, it may be that at some point in time, a lot of data moves up to a gateway, or a lot of data moves up to the Cloud. It may be that a given application demands it. It may be that the data that's being generated at the edge has a lot of other useful applications we haven't anticipated. So we don't want to presume that there's going to be some hard wiring of infrastructure today. We do want to presume that we better understand the characteristics of the data that's being created and operated on today. Does that make sense to you? >> Well, there's a lot of data, and we're just going to have to find a way to not touch it or handle it any more times than we have to. We can't be shifting it around from place to place, because it's too much. But I think the market is going to define a lot of that for us. >> So George, if we think about the natural place where the data may reside, where the processes may reside, give us a sense of what kinds of machine learning technologies or machine intelligence technologies are likely to be especially attractive at the edge, dealing with this primary information. >> Okay, I think that's actually a softball, which is, we've talked before about bandwidth and latency limitations, meaning we're going to have to do automated decisioning at the edge, because it's got to be fast, low latency. We can't move all the data up to the Cloud, because of bandwidth limitations. But, by contrast, so that's data intensive and it's fast, but up in the cloud, where we enhance our models, either continual learning of the existing ones or rethinking them entirely, that's actually augmented decisions, and augmented means it's augmenting a human in the process, where, most likely, a human is adding additional contextual data, performing simulations, and optimizing the model for different outcomes or enriching the model. >> It may in fact be a crucial feature of the training, by validating that the action taken by the system was appropriate.
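One way to picture the validation Peter describes feeding back into training: the edge logs each decision and its observed outcome, a human in the loop later marks whether the action was appropriate, and only those validated records flow into the next round of cloud-side training. The record layout below is a hypothetical sketch, not any specific pipeline.

```python
# Hypothetical sketch: edge decisions are logged with outcomes so a human in the
# loop can later validate them, producing labels for cloud-side retraining.

decision_log = []

def record_decision(features, inference, action_taken, observed_outcome):
    decision_log.append({
        "features": features,
        "inference": inference,
        "action": action_taken,
        "outcome": observed_outcome,
        "validated": None,            # filled in later by a reviewer or rule
    })

def human_validate(entry, appropriate):
    entry["validated"] = appropriate  # the augmented, human-in-the-loop step

def training_examples():
    # Only validated decisions flow back into the training set.
    return [(e["features"], e["validated"]) for e in decision_log
            if e["validated"] is not None]

if __name__ == "__main__":
    record_decision({"temp": 0.9}, True, "open_valve", "temp dropped")
    record_decision({"temp": 0.4}, False, "none", "temp stable")
    human_validate(decision_log[0], appropriate=True)
    print(training_examples())
```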
Yes, and I would add to that, actually, that you used an analogy: people are going to two extremes, where some people say, okay, so all the analytics has to be done in the cloud, while Wikibon and David Floyer and Jim Kobielus have been pioneering the notion that we have to do a lot more at the client. But you might look back at client server computing, where the client was focused on presentation and the server was focused on data integrity. Similarly, here, the edge or client is going to be focused on fast inferencing, and the server is going to do many of the things that were associated with a DBMS and data integrity, in terms of reproducibility of decisions in the model for auditing, security, versioning, and orchestration in terms of distributing updated models. So we're going to see the roles of the edge and the cloud rhyme with what we saw in client server. Neither one goes away; they augment each other. >> So, Jim Kobielus, one of the key issues there is going to be the gateway, and the role that the gateway plays, and specifically here, we talked about the nature of, again, the machine intelligence that's going to be operating more on the gateway. What are some of the characteristics of the work that's going to be performed at the gateway that kind of has oversight of groupings or collections of sensor and actuator devices? >> Right, good question. So the perfect example that everybody's familiar with now of a gateway in this environment is a smart home hub. A smart home hub, just for the sake of discussion, has visibility across two or more edge devices. It could be a smart speaker, it could be an HVAC system that's sensor equipped, and so forth. What it does, the role it performs, a smart hub of any sort, is that it acquires data from the edge devices; the edge devices might report all of their data directly to the hub, or the sensor devices might also do inferences and then pass the results of those inferences on to the hub. Regardless, what the hub does is, A, it aggregates the data across those different edge devices over which it has visibility and control; B, it may perform its own inferences based on models that look out across an entire home in terms of patterns of activity. Then the hub might take various actions autonomously, by itself, without consulting an end user or anything else. It might take action in terms of beefing up the security, adjusting the HVAC, adjusting the lights in the house, or whatever it might be, based on all that information streaming in in real time. Possibly, its algorithms will allow it to determine which of that data shows an anomalous condition that deviates from historical patterns. Those kinds of determinations, whether it's anomalous or a usual pattern, are often made at the hub level, 'cause it's maintaining sort of a homeostatic environment, as it were, within its own domain, and that hub might also communicate upstream to a tertiary tier that has oversight, let's say, of a smart city environment, where everybody in that city, or whatever, might have a connection into some broader system that, say, regulates utility usage across the entire region to avoid brownouts and that kind of thing. So that gives you an idea of what the role of a hub is in this kind of environment. It's really a controller.
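A minimal sketch of the hub behavior Jim describes: aggregate the latest readings from several edge devices, flag anything that deviates from that device's own history, act locally, and report upstream. The z-score rule, thresholds, and device names are illustrative assumptions, not a smart home product's actual logic.

```python
# Hypothetical sketch of a gateway/hub: aggregate per-device readings, flag
# deviations from historical patterns, act locally, and report upstream.
from statistics import mean, stdev

history = {"thermostat": [21.0, 21.5, 20.8, 21.2], "hvac_load": [0.4, 0.5, 0.45, 0.42]}

def is_anomalous(device, value, threshold=3.0):
    """Crude z-score test against that device's own history."""
    past = history[device]
    spread = stdev(past) or 1e-9
    return abs(value - mean(past)) / spread > threshold

def hub_cycle(latest_readings):
    for device, value in latest_readings.items():
        if is_anomalous(device, value):
            print(f"local action: investigate {device} reading {value}")
            print(f"upstream report: {device} anomaly")   # e.g., to a city/utility tier
        history[device].append(value)                     # keep the baseline current

if __name__ == "__main__":
    hub_cycle({"thermostat": 21.1, "hvac_load": 0.95})
```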
So, Neil, if we think about some of the issues that people really have to consider as they start to architect what some of these systems are going to look like, we need to factor in both what the data is doing now, but also ensure that we build into the entire system enough of a buffer so that we can anticipate and take advantage of future ways of using that data. Where do we draw that fine line between we only need this data for this purpose now, and geez, let's ensure that we keep our options open so that we can use as much data as we want at some point in time in the future? >> Well, that's a hard question, Peter, but I would say that if it turns out that, for this detailed data coming from sensors, the historical aspect of it isn't really that important, and the things you might be using that data for are more current, then you probably don't need to capture all that. On the other hand, there have been many, many occasions historically where data has been used for something other than its original purpose. My favorite example was scanners in grocery stores, where it was meant to improve the checkout process, not have to put price stickers on everything, manage inventory and so forth. It turned out that some smart people like IRI and some other companies said, "We'll buy that data from you, and we're going to sell it to advertisers," and all sorts of things. We don't know the value of this data yet; it's too new. So I would err on the side of being conservative and capturing and saving as much as I could. >> So what we need to do is marry, or do an optimization of some form between, how much it is going to cost to transmit the data versus what kind of future value, or what kinds of options on future value, there might be in that data. That is, as you said, a hard problem, but we can start to conceive of an approach to characterizing that ratio, can't we? >> I hope so. I know that, personally, when I download 10 gigabytes of data, I pay for 10 gigabytes of data, and it doesn't matter if it came from a mile away or 10,000 miles away. So there have to be adjustments for that. There are also ways of compressing data, because this sensor data, I'm sure, is going to be fairly sparse and redundant, and can be compressed; you can do things like RLL encoding, which takes all the zeroes out, and that sort of thing. There are going to be a million practices that we'll figure out. >> So as we imagine ourselves in this schema of edge, hub, and tertiary, or primary, secondary, and tertiary data, and we start to envision the role that data's going to play and how we conduct or how we build these architectures and these infrastructures, it does raise an interesting question, and that is, from an economic standpoint, what do we anticipate is going to be the classes of devices that are going to exploit this data? David Floyer, who's not here today, hope you're feeling better, David, has argued pretty forcefully that over the next few years we'll see a lot of advances made in microprocessor technology. Jim, I know you've been thinking about this a fair amount. What types of functions >> Jim: Right. >> might we actually see being embedded in some of these chips that software developers are going to utilize to actually build some of these more complex and interesting systems?
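As a concrete illustration of the compression Neil mentions above (he refers to RLL encoding that "takes all the zeroes out"; the sketch below uses plain run-length encoding of the zero runs, which is the closest common technique), a sparse sensor stream with long runs of zeroes packs down to very little before transmission. The stream values and encoding scheme are invented for illustration.

```python
# Hypothetical sketch: run-length encode the zero runs in a sparse sensor stream
# before transmitting it upstream (the "takes all the zeroes out" idea).

def rle_zeros(samples):
    """Replace runs of zeroes with ('Z', run_length); pass other values through."""
    encoded, zero_run = [], 0
    for s in samples:
        if s == 0:
            zero_run += 1
        else:
            if zero_run:
                encoded.append(("Z", zero_run))
                zero_run = 0
            encoded.append(s)
    if zero_run:
        encoded.append(("Z", zero_run))
    return encoded

def rle_decode(encoded):
    out = []
    for item in encoded:
        if isinstance(item, tuple) and item[0] == "Z":
            out.extend([0] * item[1])
        else:
            out.append(item)
    return out

if __name__ == "__main__":
    stream = [0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 3, 0, 0]
    packed = rle_zeros(stream)
    print(packed)                        # [('Z', 4), 7, ('Z', 6), 3, ('Z', 2)]
    assert rle_decode(packed) == stream  # lossless round trip
```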
Yeah, first of all, one of the trends we're seeing in the chipset market for deep learning, just to stay there for a moment, is that traditionally, and when I say traditionally I mean the last several years, the market has been dominated by GPUs, graphics processing units. Nvidia, of course, is the primary provider of those. Of course, Nvidia has been around for a long time as a gaming solution provider. Now, what's happening with GPU technology, in fact the latest generation of Nvidia's architecture, shows where it's going: toward more deep learning optimized capabilities at the chipset level. They're called tensor cores, and I don't want to bore you with all the technical details, but the whole notion of-- >> Peter: Oh, no, Jim, do bore us. What is it? (Jim laughs) >> Basically deep learning is based on doing high speed, fast matrix math. So fundamentally, tensor cores do high velocity, fast matrix math, and the industry as a whole is moving toward embedding more tensor cores directly into the chipset, a higher density of tensor cores. Nvidia, in its latest generation of chips, has done that. They haven't totally taken out the gaming oriented GPU capabilities, but there are competitors, and it's a growing list, more than a dozen competitors on the chipset side now. They're all going down a road of embedding far more tensor processing units into every chip. Google is well known for its TPUs, tensor processing units, its own chip architecture. But they're one of many vendors that are going down that road. The bottom line is that the chipset itself is being optimized for the core function that CPU, and really GPU, technology, and even ASICs and FPGAs, were not traditionally geared to do, which is just deep learning at high speed, on many cores, to do things like face recognition and video and voice recognition freakishly fast, and really, that's where the market is going in terms of the enabling underlying chipset technology. What we're seeing is that what's likely to happen in the chipsets of the year 2020 and beyond is that they'll be predominantly tensor core processing units, but they'll be systems on a chip that, and I'm just talking about the future, not saying it's here now, systems on a chip that include a CPU to manage a real time OS, like a real time Linux or whatnot, along with highly dense tensor core processing units. These will be low power chips, and low cost commodity chips, that'll be embedded in everything. Everything from your smart phone, to your smart appliances in your home, to your smart cars and so forth. Everything will have these commodity chips. 'Cause suddenly every edge device, everything will be an edge device, and will be able to provide more than augmentation, automation, all these things we've been talking about, in ways that are not necessarily autonomous, but can operate with a great degree of autonomy to help us human beings to live our lives in an environmentally contextual way at all points in time.
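To ground Jim's point that deep learning comes down to high speed matrix math, the sketch below writes a single dense layer's forward pass as an explicit matrix multiply in plain Python; this multiply-accumulate pattern is exactly the kind of work tensor-core-style hardware is designed to run at high density. The weights and inputs are made-up numbers for illustration.

```python
# Hypothetical sketch: a single dense layer is just matrix math (multiply-accumulate),
# which is the operation tensor-core-style hardware is built to run at high density.

def matmul(A, B):
    """Naive matrix multiply: (m x k) times (k x n) -> (m x n)."""
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)] for i in range(m)]

def relu(M):
    return [[max(0.0, x) for x in row] for row in M]

# One input row (batch of 1) with 3 features, a 3x2 weight matrix, and a bias.
X = [[0.5, -1.0, 2.0]]
W = [[0.2, -0.3],
     [0.4,  0.1],
     [-0.5, 0.6]]
b = [0.05, -0.05]

Z = matmul(X, W)                                   # the multiply-accumulate step
Z = [[z + bj for z, bj in zip(row, b)] for row in Z]
print(relu(Z))                                     # layer output after activation
```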
What are we looking at from an overall IT operations management approach to handling a geometrically greater increase in the number of devices and the amount of data that's being generated? (Jim starts speaking) >> Peter: Hold on, hold on, George? >> There are a couple of dimensions to that. Let me start at the modeling side, which is, we need to make data scientists more productive or, we need to push out to a greater, we need to democratize the ability to build models, and again, going back to the notion of simulation, there's this merging of machine learning and simulation where machine learning tells you correlations in factors that influence an answer, whereas the simulation actually lets you play around with those correlations to find the causations, and by merging them, we make it much, much more productive to find the models that are accurate and to optimize them for different outcomes. >> So that's the modeling issue. >> Yes. >> Which is great. Now as we think about some of the data management elements, what are we looking at from a data management standpoint? >> Well, and this is something Jim has talked about, but, you know, we had DevOps for essentially merging the skills of the developers with the operations folks, so that there's joint responsibility for keeping stuff live. >> Well, what about things like digital twins, automated processes, we've talked a little bit about breadth versus depth, ITOM. What do you think? Are we going to build out, are all these devices going to reveal themselves, or are we going to have to put in place a capacity for handling all of these things in some consistent, coherent way? >> Oh, okay, in terms of managing. >> In terms of managing. >> Okay. So, digital twins were interesting because they pioneered, or they made well known, a concept called, essentially, a semantic network, or a knowledge graph, which is just a way of abstracting a whole bunch of data models and machine learning models that represent the structure and behavior of a device. In IIoT terminology, it was an industrial device, like a jet engine. But that same construct, the knowledge graph and the digital twin, can be used to describe the application software and the infrastructure, both middleware and hardware, that makes up this increasingly sophisticated network of learning and inferencing applications. And the reason this is important, it sounds arcane, the reason it's important is that we're building now vastly more sophisticated applications over great distances, and the only way we can manage them is to make the administrators far more productive. The state of the art today is alerts on the performance of the applications, and alerts on, essentially, the resource intensity of the infrastructure. By combining that type of monitoring with the digital twin, we can get an essentially much higher fidelity reading on when something goes wrong. We don't get false positives. In other words, if something goes wrong, you don't have the fairy tale of the pea underneath the mattress, where all the way up through 10 mattresses, you know it's uncomfortable. Here, it'll pinpoint exactly what goes wrong, rather than cascading all sorts of alerts, and that is the key to productivity in managing this new infrastructure.
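A minimal sketch of the idea George describes: represent the application and its infrastructure as a dependency graph (a knowledge-graph-style digital twin) and walk it from a failing component down to the deepest unhealthy dependency, so one root cause is reported instead of a cascade of alerts. The topology and component names are invented for illustration.

```python
# Hypothetical sketch: a dependency graph of an app and its infrastructure,
# used to localize a fault to its root rather than alerting on every symptom.

DEPENDS_ON = {                 # "X depends on Y" edges of a toy digital twin
    "scoring_api": ["model_service", "gateway_hub"],
    "model_service": ["gpu_node_7"],
    "gateway_hub": [],
    "gpu_node_7": [],
}

def root_causes(component, unhealthy, seen=None):
    """Follow dependencies from a failing component down to the deepest unhealthy ones."""
    seen = seen or set()
    if component in seen:
        return set()
    seen.add(component)
    bad_deps = [d for d in DEPENDS_ON.get(component, []) if d in unhealthy]
    if not bad_deps:
        return {component}                      # nothing below it is failing: likely root
    roots = set()
    for d in bad_deps:
        roots |= root_causes(d, unhealthy, seen)
    return roots

if __name__ == "__main__":
    unhealthy = {"scoring_api", "model_service", "gpu_node_7"}   # cascading symptoms
    print(root_causes("scoring_api", unhealthy))                 # -> {'gpu_node_7'}
```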
What I'd like to do now is ask each of you for the action item that you think users are going to have to apply or employ to actually get some value, and start down this path of utilizing machine intelligence across these different tiers of data to build more complex, manageable application infrastructures. So, Jim, I'd like to start with you, what's your action item? >> My action item is related to what George just said: machine learning modeled centrally, deployed in a decentralized fashion, and use digital twin technology to do your modeling against device classes in a more coherent way. There's no one model that will fit all of the devices. Use digital twin technology to structure the modeling process to be able to tune a model to each class of device out there. >> George, action item. >> Okay, recognize that there's a big difference between edge and cloud, as Jim said. But I would elaborate: edge is automated, low latency decision making, extremely data intensive. Recognize that the cloud is not just where you trickle up a little bit of data; this is where you're going to use simulations, with a human in the loop, to augment-- >> System wide, system wide. >> System wide, with a human in the loop, to augment how you evaluate new models. >> Excellent. Neil, action item. >> I would have people start on the right side of the diagram and start to think about what their strategy is and where they fit into these technologies. Be realistic about what they think they can accomplish, and do the homework. >> Alright, great. So let me summarize our meeting this week. This week we talked about the role that the three tiers of data that we've described will play in the use of machine intelligence technologies as we build increasingly complex and sophisticated applications. We've talked about the difference between primary, secondary, and tertiary data. Primary data being the immediate experience of sensors: analog being translated into digital, about a particular thing or set of things. Secondary being the data that is then aggregated off of those sensors for business event purposes, so that we can make a business decision, often automatically, down at an edge scenario, as a consequence of signals that we're getting from multiple sensors. And then finally, tertiary data, that looks at a range of gateways and a range of systems, and is considering things at a system wide level, for modeling, simulation, and integration purposes. Now, what's important about this is that it's not just better understanding the data and not just understanding the classes of technologies that we use; that will remain important. For example, we'll see increasingly powerful, low cost, device specific, ARM-like processors pushed into the edge. And a lot of competition at the gateway, or at the secondary data tier. It's also important, however, to think about the nature of the allocations and where the work is going to be performed across those different classifications, especially as we think about machine learning, machine intelligence, and deep learning. Our expectation is that we will see machine learning being used on all three levels, where machine intelligence is being used against all forms of data to perform a variety of different work, but the work that will be performed will be... will be naturally associated and related to the characteristics of the data that's being aggregated at that point. In other words, we won't see simulations, which are characteristics of tertiary data, George, at the edge itself.
We will, however, see edge devices often reduce significant amounts of data from, perhaps, a video camera or something else to make relatively simple decisions, which may involve complex technologies, to allow a person into a building, for example. So our expectation is that over the next five years we're going to see significant new approaches to applying increasingly complex machine intelligence technologies across all different classes of data, but we're going to see them applied in ways that fit the patterns associated with that data, because it's the patterns that drive the applications. So our overall action item: it's absolutely essential that businesses that are considering and conceptualizing what machine intelligence can do be careful about drawing huge generalizations about what the future of machine intelligence is. The first step is to parse out the characteristics of the data, driven by the devices that are going to generate it and the applications that are going to use it, and understand the relationship between the characteristics of that data and the types of machine intelligence work that can be performed. What is likely is that an impedance mismatch between data and expectations of machine intelligence will generate a significant number of failures that often will put businesses back years in taking full advantage of some of these rich technologies. So, once again we want to thank you this week for joining us here on the Wikibon weekly research meeting. I want to thank George Gilbert, who is here in the CUBE Studio in Palo Alto, and Jim Kobielus and Neil Raden, who were both on the phone. And we want to thank you very much for joining us here today, and we look forward to talking to you again in the future. So this is Peter Burris, from the CUBE's Palo Alto Studio. Thanks again for watching Wikibon's weekly research meeting. (electronic music)

Published Date : Oct 20 2017

