Action Item | Big Data SV Preview Show - Feb 2018

>> Hi, I'm Peter Burris and once again, welcome to a Wikibon Action Item. (lively electronic music) We are again broadcasting from the beautiful theCUBE Studios here in Palo Alto, California, and we're joined today by a relatively larger group. So, let me take everybody through who's here in the studio with us. David Floyer, George Gilbert, once again, we've been joined by John Furrier, who's one of the key CUBE hosts, and on the remote system is Jim Kobielus, Neil Raden, and another CUBE host, Dave Vellante. Hey guys. >> Hi there. >> Good to be here. >> Hey. >> So, one of the things we're, one of the reasons why we have a little bit larger group here is because we're going to be talking about a community gathering that's taking place in the big data universe in a couple of weeks. Large numbers of big data professionals are going to be descending upon Strata for the purposes of better understanding what's going on within the big data universe. Now we have run a CUBE show next to that event, in which we get the best thought leaders that are possible at Strata, bring them in onto theCUBE, and really to help separate the signal from the noise that Strata has historically represented. We want to use this show to preview what we think that signal's going to be, so that we can help the community better understand what to look for, where to go, what kinds of things to be talking about with each other so that it can get more out of that important event. Now, George, with that in mind, what are kind of the top level thing? If it was one thing that we'd identify as something that was different two years ago or a year ago, and it's going to be different from this show, what would we say it would be? >> Well, I think the big realization that's here is that we're starting with the end in mind. We know the modern operational analytic applications that we want to build, that anticipate or influence a user interaction or inform or automate a business transaction. 
And for several years, we were experimenting with big data infrastructure, but that was, it wasn't solution-centric, it was technology-centric. And we kind of realized that the do it yourself, assemble your own kit, opensource big data infrastructure created too big a burden on admins. Now we're at the point where we're beginning to see a more converged set of offerings take place. And by converged, I mean an end to end analytic pipeline that is uniform for developers, uniform for admins, and because it's pre-integrated, is lower latency. It helps you put more data through one single analytic latency budget. That's what we think people should look for. Right now, though, the hottest new tech-centric activity is around Machine Learning, and I think the big thing we have to do is recognize that we're sort of at the same maturity level as we were with big data several years ago. And people should, if they're going to work with it, start with the knowledge, for the most part, that they're going to be experimenting, 'cause the tooling isn't quite mature enough, we don't have enough data scientists for people to be building all these pipelines bespoke. And the third-party applications, we don't have a high volume of them where this is embedded yet. >> So if I can kind of summarize what you're saying, we're seeing bifurcation occur within the ecosystem associated with big data that's driving toward simplification on the infrastructure side, which increasingly is being associated with the term big data, and new technologies that can apply that infrastructure and that data to new applications, including things like AI, ML, DL, where we think about modeling and services, and a new way of building value. Now that suggests that one or the other is more or less hot, but Neil Raden, I think the practical reality is that here in Silicon Valley, we got to be careful about getting too far out in front of our skis. 
At the end of the day, there's still a lot of work to be done inside how you simply do things like move data from one place to the other in a lot of big enterprises. Would you agree with that? >> Oh absolutely. I've been talking to a lot of clients this week and, you know, we don't talk about the fact that they're still running their business on what we would call legacy systems, and they don't know how to, you know, get out of them or transform from them. So they're still starting to plan for this, but the problem is, you know, it's like talking about the 27 rocket engines on the whatever it was that he launched into space, launching a Tesla into space. But you can talk about the engineering of those engines and that's great, but what about all the other things you're going to have to do to get that (laughs) car into space? And it's the same thing. A year ago, we were talking about Hadoop and big data and, to a certain extent, Machine Learning, maybe more data science. But now people are really starting to say, How do we actually do this, how do we secure it, how do we govern it, how do we get some sort of metadata or semantics on the data we're working with so people know what they're using. I think that's where we are in a lot of companies. >> Great, so that's great feedback, Neil. So as we look forward, Jim Kobielus, the challenges associated with what it means to better improve the facilities of your infrastructure, but also use that as a basis for increasing your capability on some of the new applications services, what are we looking for, what should folks be looking for as they explore the show in the next couple of weeks on the ML side? What new technologies, what new approaches? Going back to what George said, we're in experimentation mode. What are going to be the experiments that are going to generate greatest results over the course of the next year? 
>> Yeah, for the data scientists, who flock to Strata and similar conferences, automation of the Machine Learning pipeline is super hot in terms of investments by the solution providers. Everybody from Google to IBM to AWS, and others, are investing very heavily in automation of, not just the data engine, that problem's been handled a long time ago. It's automation of more of the feature engineering and the training. These very manual, often labor intensive, jobs have to be sped up and automated to a great degree to enable the magic of productivity by the data scientists in the new generation of app developers. So look for automation of Machine Learning to be a super hot focus. Related to that is, look for a new generation of development suites that focus on DevOps, speeding the Machine Learning and DL and AI from modeling through training and evaluation, deployment, and iteration. We've seen a fair upswing in the number of such toolkits on the market from a variety of startup vendors, like the DataRobots of the world. But also coming to, say, AWS with SageMaker, for example, that's hot. Also, look for development toolkits that automate more of the code generation, you know, low-code tools, but the new generation of low-code tools, as highlighted in a recent Wikibon study, use ML to drive more of the actual production of fairly decent, good enough code, as a first rough prototype for a broad range of applications. And finally we're seeing a fair amount of ML-generated code generation inside of things like robotic process automation, RPA, which I believe will probably be a super hot theme at Strata and other shows this year going forward. >> So there's a, you mentioned the idea of better tooling for DevOps and the relationship between big data and ML, and what not, and DevOps. 
One of the key things that we've been seeing over the course of the last few years, and it's consistent with the trends that we're talking about, is increasing specialization in a lot of the perspectives associated with changes within this marketplace, so we've seen other shows that have emerged that have been very, very important, that we, for example, are participating in. Places like Splunk, for example, that is the vanguard, in many respects, of a lot of these trends in big data and how big data can be applied to business problems. Dave Vellante, I know you've been associated with a number of, participating in these shows, how does this notion of specialization inform what's going to happen in San Jose, and what kind of advice and counsel should we tell people to continue to explore beyond just what's going to happen in San Jose in a couple weeks? >> Well, you mentioned Splunk as an example, a very sort of narrow and specialized company that solves a particular problem and has a very enthusiastic ecosystem and customer base around that problem. Log files to solve security problems, for example. I would say Tableau is another example, you know, heavily focused on Viz. So what you're seeing is these specialized skillsets that go deep within a particular domain. I think the thing to think about, especially when we're in San Jose next week, is as we talk about digital disruption, what are the skillsets required beyond just the domain expertise. So you're sort of seeing these bifurcated skillsets really coming into vogue, where if somebody understands, for example, traditional marketing, but they also need to understand digital marketing in great depth, and the skills that go around it, so there's sort of a two-tool player. We talk about five-tool player in baseball. At least a multidimensional skillset in digital. >> And that's likely to occur not just in a place like marketing, but across the board. 
David Floyer, as folks go to the show and start to look more specifically about this notion of convergence, are there particular things that they should think about that, to come back to the notion of, well, you know, hardware is going to make things more or less difficult for what the software can do, and software is going to be created that will fill up the capabilities of hardware. What are some of the underlying hardware realities that folks going to the show need to keep in mind as they evaluate, especially the infrastructure side, these different infrastructure technologies that are getting more specialized? >> Well, if we look historically at the big data area, the solution has been to put in very low cost equipment as nodes, lots of different nodes, and move the data to those nodes so that you get a parallelization of the, of the data handling. That is not the only way of doing it. There are good ways now where you can, in fact, have a single version of that data in one place in very high speed storage, on flash storage, for example, and where you can allow very fast communication from all of the nodes directly to that data. And that makes things a lot simpler from an operational point of view. So using current Batch Automation techniques that are in existence, and looking at those from a new perspective, which is how do I apply these to big data, how do I automate these things, can make a huge difference in just the practicality in the elapsed time for some of these large training things, for example. >> Yeah, I was going to say that in many respects, what you're talking about is bringing things like training under a more traditional >> David: Operational, yeah. >> approach and operational set of disciplines. >> David: Yes, that's right. >> Very, very important. 
So John Furrier, I want to come back to you, or I want to come to you, and say that there are some other technologies that, while they're the bright shiny objects and people think that they're going to be the new kind of Harry Potter technologies of magic everywhere, Blockchain is certainly going to become folded into this big data concept, because Blockchain describes how contracts, ownership, authority ultimately get distributed. What should folks look for as the, as Blockchain starts to become part of these conversations? >> That's a good point, Peter. My summary of the preview for BigData SV Silicon Valley, which includes the Strata show, is two things: Blockchain points to the future and GDPR points to the present. GDPR is probably the most, one of the most fundamental impacts to the big data market in a long time. People have been working on it for a year. It is a nightmare. The technical underpinnings of what companies have to do to comply with GDPR is a moving train, and it's complete BS. There's no real solutions out there, so if I was going to tell everyone to think about that and what to look for: What is happening with GDPR, what's the impact of the databases, what's the impact of the architectures? Everyone is faking it 'til they make it. No one really has anything, in my opinion from what I can see, so it's a technical nightmare. Where was that database? So it's going to impact how you store the data, and the sovereignty issue is another issue. So the Blockchain then points to the sovereignty issue of the data, both in terms of the company, the country, and the user. These things are going to impact software development, application development, and, ultimately, cloud choice and the IoT. So to me, GDPR is not just a one and done thing and Blockchain is kind of a future thing to look at. So I would look out of those two lenses and say, Do you have a direction or a narrative that supports me today with what GDPR will impact throughout the organization. 
And then, what's going on with this new decentralized infrastructure and the role of data, and the sovereignty of that data, with respect to company, country, and user. So to me, that's the big issue. >> So George Gilbert, if we think about this question of these fundamental technologies that are going to become increasingly important here, database managers are not dead as a technology. We've seen a relative explosion over the last few years in at least invention, even if it hasn't been followed with, as Neil talked about, very practical ways of bringing new types of disciplines into a lot of enterprises. What's going to happen with the database world, and what should people be looking for in a couple of weeks to better understand how some of these data management technologies are going to converge and/or evolve? >> It's a topic that will be of intense interest and relevance to IT professionals, because it's become the common foundation of all modern apps. But I think what we can do is we can see, for instance, a leading indicator of what's going to happen with the legacy vendors, where we have in-memory technologies for both transaction processing and analytics, and we have more advanced analytics embedded in the database engine, including Machine Learning, the model training, as well as model serving. But the, what happened in the big data community is that we disassembled the DBMS into the data manipulation language, which is an analytic language, like, could be Spark, could be Flink, even Hive. We had the Catalog, which I think Jim has talked about or will be talking about, where we're not looking, it's not just a dictionary of what's in one DBMS, but it's a whole way of tracking and governing data across many stores. And then there's the Storage Manager, could be the file system, an object store, could be just something like Kudu, which is a MPP way of, in parallel, performing a bunch of operations on data that's stored. 
The reason I bring all this up is, following on David's comment about the evolution of hardware, databases are fundamentally meant to expose capabilities in the hardware and to mediate access to data, using these hardware capabilities. And now that we have this, what's emerging as this unigrid, with memory-intensive architectures and super low latency to get from any point or node on that cluster to any other node, like with only a five microsecond lag, relative to previous architectures. We can now build databases that scale up with the same knowledge base that we built databases... I'm sorry, that scale out, that we used to build databases that scale up. In other words, it democratizes the ability to build databases of enormous scale, and that means that we can have analytics and the transactions working together at very low latency. >> Without binding them. Alright, so I think it's time for the action items. We got a lot to do, so guys, keep it really tight, really simple. David Floyer, let me start with you. Action item. >> So action item on big data should be focus on technologies that are going to reduce the elapsed time of solutions in the data center, and those are many and many of them, but it's a production problem, it's becoming a production problem, treat it as a production problem, and put in the fundamental procedures and technologies to succeed. >> And look for vendors >> Who can do that, yes. >> that do that. George Gilbert, action item. >> So I talked about convergence before. The converged platform now is shifting, its center of gravity is shifting to continuous processing, where the data lake is a reference data repository that helps inform the creation of models, but then you run the models against the streaming continuous data for the freshest insights-- >> Okay, Jim Kobielus, action item. >> Yeah, focus on developer productivity in this new era of big data analytics. 
Specifically focus on the next generation of developers, who are data scientists, and specifically focus on automating most of what they do, so they can focus on solving problems and sifting through data. Put all the grunt work of training, and all that stuff, taken care of by the infrastructure, the tooling. >> Peter: Neil Raden, action item. >> Well, one thing I learned this week is that everything we're talking about is about the analytical problem, which is how do you make better decisions and take action? But companies still run on transactions, and it seems like we're running on two different tracks and no one's talking about the transactions anymore. We're like the tail wagging the dog. >> Okay, John Furrier, action item. >> Action item is dig into GDPR. It is a really big issue. If you're not proactive, it could be a nightmare. It's going to have implications that are going to be far-reaching in the technical infrastructure, and it's the Sarbanes-Oxley, what they did for public companies, this is going to be a nightmare. And evaluate the impact of Blockchains. Two things. >> David Vellante, action item. >> So we often say that digital is data, and just because your industry hasn't been upended by digital transformations, don't think it's not coming. So it's maybe comfortable to sit back and say, Well, we're going to wait and see. Don't sit back and wait and see. All industries are susceptible to digital transformation. >> Alright, so I'll give the action item for the team. We've talked a lot about what to look for in the community gathering that's taking place next week in Silicon Valley around Strata. Our observations as the community, it descends upon us, and what to look for is, number one, we're seeing a bifurcation in the marketplace, in the thought leadership, and in the tooling. 
One set of group, one group is going more after the infrastructure, where it's focused more on simplification, convergence; another group is going more after the developer, AI, ML, where it's focused more on how to create models, training those models, and building applications with the services associated with those models. Look for that. Don't, you know, be careful about vendors who say that they do it all. Be careful about vendors that say that they don't have to participate in a converged approach to doing this. The second thing I think we need to look for, very importantly, is that the role of data is evolving, and data is becoming an asset. And the tooling for driving velocity of data through systems and applications is going to become increasingly important, and the discipline that is necessary to ensure that the business can successfully do that with a high degree of predictability, bringing new production systems are also very important. A third area that we take a look at is that, ultimately, the impact of this notion of data as an asset is going to really come home to roost in 2018 through things like GDPR. As you scan the show, ask a simple question: Who here is going to help me get up to compliance and sustain compliance, as the understanding of privacy, ownership, etc. of data, in a big data context, starts to evolve, because there's going to be a lot of specialization over the next few years. And there's a final one that we might add: When you go to the show, do not just focus on your favorite brands. There's a lot of new technology out there, including things like Blockchain. They're going to have an enormous impact, ultimately, on how this marketplace unfolds. The kind of miasma that's occurred in big data is starting to specialize, it's starting to break down, and that's creating new niches and new opportunities for new sources of technology, while at the same time, reducing the focus that we currently have on things like Hadoop as a centerpiece. 
A lot of convergence is going to create a lot of new niches, and that's going to require new partnerships, new practices, new business models. Once again, guys, I want to thank you very much for joining me on Action Item today. This is Peter Burris from our beautiful Palo Alto theCUBE Studio. This has been Action Item. (lively electronic music)

Published Date : Feb 24 2018

Wikibon Research Meeting | Systems at the Edge

>> Hi I'm Peter Burris and welcome once again to Wikibon's weekly research meeting on theCUBE. (funky electronic music) This week we're going to discuss something that we actually believe is extremely important. And if you listen to the recent press announcements this week from Dell EMC, the industry increasingly is starting to believe is important. And that is, how are we going to build systems that are dependent upon what happens at the edge? The past 10 years have been dominated by the cloud. How are we going to build things in the cloud? How are we going to get data to the cloud? How are we going to integrate things in the cloud? While all those questions remain very relevant, increasingly, the technology's becoming available, the systems and the design elements are becoming available, and the expertise is now more easily brought together so that we can start attacking some extremely complex problems at the edge. A great example of that is the popular notion of what's happening with automated driving. That is a clear example of huge design requirements at the edge. Now to understand these issues, we have to be able to generalize certain attributes of the differences in the resources, whether they be hardware or software, but increasingly, especially from a digital business transformation standpoint, the differences in the characteristics of the data. And that's what we're going to talk about this week. How do different types of data, data that's generated at the edge, data that's generated elsewhere, going to inform decisions about the classes of infrastructure that we're going to have to build and support as we move forward with this transformation that's taking place in the industry. So to kick it off, Neil Raden I want to turn to you. What are some of those key data differences and what taxonomically do we regard as what we call primary, secondary, and tertiary data? Neil. >> Well, primary data come in from sensors. 
It's a little bit different than anything we've ever seen in terms of doing analytics. Now I know that operational systems do pick up primary data, credit card transactions, something like that. But, scanner data, not scanner data, I mean sensor data is really designed for analysis. It's not designed for record keeping. And because it's designed for analysis, we have to have a different way of treating it than we do other things. If you think about a data lake, everything that falls into that data lake has come from somewhere else, it's been used for something else. But this data is fresh, and that requires that we really have to treat it carefully. Now, the retention and stewardship of that requires a lot of thought. And I don't think industry has really thought of that through a great deal. But look, sensor data is not new, it's been around for a long time. But what's different now is the volume and the lack of latency in it. But any organization that wants to get involved in it really needs to be thinking about what's the business purpose of it. If you're just going into, IoT as we call it generically, to save a few bucks you might as well not bother. It really is something that will change your organization. Now, what do we do with this data is a real problem because for the most part, these sensors are going to be remote, and there's going to be a lot of, that means they're going to generate a lot of data. So what do we do with it? Do we reduce it at the site? That's been one suggestion. There's an issue that any model for reduction could conceivably lose data that may be important somewhere down the line. Can the data be reconstituted through metadata or some sort of reverse algorithms? You know, perhaps. Those are the things we really need to think about. My humble opinion is the software and the devices need to be a single unit. And for the most part, they need to be designed by vendors, not by individual ITs. >> So David Floyer, let's pick up on that. 
Software and devices as single unit, designed more by vendors who have specific domain expertise, turn into solutions and present it to business. What do you think? >> Absolutely, I completely concur with that. The initial attempts at using the sensors and connecting to the sensors were very simple things like for example, the Nest, the thermostats. And that's worked very well. But if you look at it over time, the processing for that has gone into the home, into your Apple TV device or your Alexa or whatever it is. So, that's coming down and now it's getting even closer to the edge. In the future, our proposition is that it will get even closer and then those will put together solutions, all types of solutions that are appropriate to the edge that will be taking not just one sensor but multiple sensors, collecting that data together, just like in the autonomous car for example where you take the lidars and the radars and the cameras etcetera. We'll be taking that data, we'll be analyzing it, and we'll be making decisions based on that data at the edge. And vendors are going to play a crucial role in providing these solutions to IT and to the OT and to many other parts. And a large value will be in their expertise that they will develop in this area. >> So as a rule of thumb, when I was growing up and learned to drive, I was told always keep five car lengths between you and whatever's in front of you at whatever speed you're traveling. What you just described David is that there will be sensors and there will be processing that takes place in that automated car that isn't using that type of rule of thumb but knows something about tire temperature, and therefore the coefficient of friction on the tires, knows something about the brakes, knows what the stopping power needs to be at the speed and therefore what buffer needs to be between it and whatever else is around it. >> Absolutely. 
>> This is no longer a rule of thumb, this is physics and deep understanding of what it's going to require to stop that car. >> And on top of that, what you'll also want to know, outside from your car is, what type of car is in front of you? Is that an autonomous car, or is that somebody being driven by Peter? In which case, you have 10 lengths behind you. >> But that's not going to be primary data. Is that what we mean by secondary data? >> No, that's still primary because you're going to set up a connection between you and that other car. That car is going to tell you I'm primary to you, that's primary data. >> Here's what I mean, correct, it is primary data but, from a standpoint of that the car in that case is submitting a signal, right? So even though to your car it's primary data, but one of the things from a design standpoint that's interesting, is that car is now transmitting a digital signal about its state that's relevant to you so that you can combine that >> Correct. >> inside effectively, a gateway inside your car. >> Yes. >> So there's external information that is in fact digital coming in, combining with the sensors about what's happening in your car. Have I got that right? >> Absolutely. That to me is a sort of secondary one, then you've got the tertiary data which is the big picture about the traffic conditions >> Routes. >> and the weather and the routes and that sort of thing which is at that much higher cloud level, yes. >> So David Vellante, we always have to make sure as we have these conversations. We've talked a bit about this data, we've talked a little bit about the classes of work that's going to be performed at the different levels. How do we ensure that we sustain the business problem in this conversation? 
>> So, I mean I think Wikibon's done some really good work on describing what this sort of data model looks like from edge devices where you have primary data, the gateways where you're doing aggregated data, the cloud where maybe the serious modeling occurs. And my assertion would be is that the technology to support that elongating and increasingly distributed data model has been maturing for a decade and the real customer challenge is not just technical, it's really understanding a number of factors and I'll name some. Where in the distributed data value chain are you going to differentiate? And how does the data that you're capturing in that data pipeline contribute to monetization? What are the data sources, who has access to that data, how do you trust that data, and interpret it, and act on it with confidence? There are significant IP ownership and data protection issues. Who owns the data? Is it the device manufacturer, is it the factory, etcetera. What's the business model that's going to allow you to succeed? What skill sets are required to win? And really importantly, what's the shape of the ecosystem that needs to form to go to market and succeed? These are the things that I think customers are really struggling with that I talk to. >> Now, the one thing I'd add to that and I want to come back to it is the idea that, and who is ultimately bonding the solution because this is going to end up in a court of law. But let's come to this IP issue, George. Let's talk about how local data is going to be, is going to enter into the flow of analytics, and that question of who owns data, because that's important and then have the question about some of the ramifications and liabilities associated with this. >> Okay well, just on the IP protection and the idea that a vendor has to take sort of whole product responsibility for the solution. 
That vendor is probably going to be dealing with multiple competitors when they're sort of enabling, say, a self-driving car or other, you know, edge or smaller devices. The key thing is that a vendor will say, you know, the customer keeps their data and the customer gets the insights from that data. But that data is informing, in the middle, a black box, an analytic black box. It's flowing through it, that's where the insights come out, on the other side. But the data changes that black box as it flows through it. So, that is something where, you know, when the vendor provides a whole solution to Mercedes, that solution will be better when they come around to BMW. And the customers should make sure that what BMW gets the benefit of goes back to Mercedes. That's on the IP thing. I want to add one more thing on the tertiary side which is, when you're close to the edge, it's much more data intensive. When we've talked about the reduction in data and the real-time analytics, at the tertiary level it's going to be more where time is a bigger factor and you're essentially running a simulation, it's more compute intensive. And so you're doing optimizations of the model and those flow back as context to inform both the gateway and the edge. >> David Floyer, I want to turn it to you. So we've talked a little bit about the characteristics of the data, great list from Dave Vellante about some of the business considerations, we will get very quickly in a second to some of the liability issues 'cause that's going to be important. But take us through what George just said about the tertiary elements. Now we've got all the data laid out, how is that going to map to the classes of devices? And we'll then talk a bit about some of the impacts on the industry. What's it going to look like? >> So if we take the primary edge first, and you take that as a unit, you'll have a number of sensors within that. 
>> So just released, this is data about the real world that's coming into the system to be processed? >> Yes. So it'll have, for example, cameras. If we take a simple example of making sure that bad people don't get into your site. You'll have a camera there which will do facial recognition. They'll have a badge of some sort, so you'll read that badge, you may want to take their weight, you may want to have an infrared sensor on them so that you can tell their exact distance. So, a whole set of sensors that the vendor will put together for the job of ensuring you don't get bad guys in there. And what you're ensuring is that bad guys don't get in there, that's obviously one, very important, and also, that you don't go and- >> Stop good guys from going in. >> stop good guys from going in there. So those are the two characteristics >> The false-positive problem. >> the false-positives. Those are the two things you're trying to design that- >> At the primary edge. >> at the primary edge. And there's a massive amount of data going into that, which is only going to be reduced to very, very little data coming up to the next level, which is this guy came here, these were his characteristics, he didn't look well today, maybe he should see a nurse, or whatever other information you can gather from that will go up to that secondary level, and then that'll also be a record to HR maybe, about who has arrived there or what time they arrived, to the manufacturing systems about who is there and who has those skills to do a particular job. There are multiple uses of that data which can then be used for differentiation or whatever else from that secondary layer into local systems and then equally they can be pushed up to the higher level which is, how much power should we be generating today, what are the higher levels. 
>> We now have 4,000 people in the building, air conditioning therefore is going to look like this, or, it could be combined with other types of data like over time we're going to need new capacity, or payroll, or whatever else it might be. >> And each level will have its own type of AI. So you've got AI at the edge, which is to produce a specific result, and then there's AI to optimize at the secondary level and then AI to optimize bigger things at the tertiary level. >> So we're going to talk more about some of the AI next week, but for right now we're talking about classes of devices that are high performance, high bandwidth, cheap, constrained, proximate to the event. >> Yep. >> Gateways that are capable of taking that information and starting to synthesize it for the business, for other business types of things, and then tertiary systems, true private cloud for example, although we may have very sizable things at the gateway as well, >> There will be true private clouds. >> that are capable of integrating data in a more broad way. What's the impact in the industry? Are we going to see IT firms roll in and control this sweeping, (man chuckles) as Neil said, trillions of new devices. Is this all going to be Intel? Is it all going to be, you know, looking like clients and PCs? >> My strong advice is that the devices themselves will be done by extreme specialists in those areas, that they will need a set of very deep technology understanding of the devices themselves, the sensors themselves, the AI software relevant to that. Those are the people that are going to make money in that area. And you're much better off partnering with those people and letting them solve the problems, and you solve, as Dave said earlier, the ones that can differentiate you within your processes, within your business. So yes, leave that to other people is my strong advice. And from an IT point of view, just don't do it yourself. 
>> Well, the gateway, it sounds like you're suggesting, is where that boundary's going to be. >> Yes. That's where the boundary is. >> And the IT technologies may increasingly go down to the edge, but it's not clear that the IT vendor expertise goes down to the edge >> Correct. >> to the same degree. >> Correct. >> So, Neil, let's come back to you. When we think about this arrangement of data, you know, how the use cases are going to play out, and where the vendors are, we still have to address this fundamental challenge that Dave Vellante brought up. Who's going to end up being responsible for this? Now you've worked in insurance, what does that mean from an overall business standpoint? What kinds of failure rates are we going to accommodate? How is this going to play out? What do you think? >> Well, I'd like to point out that I worked in insurance 30 years ago. (men chuckling) >> Male Voice: I didn't want to date ya Neil. (men chuckling) >> Yeah the old reliable life insurance company. Anyway, one of the things David was just discussing sounded a lot to me like complex event processing. And I'm wondering where the logical location of that needs to be, because it needs some prior data. To do CEP, you have to have something to compare it against. But if you're pushing it all back to the tertiary level, there's going to be a lot of latency. And the whole idea of CEP was, you know, right now. So, that I'm a little curious about. But I'm sorry, what was your question? >> Well no, let's address that. So CEP, David, I agree. But I don't want to turn this into a general discussion on CEP. It's got its own set of issues. >> It's clear there have got to be complex models created. And those are going to be created in a large environment, almost certainly in a tertiary type environment. And those are going to be created by the vendors of those particular problem solvers at the primary edge. To a large extent, they're going to provide solutions in that area. 
And they're going to have to update those. And so, they are going to have to have lots and lots of test data for themselves and maybe some companies will provide test data if it's convenient for those, for a fee or whatever it is, to those vendors. But the primary model itself is going to be in the tertiary level, and that's going to be pushed down to the primary level itself. >> I'm going to make an assertion here that the, the way I think about this, Neil, is that the data coming off at the primary level is going to be the sensor data, the sensor said it was good. Then that is recorded as an event, we let somebody in the building. And that's going to be a key feature of what happens at the secondary level. I think a lot of complex processing is likely to end up at that secondary level. >> Absolutely. >> Then the data gets pushed up to the tertiary level and it becomes part of an overall social understanding of the business, it's behavior data. So increasingly, what did we do as a consequence of letting this person in the building? Oh we tried to stop him. That's going to be more of the behavioral data that ends up at the tertiary level, we'll still do complex event processing there. It's going to be interesting to see whether or not we end up with CEP directly in the sensor tower. We might under certain circumstances, that's a cost question though. So let me now turn it, in the last few minutes here, Neil, back to you. At the end of the day, we've seen for years the question of how much security is enough security? And businesses said, "Oh I want to be 100% secure." And sometimes the CISO said "We got that. You gave me the money, we've now made you 100% secure." But we know it's not true. Same thing is going to exist here. How much fidelity is enough fidelity down at the edge? How do we ensure that business decisions can be translated into design decisions that lead to an appropriate and optimized overall approach to the way the system operates? 
From a business standpoint back, what types of conversations are going to take place in the boardroom that the rest of the organization's going to have to translate into design decisions? >> You know, boy, bad actors are going to be bad actors. I don't think you can do anything to eliminate it. The best you can do is use the best processes and the best techniques to keep it from happening and hope for the best. I'm sorry, that's all I can really say about it. >> There's quite a lot of work going on at the moment from Arm, in particular. They've got a security device image ability. So, there's a lot of work going on in that very space. What's obviously interesting from an IT perspective is how do you link the different security systems, both from an Arm point of view and then from an x86 point of view as you go further up the chain. How are they going to be controlled and how's that going to be managed? That's going to be a big IT issue. >> Yeah, I think the transmission is the weak point. >> Male Voice: What do you mean by that Neil? >> Well the data has to flow across networks, that would be the easiest place for someone to intercept it and, you know, and do something nefarious. >> Right yeah, so that's purely a security thing. I was trying to use that as an analogy. So, at the end of the day, the business is going to have to decide how much data do we have to capture off the edge to ensure that we have the kinds of models we want, so that we can realize the specificity of actions and behaviors that we want in our business? That's partly a technology question, partly a cost question. Different sensors are able to operate at different speeds for example. But ultimately, we have to be able to bring those, that list of decisions or business issues that Dave Vellante raised, down to some of the design questions. But it's not going to be throw a $300 microprocessor at everything. There are going to be very, very concrete decisions that have to take place. 
So, George, do you agree with that? >> Yes, two issues though. One, there's the existing devices that can't get re-instrumented, that they already have their software, hardware stack. >> There's a legacy in place? >> Yes. But there's another thing which is, some of the most advanced research that's been going on that produced much of today's distributed computing and big data infrastructure, like the Berkeley Analytics lab, and say their contributions to Spark and related technologies. They're saying we have to throw everything out and start over for secure real-time systems. That you have to build from hardware all the way up. In other words, you're starting from the sand to re-think something that's secure and real-time, you can't layer it on. >> So very quickly David, that's a great point George. Building on what George has said very quickly, the primary responsibility for bonding the behavior or the attributes of these devices is going to be with the vendor. >> Of creating the solution? >> Correct. >> That's going to be the primary responsibility. But obviously from an IT point of view, you need to make sure that that device is doing the job that's important for your business, not too much, not too little, is doing that job, and that you are able to collect the necessary data from it that is going to be of value to you. So that's a question of qualification of the devices themselves. >> Alright so, David Vellante, Neil Raden, David Floyer, George Gilbert, action item round. I want one action item from you guys from this conversation. Keep it quick, keep it short, keep it to the point. David Floyer, what's your action item? >> So my action item is don't go into areas that you don't need to. You do not need to become experts, IT in general does not need to become experts at the edge itself. Rely on partners, rely on vendors to do that unless of course you're one of those vendors. In which case, you'll need very, very deep knowledge. 
>> Or you choose that that's where your value stream, your differentiation, is going to be, which means you just became one of those vendors. >> Yes, exactly. >> George Gilbert. >> I would build on that and I would say that if you look at the skills required to build these full stack solutions, there's data science, there's application development, there's the analytics. Very few of those solutions are going to have skills all in one company. So the go-to-market model for building these is going to be something that, at least at this point in time, we're going to have to look to combinations like IBM working with sort of supply chain masters. >> Good. Neil Raden, action item. >> The question is not necessarily one of technology because that's going to evolve. But I think as an organization, you need to look at it from this end which is, would employing this create a new business opportunity for us? Something we're not already doing. Or number two, change our operations in some significant way. Or number three, you know, the old red queen thing. We have to do it to keep up with the competition. >> Male Voice: David Vellante, action item. >> Okay well look, at the risk of sounding trite, you got to start the planning process from the customer on in, and so often people don't. You got to understand where you're going to add value for customers and construct an external and internal ecosystem that can really juice that value creation. >> Alright, fantastic guys. So let me quickly summarize. This week on the Wikibon Friday research meeting on theCUBE, we discussed a new way of thinking about data characteristics that will inform system design and the business value that's created. We observed that data is not all the same when we think about these very complex, highly distributed, and decentralized systems that we're going to build. That there's a difference between primary data, secondary data, and tertiary data. 
Primary data is data that is generated from real world events or measurements and then turned into signals that can be acted upon very proximate to that real world set of conditions. A lot of sensors will be there, a lot of processing will be moved down there, and a lot of actuators and actions will take place without referencing other locations within the cloud. However, we will see circumstances where the events that are taken, or the decisions that are taken on those events, will be captured in some sort of secondary tier that will then record something about the characteristics of the actions and events that were taken, and then summarized and then pushed up to a tertiary tier where that data can then be further integrated with other attributes and elements of the business. The technology to do this is broadly available but not universally successfully applied. We expect to see a lot of new combinations of edge-related devices to work with primary data. That is going to be a combination of currently successful firms in the OT or operational technology world, most likely in partnership with a lot of other vendors that have demonstrated significant expertise in understanding the problems, especially the business problems, associated with the fidelity of what happens at the edge. The IT industry is going to approach very aggressively and very close to this at that secondary level, through gateways and other types of technologies. And even though we'll see IT technology continue to move down to the primary level, it's not clear exactly how vendors will be able to follow that. More likely, we'll see the adoption of IT approaches to doing things at the primary level by vendors that have the domain expertise in how that level works. 
We will, however, see significantly interesting true private cloud and public cloud data at the tertiary level end up with whole new sets of systems that are going to be very important from an administration and management standpoint because they have to work within the context of the fidelity of this overall system together. The final point we want to make is that these are not technology problems by themselves. While significant technology problems are on the horizon about how we think about handling this distribution of data, managing it appropriately, our ability, ultimately, to present the appropriate authority at different levels within that distributed fabric to ensure the proper working condition in a way that nonetheless we can recreate if we need to. But these are, at bottom, fundamentally business problems. They're business problems related to who owns the intellectual property that's being created, they're business problems related to what level in that stack do I want to show my differentiation to my customers and they're business problems from a liability and legal standpoint as well. The action item is, all firms will in one form or another be impacted by the emergence of the edge as a dominant design consideration for their infrastructure but also for their business. Three ways, or a taxonomy that looks at three classes of data, primary, secondary, and tertiary, will help businesses sort out who's responsible, what partnerships I need to put in place, what technologies am I going to employ, and very importantly, what overall business exposure I'm going to accommodate as I think ultimately about the nature of the processing and business promises that I'm making to my marketplace. Once again, this has been the Wikibon Friday research meeting here on theCUBE. I want to thank all the analysts who were here today, but especially thank you for paying attention and working with us. 
And by all means, let's hear those comments back about how we're doing and what you think about this important question of different classes of data driven by different needs of the edge. (funky electronic music)

Published Date : Oct 13 2017


