Wikibon | Action Item, Feb 2018

>> Hi, I'm Peter Burris, welcome to Action Item. (electronic music) There's an enormous net-new array of software technologies available to businesses and enterprises to attend to some new classes of problems, and that means there's an explosion in the number of problems that people perceive could be solved with software approaches. The whole world of how we're going to automate things differently, artificial intelligence, and any number of other software technologies are all being brought to bear on problems in ways that we never envisioned or never thought possible. That leads ultimately to a comparable explosion in the number of approaches to how we're going to solve some of these problems. That means new tooling, new models, and any number of other structures, conventions, and artifacts that are going to have to be factored in by IT organizations and professionals in the technology industry as they conceive and put forward plans and approaches to solving some of these problems. Now, George, that leads to a question. Are we going to see an ongoing, ever-expanding array of approaches, or are we going to see some new kind of steady state that starts to simplify what happens, or how enterprises conceive of the role of software in solving problems?
>> Well, we've had... probably four decades of packaged applications being installed and defining really the systems of record, which first handled the order-to-cash process and then layered around that. Once we had more CRM capabilities, we had the opportunity-to-lead capability added in there. But systems of record fundamentally are backward looking, they're tracking the performance of the business. The opportunity--
>> Peter: Recording what has happened?
>> Yes, recording what has happened. The opportunity we have now is to combine what the big Internet companies pioneered with systems of engagement.
Where you had machine learning anticipating and influencing interactions. You can now combine those sorts of analytics with systems of record to inform and automate decisions in the form of transactions. And the question is now, how are we going to do this? Is there some way to simplify it? Not completely standardize, but can we make it so that we have at least some conventions and design patterns for how to do that?
>> And David, we've been working on this problem for quite some time, but the notion of convergence has been extant in the hardware and the services, or in the systems business, for quite some time. Take us through what convergence means and how it is going to set up new ways of thinking about software.
>> So there's a hardware convergence, and it's useful to define a few terms. There are converged systems: those are systems which have some management software brought into them, and on top of that they have traditional SANs and networks. There are hyper-converged systems, which started off in the cloud and have now come to the enterprise as well. And those bring software networking, software storage, software--
>> Software defined, so it's a virtualizing of those converged systems.
>> David: Absolutely, and in the future it is going to bring automated operational stuff as well, AI on the operational side. And then there's full-stack convergence, where we start to put in the software, the application software, beginning with the database side of things and then the application itself on top of the database. And finally there is what you are talking about, the systems of intelligence, where we can combine the systems of record, the systems of engagement, and the real-time analytics as a complete stack.
>> Peter: Let's talk about this for a second, because ultimately what I think you're saying is that we've got hardware convergence in the form of converged infrastructure, hyper-converged in the form of virtualization of that, new ways of thinking about how the stack comes together, and new ways of thinking about application components. But what seems to be the common thread through all of this is data.
>> David: Yes.
>> So basically what we're seeing is a convergence or a rethinking of how software elements revolve around the data. Is that kind of the centerpiece of this?
>> David: That's the centerpiece of it, and we had very serious constraints about accessing data. Those will improve with flash but there's still a lot of room for improvement. And the architecture that we are saying is going to come forward, which really helps this a lot, is the UniGrid architecture, where we offload the networking and the storage from the processor. This is already happening in the hyperscale clouds, they're putting a lot of effort into doing this. But we're at the same time allowing any processor to access any data in a much more fluid way, and we can grow that to thousands of processors. Now that type of architecture gives us the ability to converge the traditional systems of record, and there are a lot of them obviously, and the systems of engagement and the real-time analytics for the first time.
>> But the focal point of that convergence is not the licensing of the software, the focal point is convergence around the data.
>> The data.
>> But that has some pretty significant implications when we think about how software has always been sold, how organizations that run software have been structured, the way that funding is set up within businesses. So George, what does it mean to talk about converging software around data from a practical standpoint over the next few years?
>> Okay, so let me take that and interpret it as converging the software around data in the context of adding intelligence to our existing application portfolio and then the new applications that follow on. And basically, when we want to inject enough intelligence to anticipate and inform interactions, or to inform or automate transactions, we have a bunch of steps that need to get done. We're ingesting essentially contextual or ambient information. Often this is information about a user or the business process. And this data, it's got to go through a pipeline where there's both a Design Time and a Run Time. In addition to ingesting it, you have to sort of enrich it and make it ready for analysis. Then the analysis is essentially picking out of all that data and calculating the features that you plug into a machine learning model. And then that produces essentially an inference based on all that data, that says, well, this is the probable value. It sounds like it's in the weeds, but the point is it's actually a standardized set of steps. Then the question is, do you put that all together in one product across that whole pipeline? Can one piece of infrastructure software manage that? Or do you have a bunch of pieces each handing off to the next? And--
>> Peter: But let me stop you, because I want to make sure that we kind of follow this thread. So we've argued that hardware convergence and the ability to scale the role the data plays, or how data is used, is happening, and that opens up new opportunities to think about data. Now what we've got is we are centering a lot of the software convergence around the use of data through copies and other types of mechanisms for handling snapshots and whatnot, and things like UniGrid. What you're, let's start with this.
It sounds like what you're saying is we need to think of new classes of investments in technologies that are specifically set up to handle the processing of data in a more distributed application way, right? If I got that right, that's kind of what we mean by pipelines?
>> George: Yes.
>> Okay, so once we do that, once we establish those conventions, once we establish organizationally, institutionally how that's going to work, now we take the next step of saying, are we going to default to a single set of products or are we going to do best-of-breed, and what kind of convergence are we going to see there?
>> And there's no--
>> First of all, have I got that right?
>> Yes, but there's no right answer. And I think there's a bunch of variables that we have to play with that depend on who the customer is. For instance, the very largest and most sophisticated tech companies are more comfortable taking multiple pieces, each of which is very specialized, and putting them together in a pipeline.
>> Facebook, Yahoo, Google--
>> George: LinkedIn.
>> Got it.
>> George: Those guys. And the knobs that they're playing with, that everyone's playing with, are three, basically, on the software side. There's your latency budget, which is how much time you have to produce an answer. So that drives the transaction or the interaction. And that itself is not just a single answer because... the goal isn't to get it as short as possible. The goal is to get as much information into the analysis within the budgeted latency.
>> Peter: So it's packing the latency budget with data?
>> George: Yes, because the more data that goes into making the inference, the better the inference.
>> Got it.
>> The example that someone used, actually on Fareed Zakaria GPS, one show about it, was that if he had 300 attributes describing a person he could know more about that person than that person did (laughs) in terms of inferring other attributes.
So the point is, once you've got your latency budget, the other two knobs that you can play with are development complexity and admin complexity. And the idea is, on development complexity, there's a bunch of abstractions that you have to deal with. If it's all one product you're going to have one data model, one address and namespace convention, one programming model, one way of persisting data, a whole bunch of things. That's simplicity. And that makes it more accessible to mainstream organizations. Similarly there's a bunch of, let me just add that, there's probably two or three times as many constructs that admins would have to deal with. So again, if you're dealing with one product, it's a huge burden off the admin, and we know they struggled with Hadoop.
>> So convergence, decisions about how to enact convergence, are going to be partly or strongly influenced by those three issues: latency budget, development complexity or simplicity, and administrative, David--
>> I'd like to add one more to that, and that is location of data. Because you want to be able to look at the data that is most relevant to solving that particular problem. Now, today a lot of the data is inside the enterprise. There's a lot of data outside that, but still, you will want to, in the best possible way, combine that data one way or another.
>> But isn't that a variable on the latency budget?
>> David: Well, I would think it's very useful to split the latency budget, which is to do mainly with inference, from the development of the machine learning. So there is a development cycle with machine learning that is much longer. That is days, could be weeks, could be months.
>> It would still be done in batch.
>> It is or will be done, wait a second. It will be done in batch, it is done in batch, and it's... You need to test it and then deliver it as an inference engine to the applications that you're talking about.
Now that's going to be very close together, that inference, then the rest of it has to be all physically very close together. But the data itself is spread out, and you want to have mechanisms that can combine that data, move the application to that data, bring those together in the best possible way. That is still a batch process. That can run where the data is, in the cloud, locally, wherever it is.
>> George: And I think you brought up a great point, which I would tend to include in latency budget because... no matter what kind of answers you're looking for, some of the attributes are going to be precomputed and those could be--
>> David: Absolutely.
>> External data.
>> David: Yes.
>> And you're not going to calculate everything in real time, there's just--
>> You can't.
>> Yes, you can't.
>> But is the practical reality that the convergence of, so again, the argument. We've got all these new problems, all kinds of new people claiming that they know how to solve the problems, each of them choosing different classes of tools to solve the problem, an explosion across the board in the approaches, which can lead to enormous downstream integration and complexity costs. You've used the example of Cloudera, for example, and some of the distro companies, who claim that 50-plus percent of their development budget is dedicated to just integrating these pieces. That's a non-starter for a lot of enterprises. Are we fundamentally saying that the degree of complexity, or the degree of simplicity and convergence, that's possible in software is tied to the degree of convergence in the data?
>> You're homing in on something really important, give me--
>> Peter: Thank you! (laughs)
>> George: Give an example of the convergence of data that you're talking about.
>> Peter: I'll let David do it because I think he's going to jump on it.
>> David: Yes, so let me take examples, for example.
If you have a small business, there's no way that you want to invest yourself in any of the normal levels of machine learning and applications like that. You want to outsource that. So big software companies are going to do that for you, and they're going to do it especially for the specific business processes which are unique to them, which give them digital differentiation of some sort or another. So for all of those types of things, software will come in from vendors, from SAP or son of SAP, which will help you solve those problems. And having data brokers which are collecting the data, putting it together, helping you with that. That seems to me the way things are going. In the same way, there are a lot of inference engines which will be out at the IoT level. Those will have very rapid analytics given to them. Again, not by yourself but by companies that specialize in facial recognition or specialize in making warehouse--
>> Wait a minute, are you saying that my customers aren't special, that they don't require special facial recognition? (laughs) So I agree with David but I want to come back to this notion because--
>> David: The point I was getting at is, there's going to be lots and lots of room for software to be developed to help in specific cases.
>> Peter: And large markets to sell that software into.
>> Very large markets.
>> Whether it's software, but increasingly also with services. But I want to come back to this notion of convergence, because we talked about hardware convergence and we're starting to talk about the practical limits on software convergence. But somewhere in between, I would argue, and I think you guys would agree, that really the catalyst for, or the thing that's going to determine the rate of change and the degree of convergence, is going to be how we deal with data. Now you've done a lot of research on this, I'm going to put something out there and you tell me if I'm wrong.
But at the end of the day, when we start thinking about UniGrid, when we start thinking about some of these new technologies, and the ability to have single copies or single sources of data, multiple copies, in many respects what we're talking about is the virtualization of data without loss.
>> David: Yes.
>> Not loss of the characters, the fidelity of the data, or the state of the data. I got that right?
>> Knowing the state of the data.
>> Peter: Or knowing the state of the data.
>> If you take a snapshot, that's a point in time, you know what that point in time is, and you can do a lot of analytics, for example, and you want to do them at a certain time of day or whatever--
>> Peter: So is it wrong to say that we're seeing, we've moved through the virtualization of hardware and we're now in hyperscale or hyper-converged, which is very powerful stuff. We're seeing this explosion in the amount of software that's being, you know, the way we approach problems and whatnot. But a forcing function, something that's going to both constrain how converged that can be, but also force or catalyze some convergence, is the idea that we're moving into an era where we can start to think about virtualized data through some of these distributed file systems--
>> David: That's right, and the metadata that goes with it. The most important thing about the data, and it's increasing much more rapidly than the data itself, is the metadata around it. But I want to just make one point on this: all data isn't useful. There's a huge amount of data that we capture that we're just going to have to throw away. The idea that we can look at every piece of data for every decision is patently false. There's a lovely example of this in... fluid mechanics.
>> Peter: Fluid dynamics.
>> David: Fluid dynamics, if you're trying to have simulation at a very, very low level, the amount of--
>> Peter: High fidelity.
>> High fidelity, you run out of capacity very, very quickly indeed. So you have to make trade-offs about everything, and all of that data that you're producing in that simulation, you're not going to keep that. All the data from IoT, you can't keep that.
>> Peter: And that's not just a statement about the performance or the power or the capabilities of the hardware, there's some physical realities--
>> David: Absolutely, yes.
>> That are going to limit what you can do with the simulation. But, and we've talked about this in other Action Items, there is this notion of options on data value, where the value of today's data is maybe--
>> David: Is much higher.
>> Peter: Well, it's higher from a time standpoint for the problems that we understand and are trying to solve now, but there may be future problems where we still want to ensure that we have some degree of data so we can be better at attending to those future problems. But I want to come back to this point because, in all honesty, I haven't heard anybody else talking about this, and maybe it's because I'm not listening. But this notion of, again, your research, the notion of virtualized data inside these new architectures being a catalyst for a simplification of a lot of the sharing subsystem.
>> David: It's essentially sharing of data. So instead of having the traditional way of doing it within a data center, which is I have my systems of record, I make a copy, it gets delivered to the data warehouse, for example. That's the way that's being done. That is too slow, moving data is incredibly slow. So another way of doing it is to share that data, make a virtual copy of it, and technology is allowing you to do that because the access density has gone up by thousands of times--
>> Peter: Because?
>> Because. (laughs) Because of flash, because of new technologies at that level,
>> Peter: High performance interfaces, high performance networks.
>> David: All of that stuff is now allowing things which just couldn't even be conceived. However, there is still a constraint there. It may be a thousand times bigger but there is still an absolute constraint on the amount of data that you can actually process.
>> And that constraint is provided by latency.
>> Latency.
>> Peter: Speed of light.
>> Speed of light and speed of the processors themselves.
>> George: Let me add something that may help explain the virtualization of data and how it ties into the convergence or non-convergence of the software around it. Which is, when we're building these analytic pipelines, essentially we've disassembled what used to be a DBMS. And so out of that we've got a storage engine, we've got query optimizers, we've got data manipulation languages which have grown into full-blown analytic languages, data definition languages. Now the system catalog used to be just a way to virtualize all the tables in the database and tell you where all the stuff was, and the indexes and things like that. Now, what we're seeing is, since data is now spread out over so many places and products, we're seeing an emergence of a new kind of catalog. Whether that's from Alation or Dremio or, on AWS, it's the Glue catalog, and I think there's something equivalent coming on Azure. But the point is, those are beginning to get useful enough to be the entry point for analytic products, and maybe eventually even for transactional products, to update, or at least to analyze, the data in these pipelines that we're putting together out of these components of what was a disassembled database. Now, we could be--
>> I would make a difference there between the development of analytics and, again, the real-time use of those analytics within systems of intelligence.
>> George: Yeah, but when you're using them--
>> David: There's a different, problems they have to solve.
>> George: But there's a Design Time and a Run Time, there's actually four pipelines for the sort of analytic pipeline itself. There's Design Time and Run Time, and then for the inference engine and the modeling that goes behind it, there's also a Design Time and Run Time. But I guess where. I'm not disagreeing that you could have one converged product to manage the Run Time analytic pipeline. I'm just saying that the pieces that you assemble could come from one vendor.
>> Yeah, but I think David's point, I think it's accurate, and this has been so since the beginning of time. (laughs) Certainly predated UNIVAC. That at the end of the day, read/write ratios and the characteristics of the data are going to have an enormous impact on the choices that you make. And high write-to-read ratios almost dictate the degree of convergence, and we used to call that SMP, or, you know, scale-up database managers. And for those types of applications, with those types of workloads, it's not necessarily obvious that that's going to change. Now we can still find ways to relax that, but you're talking about, George, the new characteristics
>> Injecting the analytics.
>> Injecting the analytics, where we're doing more reading as opposed to writing. We may still be writing into an application that has these characteristics--
>> That's a small amount of data.
>> But a significant portion of the new function is associated with these new pipelines.
>> Right. And it's actually... what data you create is generally derived data. So you're not stepping on something that's already there.
>> All right, so let me get some action items here. David, I want to start with you. What's the action item?
>> David: So for me, about convergence, there's two levels of convergence. First of all, converge as much as possible and give the work to the vendor, would be my action item.
The more that you can go full stack, the more that you can get the software services from a single point, single throat to choke, single hand to shake, the more you have outsourced your problems to them.
>> Peter: And that has a speed implication, time to value.
>> Time to value, it has a, you don't have to do undifferentiated work. So that's the first level of convergence, and then the second level of convergence is to look hard at how you can bring additional value to your existing systems of record by putting in automation or real-time analytics. Which leads to automation, that is the second one, for me, where the money is. Automation, reduction in the number of things that people have to do.
>> Peter: George, action item.
>> So my action item is that you have to evaluate, you the customer have to evaluate, sort of your skills as much as your existing application portfolio. And if more of your greenfield apps can start in the cloud, and you're not religious about open source but you're more religious about the admin burden and development burden and your latency budget, then start focusing on the services that the cloud vendors originally created as standalone, but that they are increasingly integrating because the customers are leading them there. And then for those customers who, you know, have decades and decades of infrastructure and applications on-prem and need a pathway to the cloud, some of the vendors formerly known as Hadoop vendors, but for that matter any on-prem software vendor, are providing customers a way to run workloads in a hybrid environment or to migrate data across platforms.
>> All right, so let me give this a final action item here. Thank you, David Floyer, George Gilbert. Neil Raden, Jim Kobielus, and the rest of the Wikibon team are with customers today. We talked today about convergence at the software level.
What we've observed over the course of the last few years is an expanding array of software technologies, specifically AI, big data, machine learning, etc., that are allowing enterprises to think differently about the types of problems that they can solve with technology. That's leading to an explosion in the number of problems that folks are looking at, and in the number of individuals participating in making those decisions and thinking those issues through. And very importantly, an explosion in the number of vendors with piecemeal solutions based on what they regard as the best approach to doing things. However, that is going to impose a significant burden that could have enormous implications for years, and so the question is, will we see a degree of convergence in the approach to doing software, in the form of pipelines and applications and whatnot, driven by a combination of: what the hardware is capable of doing, what the skills are or make possible, and very importantly, the natural attributes of the data. And we think that there will be. There will always be tension in the model if you try to invent new software, but one of the factors that's going to bring it all back to a degree of simplicity will be a combination of what the hardware can do, what people can do, and what the data can do. And so we believe, pretty strongly, that ultimately the issues surrounding data, whether it be latency or location, as well as the development complexity and administrative complexity, are going to be a range of factors that are going to dictate ultimately how some of these solutions start to converge and simplify within enterprises. As we look forward, our expectation is that we're going to see an enormous net new investment over the next few years in pipelines, because pipelines are a first-level set of investments in how we're going to handle data within the enterprise.
And they'll look, in certain respects, like how DBMSs used to look, just in a disaggregated way. But conceptually and administratively, and then from a product selection and service selection standpoint, the expectation is that they themselves have to come together so that developers can have a consistent view of the data that's going to run inside the enterprise. Want to thank David Floyer, want to thank George Gilbert. Once again, this has been Wikibon Action Item and we look forward to seeing you on our next Action Item. (electronic music)
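
The run-time pipeline described in the discussion (ingest contextual data, enrich it with precomputed attributes, calculate features, produce an inference) and the idea of packing the latency budget with as much data as fits can be sketched roughly as below. This is a minimal illustration and not something the panel presented: every name here (`Feature`, `ingest`, `enrich`, `select_features`, `infer`, `run_pipeline`), the `key=value` event format, the per-feature cost estimates, the cheapest-first selection rule, and the toy linear score are assumptions made only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Feature:
    """One attribute that could feed the model (hypothetical structure)."""
    name: str
    cost_ms: float                    # assumed cost to compute at run time
    compute: Callable[[Dict], float]  # derives the value from an enriched event


def ingest(raw: str) -> Dict:
    """Ingest: parse an assumed 'key=value;key=value' contextual event."""
    return dict(pair.split("=", 1) for pair in raw.split(";") if pair)


def enrich(event: Dict, precomputed: Dict[str, Dict]) -> Dict:
    """Enrich: join in attributes that were precomputed in batch."""
    return {**event, **precomputed.get(event.get("user", ""), {})}


def select_features(features: List[Feature], budget_ms: float) -> List[Feature]:
    """Pack the latency budget: cheapest features first, until time runs out.

    Cheapest-first is only one possible heuristic, chosen here for simplicity.
    """
    chosen, spent = [], 0.0
    for feat in sorted(features, key=lambda item: item.cost_ms):
        if spent + feat.cost_ms <= budget_ms:
            chosen.append(feat)
            spent += feat.cost_ms
    return chosen


def infer(event: Dict, chosen: List[Feature], weights: Dict[str, float]) -> float:
    """Inference: a toy linear score standing in for a trained model."""
    return sum(weights.get(feat.name, 0.0) * feat.compute(event) for feat in chosen)


def run_pipeline(raw: str, precomputed: Dict[str, Dict], features: List[Feature],
                 weights: Dict[str, float], budget_ms: float) -> Tuple[float, List[str]]:
    """Run ingest -> enrich -> feature selection -> inference for one event."""
    event = enrich(ingest(raw), precomputed)
    chosen = select_features(features, budget_ms)
    return infer(event, chosen, weights), [feat.name for feat in chosen]


if __name__ == "__main__":
    features = [
        Feature("cart", 1.0, lambda e: float(e["cart"])),
        Feature("past_orders", 2.0, lambda e: float(e["past_orders"])),
        Feature("graph", 50.0, lambda e: 1.0),  # expensive; dropped on tight budgets
    ]
    weights = {"cart": 0.5, "past_orders": 0.1, "graph": 2.0}
    precomputed = {"u1": {"past_orders": "12"}}
    print(run_pipeline("user=u1;cart=3", precomputed, features, weights, budget_ms=5.0))
```

With a tight budget the expensive feature is left out; raising `budget_ms` lets more data into the inference, which is the trade-off the latency-budget knob describes. Whether these stages live in one product or in several pieces handing off to each other is exactly the convergence question raised above, and the sketch takes no position on it.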

Published Date : Feb 16 2018


