Hybrid Cloud Taxonomy | CUBEConversation, February 2019

(orchestral music) >> Hi, I'm Peter Burris, and welcome to another Cube conversation, from our awesome studios in beautiful Palo Alto, California. With every Cube conversation, we pick a topic and find someone to talk about it. The topic today is hybrid cloud. A lot of conversation. AWS introduced Outposts, we've got Microsoft Azure talking about centralized, as well as distributed, cloud offerings. Oracle is doing the same thing. A lot of conversation about hybrid cloud and what it means. To have that conversation, we've got David Floyer with us. David is the CTO of Wikibon. David, welcome back to theCUBE. >> Thank you very much, Peter. >> David, let's start by saying that there has to be a way of representing different options when we think about hybrid cloud. You've done a lot of research in this domain. How are you representing the continuum, the taxonomy of hybrid cloud for customers? >> On the slide, it shows that there are essentially five different multiple clouds or hybrid clouds. From left to right, it's multi-cloud, and at the bottom of the slide, it says that this is essentially a set of clouds with an integrated network. And then the next is loosely-coupled hybrid cloud, and that adds in the data plane, where we look after storage, and data protection, data management, et cetera. The middle one is tightly-coupled hybrid cloud, and that's where the control plane is now tightly integrated along with everything else. The next one is "true" distributed hybrid cloud, and those are the ones that you were talking about. Those are the AWS Outposts, the Azure Stack, the Oracle Cloud at Customer-type environments. Also, you could put IBM, some of IBM's recent announcements, into that as well. Last but not least, and certainly one of the most interesting and different, are the autonomous stand-alone clouds, which are going to be at the edge. They have to be autonomous, because they can't guarantee network availability to them.
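One way to make the five classes just described concrete is to write them down as data. The following is only a minimal sketch under stated assumptions: the layer labels ("network", "data plane", "control plane", "common code"), the `CloudClass` type, and the `classify` helper are names invented for this illustration and are not part of the Wikibon slide. "Common code" is taken from the later part of the conversation, where the true distributed class is described as running the same executables everywhere. The edge class is modeled only by an `autonomous` flag, since the conversation characterizes it by having to run without the network rather than by what it shares.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudClass:
    name: str
    integrated: frozenset      # layers shared across the participating clouds
    autonomous: bool = False   # must keep running when the network is unavailable


# Left to right, as on the slide. Layer names are labels chosen for this sketch.
TAXONOMY = (
    CloudClass("multi-cloud", frozenset({"network"})),
    CloudClass("loosely-coupled hybrid cloud",
               frozenset({"network", "data plane"})),
    CloudClass("tightly-coupled hybrid cloud",
               frozenset({"network", "data plane", "control plane"})),
    CloudClass("true distributed hybrid cloud",
               frozenset({"network", "data plane", "control plane", "common code"})),
    # What the edge class shares is not modeled here; only its autonomy is.
    CloudClass("autonomous stand-alone cloud", frozenset(), autonomous=True),
)


def classify(integrated, autonomous=False):
    """Return the right-most class whose layers are all present, or None."""
    if autonomous:
        return TAXONOMY[-1].name
    have = frozenset(integrated)
    match = None
    for cls in TAXONOMY[:-1]:
        if cls.integrated <= have:
            match = cls.name
    return match
```

For example, `classify({"network", "data plane"})` returns the loosely-coupled class, while a deployment that shares nothing at all returns `None`. Each class to the right adds one layer to the one before it, which is why a simple subset check is enough for the first four.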
>> So, five classes of cloud, each distinguished by the degree to which they share different types of resources, including state, integration, automation, and the degree to which the application is going to be common across each of these cloud types. >> That's correct. >> Have I got that right? >> Yeah, absolutely. >> Obviously, while this is theoretical. >> Yeah. >> In a sense that we're trying to create some way of understanding how to represent these things. It's based on some practical observations, about where we are within the industry. >> Yeah. >> Let's start talking about multicloud. Who do you place into that bucket, of multicloud hybrid cloud styles? >> If we talk first of all about the clouds themselves, there would be clouds from AWS or Azure, or IBM or Google. Those are the clouds that you start with, you might have one on premise, but the connection between them is just on a network basis. The people who are doing that would be clearly, Cisco is one of the leading people in that area, where they already have a lot of enterprise equipment, and experience of dealing with clouds, across the whole of the area. They would be the people that are going to be a foremost vendor in connecting those different clouds together, on a network plane. >> Okay, let's move to the right, and talk about the loosely-coupled hybrid clouds. Now here we're having more than network, common network. We're having a common data plane, which really boils down to a common set of data services, that are rendered commonly. >> Right, yeah. >> Across different cloud instances. >> Right. >> Who's there? >> To do that, you've got to be able to have your data services, actually on each of the clouds. You have to have it in software on AWS, or Azure, or IBM, or whatever it is. Two of the people who are probably leading the charge in that area are IBM themselves, who have gone completely software with all of their Spectrum line of software in that area, and Pure.
Pure Storage have been very aggressive again, in putting things up, so that they can be reflected in each of the clouds. >> And there's other vendors, that are coming in from a data protection standpoint. >> Sure. >> Data security standpoint, and they may-- Some people like Veeam. >> -not have the full set of services. >> Yes. But they are looking at how they can apply their services. >> Correct. >> Across multiple cloud instances. >> And there's a lot of vendors there. People like Veeam or Rubrik, or Cohesity. DellEMC. >> Et cetera, yes. >> Okay, so let's move to the right. Now we've moved from loosely-coupled, to tightly-coupled hybrid clouds, where we're starting to share a common automation framework, more control, sharing control data so that we can start to understand the state of applications in multiple different locations. >> Yes. >> Who's leading there? >> Some of the leads in this area, are some of the traditional ones, like IBM for example. IBM Sysplex, which came out what, 20 years ago. >> We're not. >> That is where you have state being, time and state being shared, across a whole number of different instances, or nodes within that Sysplex. >> Yeah, let's talk about that specifically. So, we're talking about a global shared memory notion. >> Yes. >> More than just a name space, but actually-- >> Correct. >> -a control plane, that has global insight into where resources are, has names for them. >> Yeah. >> There may be multiple name spaces, but it's bringing a common set of controls to that global set of resources. >> Yes, and time is obviously a key aspect to help stay-- >> Well, it's got to be synchronized. >> Yes. >> Exactly. >> That's right. >> If we move to the right to true distributed hybrid cloud, in the tightly-coupled, we have a common control plane, but not necessarily common software. >> Correct. >> Common code. >> Correct. >> At the compile level. We're still utilizing distribution formats, maybe specific, et cetera.
But now in a true hybrid, or true distributed hybrid cloud, it's common-common. >> Yes. >> Who's there? >> Yes, it's common code. It can run on any node without having to be recompiled, or retested. You know it's going to work. The people in there, are the people that we were talking about earlier. It's people like AWS with Outposts, Microsoft with Azure Stack, Cloud at Customer from Oracle. Three large vendors, who are using this to pursue a cloud-first-type model, in which they can grow the central cloud as quickly as possible, add things to it, and push that down into the Cloud at Customer, or the Outposts, or the Stacks. >> To be clear, we're not talking about a common cloud experience, we're talking about absolute common cloud services. >> Correct. >> All the way down to the executables, so that the same software can run wherever it needs to run. >> Yes. >> Finally, let's move one step further to the right. This is the autonomous stand-alone clouds. >> Yes, this is at the edge. >> Who's there? >> This is the most different of all of these. It has to be autonomous. If you think about mobile vehicles or planes, or even think about a factory or a nuclear power plant. You have to be able to run that, assuming that the network is not going to get through. It's on the edge, so it's the most vulnerable to network. It has to be autonomous, therefore it has to be able to run by itself. That sort of cloud is mainly concerned with the state, the state of that edge. All of the devices in that edge, the windmills in that edge, or the factory robotics in that edge. In military terms, the automated units in that edge, or the drones. Whatever it is, you're concerned about the state of that. >> But specifically, sustaining local control of state. >> Correct. >> Against a common understanding. >> Yes. >> Of how these things interact with each other. >> Right. >> It brings almost a network realtime flavor to it. >> It is realtime.
It has to be realtime so it's a shared state across. For example, across the city, in terms of the traffic lights. You would see multiple of these small clouds, in different parts of a large city, for example. Which need to communicate with each other. So, you have devices, which have an inference code running on them, and they're dealing with the device to which they're attached. And then you have to connect all of those devices together, to make this overall system representation of the state. >> Okay, so we've got five classes of hybrid cloud. How is a CIO going to use this taxonomy, to make better decisions? >> Clearly by making this decision, what we're doing from a taxonomy point of view, is making each one individual, and different from the others. There's no sharing between them. That means that from a description point of view, we can describe the whole of this industry. We can say how much is going on in each one, who are winners and losers in each one. >> We'll use this to size different classifications. >> Right, and give that-- >> Talk about leader, describe competition and all that stuff. >> Yes. >> But if I'm a CIO, do I think, oh, I got a business problem that's associated with applications, on various levels of common data sharing, control sharing, et cetera. Do I use this to help me choose the specific architecture that I use? >> The best way that I think that CIOs are going to use this is to say, "Where am I aiming to be? What is most important to me and my business? If it is the edge, then how am I going to go through these? Because I'm not going to get to the edge on day one. How am I going to choose my vendors and my protocols, and my standards, and my data planes, and my control planes, such that I can get to that particular end point?" Within each one, you'd want to look at them individually, because you're going to put together a, first of all, in a multi-cloud environment.
But you should be looking into the future, as to how you want to traverse across this, and who your major partners and vendors will be. Or, strategic partners and vendors. >> And we'll use this as you said, we'll use this specifically to size the market, describe the competitive factors, et cetera. >> Correct, yeah. >> All right. David Floyer, thanks very much for being on theCUBE. >> Thanks very much, indeed. >> Once again, I'm Peter Burris, and we have been talking about Cube conversations, related to true hybrid cloud taxonomies. Wikibon research. Thanks very much for watching, and until our next Cube conversation. (orchestral music)

Published Date : Feb 21 2019



Sam Lightstone, IBM | Machine Learning Everywhere 2018

>> Narrator: Live from New York, it's the Cube. Covering Machine Learning Everywhere: Build Your Ladder to AI. Brought to you by IBM. >> And welcome back here to New York City. We're at IBM's Machine Learning Everywhere: Build Your Ladder to AI, along with Dave Vellante, John Walls, and we're now joined by Sam Lightstone, who is an IBM fellow in analytics. And Sam, good morning. Thanks for joining us here once again on the Cube. >> Yeah, thanks a lot. Great to be back. >> Yeah, great. Yeah, good to have you here on kind of a moldy New York day here in late February. So we're talking, obviously data is the new norm, is what certainly, have heard a lot about here today and of late here from IBM. Talk to me about, in your terms, of just when you look at data and evolution and to where it's now become so central to what every enterprise is doing and must do. I mean, how do you do it? Give me a 30,000-foot level right now from your prism. >> Sure, I mean, from a super, if you just stand back, like way far back, and look at what data means to us today, it's really the thing that is separating companies one from the other. How much data do they have and can they make excellent use of it to achieve competitive advantage? And so many companies today are about data and only data. I mean, I'll give you some like really striking, disruptive examples of companies that are tremendously successful household names and it's all about the data. So the world's largest transportation company, or personal taxi, can't call it taxi, but (laughs) but, you know, Uber-- >> Yeah, right. >> Owns no cars, right? The world's largest accommodation company, Airbnb, owns no hotels, right? The world's largest distributor of motion pictures owns no movie theaters. So these companies are disrupting because they're focused on data, not on the material stuff. Material stuff is important, obviously. Somebody needs to own a car, somebody needs to own a way to view a motion picture, and so on. 
But data is what differentiates companies more than anything else today. And can they tap into the data, can they make sense of it for competitive advantage? And that's not only true for companies that are, you know, cloud companies. That's true for every company, whether you're a bricks-and-mortar organization or not. Now, one level of that data is to simply look at the data and ask questions of the data, the kinds of data that you already have in your mind. Generating reports, understanding who your customers are, and so on. That's sort of a fundamental level. But the deeper level, the exciting transformation that's going on right now, is the transformation from reporting and what we'll call business intelligence, the ability to take those reports and that insight on data and to visualize it in the way that human beings can understand it, and go much deeper into machine learning and AI, cognitive computing where we can start to learn from this data and learn at the pace of machines, and to drill into the data in a way that a human being cannot because we can't look at bajillions of bytes of data on our own, but machines can do that and they're very good at doing that. So it is a huge, that's one level. The other level is, there's so much more data now than there ever was because there's so many more devices that are now collecting data. And all of us, you know, every one of our phones is collecting data right now. Your cars are collecting data. I think there's something like 60 sensors on every car that rolls off the manufacturing line today. 60. So it's just a wild time and a very exciting time because there's so much untapped potential. And that's what we're here about today, you know. Machine learning, tapping into that unbelievable potential that's there in that data. >> So you're absolutely right on. I mean the data is foundational, or must be foundational in order to succeed in this sort of data-driven world.
But it's not necessarily the center of the universe for a lot of companies. I mean, it is for the big data, you know, guys that we all know. You know, the top market cap companies. But so many organizations, they're sort of, human expertise is at the center of their universe, and data is sort of, oh yeah, bolt on, and like you say, reporting. >> Right. >> So how do they deal with that? Do they get one big giant DB2 instance and stuff all the data in there, and infuse it with ML? Is that even practical? How do they solve this problem? >> Yeah, that's a great question. And there's, again, there's a multi-layered answer to that. But let me start with the most, you know, one of the big changes, one of the massive shifts that's been going on over the last decade is the shift to cloud. And people think of the shift to cloud as, well, I don't have to own the server. Someone else will own the server. That's actually not the right way to look at it. I mean, that is one element of cloud computing, but it's not, for me, the most transformative. The big thing about the cloud is the introduction of fully-managed services. It's not just you don't own the server. You don't have to install, configure, or tune anything. Now that's directly related to the topic that you just raised, because people have expertise, domains of expertise in their business. Maybe you're a manufacturer and you have expertise in manufacturing. If you're a bank, you have expertise in banking. You may not be a high-tech expert. You may not have deep skills in tech. So one of the great elements of the cloud is that now you can use these fully managed services and you don't have to be a database expert anymore. You don't have to be an expert in tuning SQL or JSON, or yadda yadda. Someone else takes care of that for you, and that's the elegance of a fully managed service, not just that someone else has got the hardware, but they're taking care of all the complexity. And that's huge.
The other thing that I would say is, you know, the companies that are really like the big data houses, they got lots of data, they've spent the last 20 years working so hard to converge their data into larger and larger data lakes. And some have been more successful than others. But everybody has found that that's quite hard to do. Data is coming in from many places, in many different repositories, and trying to consolidate, you know, rip the data out, constantly ripping it out and replicating into some data lake where you, or data warehouse where you can do your analytics, is complicated. And it means in some ways you're multiplying your costs because you have the data in its original location and now you're copying it into yet another location. You've got to pay for that, too. So you're multiplying costs. So one of the things I'm very excited about at IBM is we've been working on this new technology that we've now branded as IBM Queryplex. And that gives us the ability to query data across all of these myriad sources as if they are in one place. As if they are a single consolidated data lake, and make it all look like (snaps) one repository. And not only appear to the application as one repository, but actually tap into the processing power of every one of those data sources. So if you have 1,000 of them, we'll bring to bear the power of 1,000 data sources and all that computing and all that memory on these analytics problems. >> Well, give me an example why that matters, of what would be a real-world application of that. >> Oh, sure, so there, you know, there's a couple of examples. I'll give you two extremes, two different extremes. One extreme would be what I'll call enterprise, enterprise data consolidation or virtualization, where you're a large institution and you have several of these repositories. Maybe you got some IBM repositories like DB2. Maybe you've got a little bit of Oracle and a little bit of SQL Server.
Maybe you've got some open source stuff like Postgres or MySQL. You got a bunch of these and different departments use different things, and it develops over decades and to some extent you can't even control it, (laughs) right? And now you just want to get analytics on that. You just, what's this data telling me? And as long as all that data is sitting in these, you know, dozens or hundreds of different repositories, you can't tell, unless you copy it all out into a big data lake, which is expensive and complicated. So Queryplex will solve that problem. >> So it's sort of a virtual data store. >> Yeah, and one of the terms, many different terms that are used, but one of the terms that's used in the industry is data virtualization. So that would be a suitable terminology here as well. To make all that data in hundreds, thousands, even millions of possible data sources, appear as one thing, it has to tap into the processing power of all of them at once. Now, that's one extreme. Let's take another extreme, which is even more extreme, which is the IoT scenario, Internet of Things, right? Internet of Things. Imagine you've, have devices, you know, shipping containers and smart meters on buildings. You could literally have 100,000 of these or a million of these things. They're usually small; they don't usually have a lot of data on them. But they can store, usually, couple of months of data. And what's fascinating about that is that most analytics today are really on the most recent you know, 48 hours or four weeks, maybe. And that time is getting shorter and shorter, because people are doing analytics more regularly and they're interested in, just tell me what's going on recently. >> I got to geek out here, for a second. >> Please, well thanks for the warning. (laughs) >> And I know you know things, but I'm not a, I'm not a technical person, but I've been a molt. I've been around a long time. A lot of questions on data virtualization, but let me start with Queryplex. 
The name is really interesting to me. When I, and you're a database expert, so I'm going to tap your expertise. When I read the Google Spanner paper, I called up my colleague David Floyer, who's an ex-IBM, I said, "This is like global Sysplex. It's a global distributed thing." And he goes, "Yeah, kind of." And I got very excited. And then my eyes started bleeding when I read the paper, but the name, Queryplex, is it a play on Sysplex? Is there-- >> It's actually, there's a long story. I don't think I can say the story on-air, but we, suffice it to say we wanted to get a name that was legally usable and also descriptive. >> Dave: Okay. >> And we went through literally hundreds and hundreds of permutations of words and we finally landed on Queryplex. But, you know, you mentioned Google Spanner. I probably should spend a moment to differentiate how what we're doing is-- >> Great, if you would. >> A different kind of thing. You know, on Google Spanner, you put data into Google Spanner. With Queryplex, you don't put data into it. >> Dave: Don't have to move it. >> You don't have to move it. You leave it where it is. You can have your data in DB2, you can have it in Oracle, you can have it in a flat file, you can have an Excel spreadsheet, and you know, think about that. An Excel spreadsheet, a collection of text files, comma delimited text files, SQL Server, Oracle, DB2, Netezza, all these things suddenly appear as one database. So that's the transformation. It's not about we'll take your data and copy it into our system, this is about leave your data where it is, and we're going to tap into your (snaps) existing systems for you and help you see them in a unified way. So it's a very different paradigm than what others have done. Part of the reason why we're so excited about it is we're, as far as we know, nobody else is really doing anything quite like this.
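The "leave your data where it is" idea can be illustrated with a toy example. This is a hedged sketch only and makes no claim about how Queryplex itself is built or what its API looks like: the two "sources" here are an in-memory SQLite table and a comma-delimited text string, and the helper names (`rows_from_sqlite`, `rows_from_csv`, `query_all`) and the sample data are hypothetical. What it shows is a read-only query answered by each source in place, with the filter applied at the source so that only matching rows travel.

```python
import csv
import io
import sqlite3


def rows_from_sqlite(conn, min_amount):
    # The filter runs inside the database, so only matching rows leave it.
    cur = conn.execute(
        "SELECT customer, amount FROM orders WHERE amount >= ?", (min_amount,))
    return [(customer, float(amount)) for customer, amount in cur]


def rows_from_csv(text, min_amount):
    # The comma-delimited "source" is read where it is and filtered during the scan.
    reader = csv.DictReader(io.StringIO(text))
    return [(row["customer"], float(row["amount"]))
            for row in reader if float(row["amount"]) >= min_amount]


def query_all(sources, min_amount):
    """Read-only query over every source; nothing is copied into a central store."""
    out = []
    for fetch in sources:
        out.extend(fetch(min_amount))
    return sorted(out)


# Hypothetical sample data standing in for two separate repositories.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("acme", 120.0), ("bolt", 15.0)])
CSV_TEXT = "customer,amount\ncargo,300\ndelta,5\n"

sources = [
    lambda m: rows_from_sqlite(conn, m),
    lambda m: rows_from_csv(CSV_TEXT, m),
]
result = query_all(sources, 100)
```

Note that the single loop in `query_all` plays the role of a central coordinator. Later in the conversation that pattern is named as one of the strategies that becomes a bottleneck once the number of sources grows, so this sketch shows the user-facing idea rather than a scalable design.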
>> And is that what gets people to the 21st century, basically, is that they have all these legacy systems and yet the conversion is much simpler, much more economical for them? >> Yeah, exactly. It's economical, it's fast. (snaps) You can deploy this in, you know, a very small amount of time. And we're here today talking about machine learning and it's a very good segue to point out in order to get to high-quality AI, you need to have a really strong foundation of an information architecture. And for the industry to show up, as some have done over the past decade, and keep telling people to re-architect their data infrastructure, keep modifying their databases and creating new databases and data lakes and warehouses, you know, it's just not realistic. And so we want to provide a different path. A path that says we're going to make it possible for you to have superb machine learning, cognitive computing, artificial intelligence, and you don't have to rebuild your information architecture. We're going to make it possible for you to leverage what you have and do something special. >> This is exciting. I wasn't aware of this capability. And we were talking earlier about the cloud and the managed service component of that as a major driver of lowering cost and complexity. There's another factor here, which is, we talked about moving data-- >> Right. >> And that's one of the most expensive components of any infrastructure. If I got to move data and the transmission costs and the latency, it's virtually impossible. Speed of light's still up. I know you guys are working on speed of light, but (Sam laughs) you'll eventually get there. >> Right. >> Maybe. But the other thing about cloud economics, and this relates to sort of Queryplex. There's this API economy. You've got virtually zero marginal costs. When you were talking, I was writing these down. You got global scale, it's never down, you've got this network effect working for you. 
Are you able to, are the standards there? Are you able to replicate those sort of cloud economics the APIs, the standards, that scale, even though you're not in control of this, there's not a single point of control? Can you explain sort of how that magic works? >> Yeah, well I think the API economy is for real and it's very important for us. And it's very important that, you know, we talk about API standards. There's a beautiful quote I once heard. The beautiful thing about standards is there's so many to choose from. (All laugh) And the reality is that, you know, you have standards that are official standards, and then you have the de facto standards because something just catches on and nobody blessed it. It just got popular. So that's a big part of what we're doing at IBM is being at the forefront of adopting the standards that matter. We made a big, a big investment in being Spark compatible, and, in fact, even with Queryplex. You can issue Spark SQL against Queryplex even though it's not a Spark engine, per se, but we make it look and feel like it can be Spark SQL. Another critical point here, when we talk about the API economy, and the speed of light, and movement to the cloud, and these topics you just raised, the friction of the Internet is an unbelievable friction. (John laughs) It's unbelievable. I mean, you know, when you go and watch a movie over the Internet, your home connection is just barely keeping up. I mean, you're pushing it, man. So a gigabyte, you know, a gigabyte an hour or something like that, right? Okay, and if you're a big company, maybe you have a fatter pipe. But not a lot fatter. I mean, not orders of, you're talking incredible friction. And what that means is that it is difficult for people, for companies, to en masse, move everything to the cloud. It's just not happening overnight. 
And, again, in the interest of doing the best possible service to our customers, that's why we've made it a fundamental element of our strategy in IBM to be a hybrid, what we call hybrid data management company, so that the APIs that we use on the cloud, they are compatible with the APIs that we use on premises. And whether that's software or private cloud. You've got software, you've got private cloud, you've got public cloud. And our APIs are going to be consistent across, and applications that you code for one will run on the other. And you can, that makes it a lot easier to migrate at your leisure when you're ready. >> Makes a lot of sense. That way you can bring cloud economics and the cloud operating model to your data, wherever the data exists. Listening to you speak, Sam, it reminds me, do you remember when Bob Metcalfe who I used to work with at IDG, predicted the collapse of the Internet? He predicted that year after year after year, in speech after speech, that it was so fragile, and you're bringing back that point of, guys, it's still, you know, a lot of friction. So that's very interesting, (laughs) as an architect. >> You think Bob's going to be happy that you brought up that he predicted the Internet was going to be its own demise? (Sam laughs) >> Well, he did it in-- >> I'm just saying. >> I'm staying out of it, man. >> He did it as a lightning rod. >> As a talking-- >> To get the industry to respond, and he had a big enough voice so he could do that. >> That it worked, right. But so I want to get back to Queryplex and the secret sauce. Somehow you're creating this data virtualization capability. What's the secret sauce behind it? >> Yeah, so I think, we're not the first to try, by the way. Actually this problem-- >> Hard problem. >> Of all these data sources all over the place, you try to make them look like one thing. People have been trying to figure out how to do that since like the '70s, okay, so, but-- >> Dave: Really hasn't worked. 
>> And it hasn't worked. And really, the reason why it hasn't worked is that there's been two fundamental strategies. One strategy is, you have a central coordinator that tries to speak to each of these data sources. So I've got, let's say, 10,000 data sources. I want to have one coordinator tap into each of them and have a dialogue. And what happens is that that coordinator, a server, an agent somewhere, becomes a network bottleneck. You were talking about the friction of the Internet. This is a great example of friction. One coordinator trying to speak to, you know, n collaborators becomes a point of friction. And it also becomes a point of friction not only in the Internet, but also in the computation, because he ends up doing too much of the work. There's too many things that cannot be done at the, at these edge repositories, aggregations, and joins, and so on. So all the aggregations and joins get done by this one sucker who can't keep up. >> Dave: The queue. >> Yeah, so there's a big queue, right. So that's one strategy that didn't work. The other strategy that people tried was sort of an n-squared topology where every data source tries to speak to every other data source. And that doesn't scale well either. So what we've done in Queryplex is something that we think is unique and much more organic where we try to organize the universe or constellation of these data sources so that every data source speaks to a small number of peers but not a large number of peers. And that way no single source is a bottleneck, either in network or in computation. That's one trick. And the second trick is we've designed algorithms that can truly be distributed. So you can do joins in a distributed manner. You can do aggregation in a distributed manner. These are things, you know, when I say aggregation, I'm talking about simple things like a sum or an average or a median. These are super popular in, in analytic queries.
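The two tricks just described, a small number of peers per source and aggregation that is done in pieces, can be sketched for the simplest case of a sum and an average. This is an illustrative sketch under stated assumptions, not Queryplex's actual algorithm: the function names and the `fan_in` parameter are invented here, and the "peers" are simulated as levels of a tree rather than as networked machines. Each source computes a small `(sum, count)` pair locally, and no combining step ever merges more than `fan_in` of those pairs.

```python
def local_partial(values):
    """Computed at each data source: a (sum, count) pair for its own rows."""
    return (sum(values), len(values))


def merge(a, b):
    """Combine two partial results without looking at any raw rows."""
    return (a[0] + b[0], a[1] + b[1])


def tree_aggregate(partials, fan_in=2):
    """Combine partials level by level; no step merges more than fan_in inputs."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    if not level:
        return (0, 0)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fan_in):
            acc = level[i]
            for partial in level[i + 1:i + fan_in]:
                acc = merge(acc, partial)
            next_level.append(acc)
        level = next_level
    return level[0]


# Five hypothetical sources, each holding a few numbers.
sources = [[1, 2, 3], [10], [4, 6], [20, 30], [5]]
total, count = tree_aggregate([local_partial(v) for v in sources], fan_in=2)
average = total / count
```

With a central coordinator, one node would receive all five partials (or, worse, all nine raw values). Here the work per step is bounded by `fan_in` regardless of how many sources there are. A median does not decompose into mergeable partials this simply, so it is not covered by this sketch.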
Everybody wants to do a sum or an average or a median, right? But in the past, those things were hard to do in a distributed manner, getting all the participants in this universe to do some small incremental piece of the computation. So it's really these two things. Number one, this organic, dynamically forming constellation of devices. Dynamically forming in a way that is latency aware. So if I'm a, if I represent a data source that's joining this universe or constellation, I'm going to try to find peers who I have a fast connection with. If all the universe of peers were out there, I'll try to find ones that are fast. And the second is having algorithms that we can all collaborate on. Those two things change the game. >> We're getting the two minute sign, and this is fascinating stuff. But so, how do you deal with the data consistency problem? You hear about eventual consistency and people using atomic clocks and-- >> Right, so Queryplex, you know, there's a reason we call it Queryplex not Dataplex. Queryplex is really a read-only operation. >> Dave: Oh, there you go. >> You've got all these-- >> Problem solved. (laughs) >> Problem solved. You've got all these data sources. They're already doing their, they already have data's coming in how it's coming in. >> Dave: Simple and brilliant. >> Right, and we're not changing any of that. All we're saying is, if you want to query them as one, you can query them as one. I should say a few words about the machine learning that we're doing here at the conference. We've talked about the importance of an information architecture and how that lays a foundation for machine learning. But one of the things that we're showing and demonstrating at the conference today, or at the showcase today, is how we're actually putting machine learning into the database. Create databases that learn and improve over time, learn from experience. 
In 1952, Arthur Samuel was a researcher at IBM who had one of the first and most fundamental breakthroughs in machine learning when he created a machine learning algorithm that would play checkers. And he programmed this checker playing game of his so it would learn over time. And then he had a great idea. He programmed it so it would play itself, thousands and thousands and thousands of times over, so it would actually learn from its own mistakes. And, you know, the evolution since then. Deep Blue playing chess and so on. The Watson Jeopardy game. We've seen tremendous potential in machine learning. We're putting it into the database so databases can be smarter, faster, more consistent, and really just out of the box (snaps) performing. >> I'm glad you brought that up. I was going to ask you, because the legend Steve Mills once said to me, I had asked him a question about in-memory databases. He said ever since databases have been around, in-memory databases have been around. But ML-infused databases are new. >> Sam: That's right, something totally new. >> Dave: Yeah, great. >> Well, you mentioned Deep Blue. Looking forward to having Garry Kasparov on a little bit later on here. And I know he's speaking as well. But fascinating stuff that you've covered here, Sam. We appreciate the time here. >> Thank you, thanks for having me. >> And wish you continued success, as well. >> Thank you very much. >> Sam Lightstone, IBM Fellow, joining us here live on the Cube. We're back with more here from New York City right after this. (electronic music)
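
Editor's note: the two ideas Sam describes in the conversation (each data source talking only to a few fast peers, and aggregations computed piece by piece across the participants) can be illustrated with a minimal sketch. This is not Queryplex's implementation, which the conversation does not detail; the names `Partial`, `pick_peers`, and `aggregate`, the tree-shaped peer layout, and the sample data are all assumptions made for illustration. It covers sum and average only; a median does not reduce to a small mergeable summary in this way, so it is left out.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Hypothetical mergeable summary: enough state to recover sum and average."""
    total: float = 0.0
    count: int = 0

    def merge(self, other: Partial) -> Partial:
        # Merging two partials never needs the underlying rows.
        return Partial(self.total + other.total, self.count + other.count)

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else float("nan")


def pick_peers(latency_ms: dict[str, float], k: int) -> list[str]:
    """Latency-aware peer choice (assumed policy): keep the k fastest peers."""
    return heapq.nsmallest(k, latency_ms, key=latency_ms.get)


def aggregate(node: str,
              children: dict[str, list[str]],
              rows: dict[str, list[float]]) -> Partial:
    """Each node summarizes its own rows, then merges its few peers' partials."""
    partial = Partial(sum(rows[node]), len(rows[node]))
    for child in children.get(node, []):
        partial = partial.merge(aggregate(child, children, rows))
    return partial


# Illustrative data: four sources, each linked to at most two peers.
rows = {"a": [1.0, 2.0], "b": [3.0], "c": [4.0, 5.0, 6.0], "d": [7.0]}
children = {"a": ["b", "c"], "c": ["d"]}

result = aggregate("a", children, rows)
print(result.total, result.average)  # 28.0 4.0
print(pick_peers({"x": 40.0, "y": 5.0, "z": 12.0}, k=2))  # ['y', 'z']
```

In this sketch no node ever sees more than its own rows plus one small summary per peer, which is the property that keeps any single source from becoming the bottleneck Sam describes for the central-coordinator design.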

Published Date : Feb 27 2018




Search Results for Sysplex: