Machine Learning Applied to Computationally Difficult Problems in Quantum Physics
>> My name is Franco Nori. It is a great pleasure to be here; I thank you for attending this meeting, and I will be talking about some of the work we are doing within the NTT-PHI group. I would like to thank the organizers for putting together this very interesting event. The topics studied by NTT-PHI are very exciting and I am glad to be part of this great team. Let me first start with a brief overview of just a few interactions between our team and other groups within NTT-PHI. After this brief overview of these interactions, I am going to start talking about machine learning and neural networks applied to computationally difficult problems in quantum physics. The first question I would like to raise is the following: is it possible to have decoherence-free interaction between qubits? The solution proposed by a postdoc, a visitor, and myself some years ago was to study decoherence-free interaction between giant atoms made of superconducting qubits in the context of waveguide quantum electrodynamics. The theoretical prediction was confirmed by a very nice experiment performed by Will Oliver's group at MIT, published a few months ago in Nature under the title "Waveguide quantum electrodynamics with superconducting artificial giant atoms." This is the first joint MIT-Michigan Nature paper during this NTT-PHI grant period, and we are very pleased with it. I look forward to additional collaborations like this one, also with other NTT-PHI groups. Another collaboration inside NTT-PHI concerns quantum Hall effects in rapidly rotating polariton condensates. This work is mainly driven by two people, Michael Fraser and Yoshihisa Yamamoto; they are the main driving forces of this project, and it has been great fun. We are also interacting inside the NTT-PHI environment with the groups of Marandi at Caltech, McMahon at Cornell, Oliver at MIT, and, as I mentioned before, Fraser and Yamamoto at NTT; others at NTT-PHI are also very welcome to interact with us. NTT-PHI is interested in various topics, including how to use neural networks to solve computationally difficult and important problems. Let us now look at one example of using neural networks to study computationally hard problems. Everything I will be talking about today is mostly work in progress, to be extended and improved in the future. The first example I would like to discuss is topological quantum phase transitions retrieved through manifold learning, which is a variant of machine learning. This work was done in collaboration with Che, Gneiting, and Liu, all members of the group; the preprint is available on the arXiv. Some groups are studying quantum-enhanced machine learning, where machine learning is supposed to run on actual quantum computers to exploit exponential speed-ups and quantum error correction. We are not working on that kind of thing; we are doing something different: we are studying how to apply machine learning to quantum problems. For example, how to identify quantum phases and phase transitions, which I will talk about right now; how to perform quantum state tomography in a more efficient manner, which is another work of ours that I will show later on; and how to assist experimental data analysis, a separate project that we recently published but which I will not discuss today. Experiments can produce massive amounts of data, and machine learning can help to understand this huge tsunami of data provided by these experiments.
Machine learning can be either supervised or unsupervised. Supervised learning requires human-labeled data: here the blue dots have one label and the red dots have a different label, and the question is whether new data belongs to the blue category or the red category. Many machine-learning discussions use the example of identifying cats and dogs; that is the typical example. However, there are also cases where no labels are provided. There you look at the cluster structure, and you need to define a metric, a distance between the different points, to be able to group them together into clusters. Manifold learning is ideally suited to problems that are nonlinear and unsupervised. Using principal component analysis along these green axes, the principal axes, you can identify a simple structure with a linear projection: when you project onto these axes, you get the red dots in one area and the blue dots down here. But in general you could get red, green, yellow, and blue dots arranged in a complicated manner, and the correlations are better seen when you do a nonlinear embedding. In unsupervised learning the colors represent similarities, not labels, because there are no prior labels here. We are interested in using machine learning to identify topological quantum phases. This requires looking at the actual phases and their boundaries, starting from a set of Hamiltonians or wave functions. Recall that this is difficult to do because there is no symmetry breaking and there are no local order parameters, and in complicated cases you cannot compute the topological properties analytically, while numerically it is very hard. Machine learning is therefore enriching the toolbox for studying topological quantum phase transitions. Before our work, quite a few groups were looking at supervised machine learning. The shortcoming is that you need prior knowledge of the system and the data must be labeled for each phase; this is needed in order to train the neural networks. More recently, in the past few years, there has been an increased push toward unsupervised learning and nonlinear embeddings. One of the shortcomings we have seen is that they all use the Euclidean distance, which is a natural way to construct the similarity matrix, but we have shown that it is suboptimal: it is not the optimal way to measure distance, and the Chebyshev distance provides better performance. The difficulty here is that detecting topological quantum phase transitions is a challenge because there are no local order parameters. A few years ago, three or so years ago, we thought that machine learning might provide effective methods for identifying topological features, and in the past two years several groups have indeed been moving in this direction. We have shown that one type of machine learning, called manifold learning, can successfully retrieve topological quantum phase transitions in momentum and real space. We have also shown that if you use the Chebyshev distance between data points, as opposed to the Euclidean distance, you sharpen the characteristic features of these topological quantum phases in momentum space; afterwards, a so-called diffusion map or isometric map (Isomap) can be applied to implement the dimensionality reduction and to learn about these phases and phase transitions in an unsupervised manner.
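To make that pipeline concrete, here is a minimal sketch of this kind of unsupervised embedding: a similarity matrix built from a chosen metric (Chebyshev or Euclidean) followed by a simple diffusion-map dimensionality reduction. The kernel width, feature vectors, and exact construction are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist

def diffusion_map(points, metric="chebyshev", eps=0.1, n_components=2):
    """Unsupervised nonlinear embedding of a set of feature vectors.

    points : (n_samples, n_features) array, e.g. wave-function features
             sampled across a range of Hamiltonian parameters.
    metric : "chebyshev" (max-norm) or "euclidean", for comparison.
    """
    d = cdist(points, points, metric=metric)          # pairwise distances
    k = np.exp(-d**2 / eps)                           # Gaussian similarity matrix
    p = k / k.sum(axis=1, keepdims=True)              # Markov transition matrix
    vals, vecs = np.linalg.eig(p)                     # spectral decomposition
    order = np.argsort(-vals.real)
    idx = order[1:n_components + 1]                   # skip the trivial eigenvalue 1
    return vecs[:, idx].real * vals[idx].real

# Toy usage: two clusters of feature vectors separate cleanly in the embedding.
rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 0.05, size=(50, 8))
cluster_b = rng.normal(1.0, 0.05, size=(50, 8))
embedding = diffusion_map(np.vstack([cluster_a, cluster_b]))
print(embedding.shape)  # (100, 2)
```

In a phase-transition study, the rows of `points` would be features computed from the Hamiltonian or wave function at many parameter values, and the low-dimensional embedding is then inspected for clusters corresponding to distinct phases.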
So this is a summary of this work on how to characterize and study topological phases. The examples we used are canonical, famous models such as the SSH model, the QWZ model, and the quenched SSH model. We looked at them in momentum space and in real space, and we found that the method works very well in all of these models. Moreover, it provides implications and demonstrations for learning also in real space, where the topological invariants may be unknown or hard to compute. So it provides insight in both momentum space and real space, and the capability of manifold learning in exploring topological quantum phase transitions is very good, especially when you have a suitable metric. This is one area we would like to keep working on: topological phases and how to detect them. Of course, there are other problems where neural networks can be useful for solving computationally hard and important problems in quantum physics. One of them is quantum state tomography, which is important for evaluating the quality of state-production experiments. The problem is that quantum state tomography scales really badly: it is already impractical around 20 qubits, and beyond that, forget it, it is not going to work. So we have a very important procedure, quantum state tomography, which cannot be carried out at scale because of a computationally hard bottleneck. Machine learning is designed to efficiently handle big data, so the question we asked a few years ago was: can machine learning help us overcome this bottleneck in quantum state tomography? This became a project called eigenstate extraction with neural-network tomography, with a student, Melkani, and a research scientist of the group, Clemens Gneiting; I will be brief in summarizing it now. The specific machine-learning paradigm is standard artificial neural networks, which in the past couple of years have been shown to be successful for tomography of pure states. Our approach is to carry this over to mixed states, and this is done by successively reconstructing the eigenstates of the mixed state. It is an iterative procedure in which you slowly converge to the desired target state. If you wish to see more details, this was recently published in Physical Review A and was selected as an Editors' Suggestion; the referees liked it. So tomography is very hard to do, but it is important, and machine learning can help us do it using neural networks, achieving mixed-state tomography through an iterative eigenstate reconstruction. Why is it so challenging? Because you are trying to reconstruct quantum states from measurements. For a single qubit you have a few Pauli matrices, so there are very few measurements to make; but when you have N qubits, N appears in the exponent, the number of measurements grows exponentially, and this exponential scaling makes the computation prohibitively expensive for large system sizes. This exponential dependence on the number of qubits is the bottleneck: by the time you get to 20 or 24 qubits it is impossible. It gets even worse: experimental data are noisy, and therefore you need to consider maximum-likelihood estimation in order to reconstruct the quantum state that best fits the measurements, and again this is expensive. There was a seminal work some time ago on ion traps where the post-processing for eight qubits took an entire week.
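As a quick illustration of that exponential scaling (a back-of-the-envelope count, not the measurement scheme of any particular experiment): the number of Pauli-string expectation values characterizing an N-qubit state grows as 4^N.

```python
from itertools import product

def pauli_strings(n_qubits):
    """All tensor products of {I, X, Y, Z} on n qubits: 4**n of them."""
    return ["".join(p) for p in product("IXYZ", repeat=n_qubits)]

print(pauli_strings(2)[:6])   # ['II', 'IX', 'IY', 'IZ', 'XI', 'XX']

for n in (1, 2, 8, 14, 20):
    # Only count them; listing 4**20 strings would already be infeasible.
    print(f"{n:2d} qubits -> {4**n:,} Pauli expectation values")
```

The jump from a handful of measurement settings at one qubit to hundreds of millions at 14 qubits, and roughly a trillion at 20, is the bottleneck described above, before noise and maximum-likelihood post-processing even enter the picture.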
There were different ideas proposed, for example compressed sensing to reduce the number of measurements, linear regression, et cetera, but they all have problems and you quickly hit a wall; there is no way to avoid it. Indeed, an initial estimate was that tomography of a 14-qubit state would take centuries, and you cannot support a graduate student for a century, because you would need to pay their retirement benefits; it is simply complicated. So a team here, some time ago, looked at the question of how to do a full reconstruction of 14-qubit states within four hours; actually it was 3.3 hours. Many experimental groups told us it was a very popular paper to read and study, because they wanted to do fast quantum state tomography: they could not support a student for one or two centuries, they wanted the results quickly. You need to obtain these density matrices, and you need to do these measurements: with N qubits the number of expectation values grows like four to the N, because the number of Pauli strings becomes much bigger, and maximum likelihood makes it even more time consuming. This is the paper by the group in Innsbruck with the one-week post-processing for eight qubits, and there were speed-ups by different groups down to hours, including how to do 14-qubit tomography in four hours using linear regression. But the next question is: can machine learning help with quantum state tomography? Can it give us the tools to take the next step and improve it even further? The standard picture is this one here: in a neural network there are some inputs, x1, x2, x3, there are some weighting factors, and you get an output through a function phi, the so-called nonlinear activation function, which could be a Heaviside step, a sigmoid, a piecewise-linear, a logistic, or a hyperbolic function. This creates a decision boundary in input space where you get, say, the red dots on the left and the blue dots on the right, some separation between them. You could have two layers, three layers, or any number of layers, either shallow or deep, and this allows you to approximate essentially any continuous function. You train on data via some cost-function minimization. There are different varieties of neural nets; we looked at a so-called restricted Boltzmann machine. Restricted means that the spins of the input layer are not talking to each other, and the spins of the output layer are not talking to each other. We got reasonably good results with an input layer, an output layer, no hidden layer, and the probability of finding a spin configuration given by the Boltzmann factor. We then try to leverage pure-state tomography for mixed-state tomography by an iterative process: the mixed states are in the blue area and the pure states lie on the boundary here; the initial state is here, and with the iterative process you get closer and closer to the actual mixed state, and eventually, once you get here, you do the final jump inside. You look at the dominant eigenstate, which is the closest pure state, perform some measurements, and run an iterative algorithm that makes you approach the desired state; after that you can compare the results with data.
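The skeleton below is only meant to show the structure of that iterative peel-off idea, under simplifying assumptions: the pure-state reconstruction step is replaced here by an exact dominant-eigenvector computation, whereas the actual method works from measured Pauli expectation values with a neural network and never has direct access to the density matrix.

```python
import numpy as np

def iterative_eigenstate_reconstruction(rho, rank, reconstruct_pure_state):
    """Illustrative skeleton of mixed-state tomography by successively
    reconstructing the eigenstates of the mixed state.

    rho                    : density matrix, used here only to stand in for
                             the (unknown) state that generates the data.
    reconstruct_pure_state : callable standing in for the neural-network
                             pure-state tomography step.
    """
    residual = rho.copy()
    states, weights = [], []
    for _ in range(rank):
        psi = reconstruct_pure_state(residual)           # closest pure state
        w = np.real(psi.conj() @ residual @ psi)         # its weight
        states.append(psi)
        weights.append(w)
        residual = residual - w * np.outer(psi, psi.conj())  # peel it off
    weights = np.array(weights) / np.sum(weights)
    return states, weights

def dominant_eigenvector(mat):
    """Stand-in for the neural-network step: exact dominant eigenvector."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs[:, -1]

# Toy example: a rank-2 mixture of two random 3-qubit pure states.
rng = np.random.default_rng(1)
a = rng.normal(size=8) + 1j * rng.normal(size=8)
b = rng.normal(size=8) + 1j * rng.normal(size=8)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
rho = 0.7 * np.outer(a, a.conj()) + 0.3 * np.outer(b, b.conj())
states, weights = iterative_eigenstate_reconstruction(rho, 2, dominant_eigenvector)
print(weights)
```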
We got some data for four to eight trapped-ion qubits, where approximate W states were produced, and the dominant eigenstate is reliably reconstructed for N equal to four, five, six, seven, and eight. For the eigenvalues we are still working, because we are getting some results which are not as accurate as we would like, so that part is still work in progress, but for the states it is working really well. The cost scaling is beneficial, going like NR as opposed to N squared, and the most relevant information on the quality of the state production is retrieved directly. This works for flexible rank. So it is possible to extract the eigenstates with neural-network tomography; it is cost-effective and scalable, delivers the most relevant information about state generation, and is an interesting and viable use case for machine learning in quantum physics. More recently we have also been working on quantum state tomography using conditional generative adversarial networks; this work involves a graduate student and two former postdocs. CGAN refers to these conditional generative adversarial networks. In this framework you have two neural networks which are essentially dueling, competing with each other: one of them is called the generator and the other is called the discriminator, and they learn multi-modal models from the data. We improved on this by adding a custom neural-network layer that enables the conversion of the outputs of any standard neural network into a physical density matrix (a small sketch of such a layer follows at the end of this talk). So, to reconstruct the density matrix, the generator and the discriminator networks must train against each other on data using standard gradient-based methods. We demonstrate that our quantum state tomography with these adversarial networks can reconstruct optical quantum states with very high fidelity, orders of magnitude faster and from less data than standard maximum-likelihood methods, so we are excited about this. We also show that this tomography with adversarial networks can reconstruct a quantum state in a single evaluation of the generator network if it has been pre-trained on similar quantum states, so it requires some additional training. All of this is still work in progress, with some preliminary results written up, but we are continuing. I would like to thank all of you for attending this talk, and thanks again for the invitation.
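As referenced above, here is an illustrative sketch of a layer that converts an unconstrained network output into a physical density matrix, using a standard Cholesky-style construction rho = T T† / tr(T T†). This is a NumPy-only assumption-based illustration; the layer in the published work is part of a TensorFlow model and may be parameterized differently.

```python
import numpy as np

def density_matrix_layer(raw_output, dim):
    """Map an unconstrained real vector (e.g. a generator-network output) to a
    valid density matrix: Hermitian, positive semidefinite, unit trace."""
    n_diag = dim
    n_off = dim * (dim - 1) // 2
    assert raw_output.size == n_diag + 2 * n_off
    diag = np.exp(raw_output[:n_diag])                  # positive diagonal of T
    re = raw_output[n_diag:n_diag + n_off]              # real parts, lower triangle
    im = raw_output[n_diag + n_off:]                    # imaginary parts

    t = np.diag(diag).astype(complex)
    rows, cols = np.tril_indices(dim, k=-1)
    t[rows, cols] = re + 1j * im

    rho = t @ t.conj().T                                # Hermitian, PSD
    return rho / np.trace(rho).real                     # unit trace

# Usage: any 16 real numbers define a physical 2-qubit (4x4) density matrix.
rng = np.random.default_rng(0)
rho = density_matrix_layer(rng.normal(size=16), dim=4)
print(np.trace(rho).real, np.all(np.linalg.eigvalsh(rho) >= -1e-12))
```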
Neuromorphic in Silico Simulator For the Coherent Ising Machine
>> Hi everyone. I am a fellow at the University of Tokyo. Before I start, I would like to thank Yoshi and all the staff of NTT for the invitation and the organization of this online meeting, and I would also like to say that it has been very exciting to see the growth of this new PHI lab. I am happy to share with you today some of the recent work that has been done either by me or by colleagues in the group. The title of my talk is "A neuromorphic in silico simulator for the coherent Ising machine," and here is the outline. I would like to make the case that simulating the CIM in digital electronics can be useful for better understanding or improving its operating principles by introducing some ideas from neural networks; this is what I will discuss in the first part. Then I will show some proof-of-concept results for the gain in performance that can be obtained using this simulation, in the second part, and projections of the performance that could be achieved using a very large-scale simulator, in the third part, and finally I will talk about future plans. So first, let me start by comparing recently proposed Ising machines using this table, which is adapted from a recent Nature Electronics paper. This comparison shows that there is always a trade-off between energy efficiency, speed, and scalability that depends on the physical implementation. In red here are the limitations of each type of hardware. Interestingly, the FPGA-based systems, such as the digital annealers, the Toshiba bifurcation machine, or a recently proposed restricted Boltzmann machine on FPGA by a group in Berkeley, offer a good compromise between speed and scalability. This is why, despite the unique advantages that some of the other hardware platforms have, such as coherent superposition in flux qubits or the energy efficiency of memristors, FPGAs are still an attractive platform for building large Ising machines in the near future. The reason for the good performance of FPGAs is not so much that they operate at high frequency, nor that they are particularly energy efficient, but rather that the physical wiring of their elements can be reconfigured in a way that limits the von Neumann bottleneck, large fan-in and fan-out, and the long propagation of information within the system. In this respect, FPGAs are interesting from the perspective of the physics of complex systems. To put the performance of these various hardware platforms in perspective, we can look at the computation performed by the brain: the brain computes using billions of neurons with only about 20 watts of power, and operates, in a sense, very slowly. These impressive characteristics motivate us to investigate what kind of neuro-inspired principles could be useful for designing better Ising machines. The idea of this research project, and of the future collaboration, is to temporarily alleviate the limitations that are intrinsic to the realization of an optical coherent Ising machine, shown in the top panel here, by designing a large-scale simulator in silicon, shown at the bottom, that can be used to suggest better organization principles for the CIM. In this talk I will discuss three neuro-inspired principles: the asymmetry of connections, which induces chaotic neural dynamics, the local micro-structure of connectivity, and its hierarchical organization.
Neural networks are not composed of repetitions of always the same type of neuron in the same environment; there is a local structure that is repeated, and here is a schematic of the micro-column in the cortex. And lastly, the hierarchical organization of connectivity: connectivity is organized in a tree structure in the brain, and here you see a representation of the hierarchical organization of the monkey cerebral cortex. So how can these principles be used to improve the performance of Ising machines and their in silico simulation? First, about the two principles of asymmetry and micro-structure. We know the classical approximation of the coherent Ising machine, which is analogous to rate-based neural networks. In the case of the CIM, the classical approximation can be obtained using, for example, the truncated Wigner approximation, so that the time evolution of both systems can be described by the following ordinary differential equations, in which, in the case of the CIM, x_i represents the in-phase component of one DOPO, the function f represents the nonlinear optical part, the degenerate optical parametric amplification, and the sum over J_ij x_j represents the coupling, which is implemented, in the case of the measurement-feedback CIM, using homodyne detection and an FPGA, followed by injection of the coupling term. In both cases, CIM and neural networks, these dynamics can be written as gradient descent of a potential function V, written here, and this potential function includes the Ising Hamiltonian. This is why it is natural to use this type of dynamics to solve the Ising problem, in which the omega_ij are the Ising couplings and H is the Ising Hamiltonian extended to the analog variables x_i. Note that this potential function can only be defined if the omega_ij are symmetric. The well-known problem of this approach is that the potential function V we obtain is very non-convex at low temperature, and one strategy is to gradually deform this landscape using an annealing process; but there is, unfortunately, no theorem that guarantees convergence to the global minimum of the Ising Hamiltonian using this approach. This is why we propose to introduce a micro-structure into the system, where one analog spin, or one DOPO, is replaced by a pair consisting of one analog spin and one error-correction variable. The addition of this local structure introduces asymmetry into the system, which in turn induces chaotic dynamics: a chaotic search, rather than an annealing process, for the ground state of the Ising Hamiltonian. Within this micro-structure, the role of the error variable is to control the amplitude of the analog spins, to force the amplitude of the x_i to become equal to a certain target amplitude a. This is done by modulating the strength of the Ising coupling: the error variable e_i multiplies the Ising coupling term here in the dynamics of the DOPO. The whole dynamics is described by these coupled equations, and because the e_i do not necessarily take the same value for different i, this introduces asymmetry into the system, which in turn creates chaotic dynamics, which I am showing here for solving an SK problem of a certain size, in which the x_i are shown here, the e_i here, and the value of the Ising energy is shown in the bottom plots.
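As a rough sketch of the kind of coupled dynamics just described, here is a minimal Euler integration of an analog-spin variable x_i paired with an error-correction variable e_i. The precise equations, parameters, and amplitude-modulation schedule used in the talk's paper may differ; this only shows the structure.

```python
import numpy as np

def ising_energy(spins, j):
    """Ising energy  E = -1/2 * sum_ij J_ij s_i s_j  for spins s_i in {-1, +1}."""
    return -0.5 * spins @ j @ spins

def simulate_cim(j, t_steps=2000, dt=0.01, p=1.1, a=1.0, beta=0.2, eps=0.1, seed=0):
    """Classical CIM-like model with error-correction variables e_i that push
    each analog-spin amplitude toward the target a:

        dx_i/dt = (p - 1 - x_i^2) x_i + eps * e_i * sum_j J_ij x_j
        de_i/dt = -beta * (x_i^2 - a) * e_i
    """
    n = j.shape[0]
    rng = np.random.default_rng(seed)
    x = 0.01 * rng.normal(size=n)       # in-phase components of the DOPOs
    e = np.ones(n)                      # error-correction variables
    best_energy, best_spins = np.inf, None
    for _ in range(t_steps):
        coupling = j @ x
        x = x + dt * ((p - 1.0 - x**2) * x + eps * e * coupling)
        e = e + dt * (-beta * (x**2 - a) * e)
        spins = np.sign(x + 1e-12)      # project analog spins to +/-1
        energy = ising_energy(spins, j)
        if energy < best_energy:
            best_energy, best_spins = energy, spins
    return best_energy, best_spins

# Toy SK-type instance: fully connected, random +/-1 couplings.
rng = np.random.default_rng(1)
n = 16
j = rng.choice([-1.0, 1.0], size=(n, n))
j = np.triu(j, 1)
j = j + j.T
print(simulate_cim(j)[0])
```

Because the e_i evolve independently for each spin, the effective couplings seen by the x_i are no longer symmetric, which is what produces the chaotic search behavior described in the text.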
And you see this chaotic search that visits various local minima of the Ising Hamiltonian and eventually finds the ground state. It can be shown that this modulation of the target amplitude can be used to destabilize all the local minima of the Ising Hamiltonian, so that we do not get stuck in any of them. Moreover, the other types of attractors that can eventually appear, such as limit cycles or chaotic attractors, can also be destabilized using a modulation of the target amplitude. We have proposed in the past two different modulations of the target amplitude: the first one is a modulation that ensures that the entropy production rate of the system becomes positive, which forbids the creation of any nontrivial attractors; in this work, however, I will talk about another, heuristic modulation, which is given here, that works as well as the first one but is easier to implement on FPGA. These coupled equations, which represent the simulation of the coherent Ising machine with some error correction, can be implemented especially efficiently on an FPGA. Here I show the time that it takes to simulate the x_i term, the e_i term, the dot product, and the Ising energy for a system with 500 spins, analog spins equivalent to 500 DOPOs. On the FPGA, the nonlinear dynamics, which correspond to the degenerate optical parametric amplification, the OPA of the CIM, can be computed in only 13 clock cycles at 300 MHz, which corresponds to about 0.1 microseconds. This is to be compared to what can be achieved in the measurement-feedback CIM, in which, if we want 500 time-multiplexed DOPOs with a 1 GHz repetition rate through the optical nonlinearity, we would require 0.5 microseconds; so the simulation on the FPGA can be at least as fast as a measurement-feedback CIM with a 1 GHz repetition rate. The dot product that appears in these differential equations can then be computed in 43 clock cycles. So for problem sizes larger than 500 spins, the dot product clearly becomes the bottleneck, and this can be seen by looking at the scaling of the number of clock cycles it takes to compute either the nonlinear optical part or the dot product with respect to the problem size. If we had an infinite amount of resources on the FPGA to simulate the dynamics, the nonlinear optical part could be done in O(1), and the matrix-vector product could be done in O(log N), because computing the dot product involves summing all the terms of the products, which is done on the FPGA by an adder tree whose height scales logarithmically with the size of the system. But that is only if we had an infinite amount of resources on the FPGA; for larger problems of more than 100 spins, we usually need to decompose the matrix into smaller blocks, with a block size denoted u here. The scaling then becomes, for the nonlinear part, linear in N over u, and, for the dot product, of order (N/u) squared; typically, for a low-end FPGA, the block size u of this matrix is about 100. Clearly we want to make u as large as possible in order to maintain the logarithmic scaling of the number of clock cycles needed to compute the product, rather than the N-squared scaling that occurs when we decompose the matrix into smaller blocks. But the difficulty with having these larger blocks is that a very large adder tree introduces large fan-in and fan-out and long-distance data paths within the FPGA. So the solution for getting higher performance for a simulator of the coherent Ising machine is to get rid of this bottleneck in the dot product by increasing the size of this adder tree, and this can be done by organizing the circuit components within the FPGA hierarchically, as shown in the right panel here, in order to minimize the fan-in and fan-out of the system and to minimize the long-distance data paths in the FPGA. I am not going into the details of how this is implemented on the FPGA; this is just to give you an idea of why the hierarchical organization of the system becomes extremely important for getting good performance from a simulator of an Ising machine. Instead of getting into the details of the FPGA implementation, I would like to give a few benchmark results for this simulator, which was used as a proof of concept for this idea and which can be found in this arXiv paper. Here I show results for solving SK problems, fully connected spin-glass problems with randomly chosen plus-or-minus-one couplings, and we use as a metric the number of matrix-vector products, since that is the bottleneck of the computation, needed to reach the optimal solution of the SK problem with 99% success probability, plotted against the problem size. In red here is the proposed FPGA implementation, in blue is the number of matrix-vector products necessary for the CIM without error correction to solve these SK problems, and in green is noisy mean-field annealing, whose behavior is similar to the coherent Ising machine. You see that the number of matrix-vector products necessary to solve this problem scales with a better exponent than for these other approaches, which is an interesting feature of the system. Next we can look at the real time to solution for these SK instances. On this slide, the time to solution in seconds to find the ground state of SK instances with 99% success probability is shown for different state-of-the-art hardware. In red is the FPGA implementation proposed in this paper, and the other curves represent, for example, breakout local search in orange and simulated annealing in purple. You see that the scaling of the proposed simulator is rather good, and that for larger problem sizes we can be orders of magnitude faster than the other state-of-the-art approaches. Moreover, the relatively good scaling of the time to solution with respect to problem size indicates that the FPGA implementation would be faster than other recently proposed Ising machines, such as the Hopfield network implemented on memristors, which is very fast for small problem sizes, in blue here, but whose scaling is not good; and the same holds for the restricted Boltzmann machine implemented on an FPGA proposed recently by the group in Berkeley, which is again very fast for small problem sizes but whose scaling is worse than that of the proposed approach, so we can expect that for problem sizes larger than, say, 1000 spins, the proposed approach would be the faster one. Let me jump to this other slide and another confirmation that the scheme scales well: we can find maximum-cut values for the G-set benchmark that are better than the cut values previously found by any other algorithm, so they are the best known cut values to the best of our knowledge, as shown in the table of this paper. In particular, for instances 14 and 15 of the G-set we can find better cuts than previously known, and we can find these cut values 100 times faster than the state-of-the-art algorithm on CPU used for this. Note that getting these good results on the G-set does not require particularly hard tuning of the parameters: the tuning used here is very simple, it just depends on the degree of connectivity within each graph. These good results on the G-set indicate that the proposed approach would be good not only at solving SK problems and MAX-CUT problems, but also other types of graph Ising problems, such as MAX-CUT on graphs with community structure. Given that the performance of the design depends on the height of this adder tree, we can try to maximize the height of the adder tree on a large FPGA by carefully routing the circuit components within the FPGA, and we can draw some projections of the type of performance we could achieve in the near future based on the implementation we are currently working on. Here you see projections for the time to solution with 99% success probability for solving SK problems, with respect to the problem size, compared to different Ising machines, in particular the digital annealer, shown by the green line here. We show two different hypotheses for these projections: either the time to solution scales as an exponential of N, or it scales as an exponential of the square root of N. According to the data, it seems that the time to solution scales more like an exponential of the square root of N, and these projections show that we could probably solve SK problems of size 2000 spins, finding the true ground state with 99% success probability, in about 10 seconds, which is much faster than all the other proposed approaches. So, some future plans for this coherent Ising machine simulator. The first is that we would like to make the simulation closer to the real DOPO optical system, in particular, as a first step, to get closer to the measurement-feedback CIM. To do this, what can be simulated on the FPGA is a quantum Gaussian model that is described in this paper and was proposed by people in the NTT group. The idea of this model is that, instead of the very simple ODEs I have shown previously, it uses paired ODEs that take into account not only the mean of the in-phase component but also its variance, so that we can capture more of the quantum effects of the DOPO, such as squeezing. Then we plan to make the simulator open access for the members to run their instances on the system. There will be a first version in September that will be based on simple command-line access to the simulator and will include just the classical approximation of the system, with a noise term, binary weights, and a Zeeman term; then we will propose a second version that will extend the current Ising machine to a rack of eight FPGAs, in which we will add the more refined models, the truncated Wigner and the quantum Gaussian model I just talked about, and which will support real-valued weights for the Ising problems and support measurement feedback. We will announce when this becomes available; Farah is working hard to get the first version ready sometime in September. Thank you all, and we will be happy to answer any questions that you have.
Photonic Accelerators for Machine Intelligence
>> Hi, I am Dirk Englund, an associate professor of electrical engineering and computer science at MIT. It has been fantastic to be part of this team that Professor Yamamoto put together for the NTT PHI program, and it is a great pleasure to report to you our update from the first year. I will talk to you today about our recent work in photonic accelerators for machine intelligence. You can already get a flavor of the kind of work I will be presenting from the photonic integrated circuit that serves as a photonic matrix processor, which we are developing to try to break some of the bottlenecks that we encounter in machine-learning inference tasks, in particular tasks like vision, games, control, or language processing. This work is jointly led with Dr. Ryan Hamerly, a scientist at NTT Research, and he will have a poster that you should check out in this conference. I should also say that there are postdoc positions available; just take a look at the announcements of the QP lab at MIT. If you look under the hood of these machine-learning applications, you see that a common feature is that they use artificial neural networks, or ANNs, where you have an input layer of, let us say, N neurons, or values, that is connected to a first layer of, let us say, also N neurons; connecting the first to the second layer, if you represent it by a matrix, requires a matrix that has of order N squared free parameters. Now, in traditional machine-learning inference, you have to grab these N squared values from memory, and every time you do that it costs quite a lot of energy; maybe you can cache, but it is still quite costly in energy. Moreover, each of the input values has to be multiplied by that matrix, and if you multiply an N-by-1 vector by an N-by-N matrix, you have to do of order N squared multiplications. On a digital computer you therefore have to do of order N squared operations and memory accesses, which can be quite costly. The proposition is that on a photonic integrated circuit, perhaps we could do that matrix-vector multiplication directly on the PIC itself, by encoding optical fields and sending them through a programmed interferometer mesh, and the outputs would then be the product of the matrix multiplied by the input vector. That is actually the experiment we did, demonstrating that this is in principle possible, back in 2017, in a collaboration with Professor Marin Soljacic. If we look a little more closely at the device shown here, it consists of a silicon layer that is patterned into waveguides. We do this with a foundry; this was fabricated through a silicon-photonics foundry service, and many thanks to our collaborators who helped make that possible. This layer guides light, and pairs of these waveguides form these two-by-two transformations, Mach-Zehnder interferometers, as they are called: two input waveguides coming in and two output waveguides going out. By having two phase settings here, theta and phi, we can control any arbitrary SU(2) rotation. Now, if I want N modes coming in and N modes coming out, that can be represented by an SU(N) unitary transformation, and that is what this kind of chip allows you to do. That is the key ingredient that really launched this work in my group. I should at this point acknowledge the people who have made this possible, in particular Liane Bernstein and Alex Sludds, as well as Ryan Hamerly once more, and these other collaborators, including Professor Marin Soljacic, and of course the funding, in particular now the NTT Research funding. So why optics? Optics has failed many times before in building computers, so why is this different? I think the difference is that we are not trying to build an entirely new computer out of optics; we are selective in how we apply optics. We should use optics for what it is good at, and that is probably not so much nonlinearity, and not necessarily memory, but communication and fan-out are great in optics, and, as we just said, linear algebra you can do in optics fantastically well. So you should make use of these things and then combine them judiciously with electronic processing to see if you can get an advantage for the entire system out of it. Before I move on: based on the 2017 paper, two startups were created, Lightelligence and Lightmatter, and two students from my group, including Nick Harris, co-founded Lightmatter. After about two years they have been able to create their first device, the first large-scale photonic matrix processor. This device, called Mars, has 64 input modes, 64 output modes, and full programmability under the hood. Because they are integrating waveguides directly with CMOS electronics, they were able to deal with all the wiring complexity, all the feedback, and so forth, and this device is now able to process a 64-by-64 unitary matrix on the fly. The total power consumption is about three watts, and it has a latency, the time it takes for a matrix to be multiplied by a vector, of less than a nanosecond. Because this device works well over a pretty large bandwidth, 20 gigahertz, you could put in many channels that individually run at one gigahertz, so you can have tens of these SU(64) rotations happening simultaneously. If you do the sort of back-of-the-envelope physics, that gives you, per multiply-accumulate, just tens of femtojoules at the moment, which is very, very competitive. So you can see the potential breakthroughs that are enabled by photonics here. Actually, more recently, one thing that made this possible is very cool: these phase shifters have no hold power, whereas our phase shifters relied on thermal modulation. These use nanoscale mechanical modulators that have no hold power, so once you program a unitary you can just hold it there, with no energy consumption added over time. So photonics really is on the rise in computing. But, once again, you have to be careful in how you compare against electronics to find where the gain is to be had. What I have talked about so far is weight-stationary photonic processing. Electronics has that also, but it does not have the benefit of the coherence of optical fields transitioning through this matrix, nor the bandwidth. That is, I think, a really exciting direction, and these companies are off building these chips, and we will see in the next couple of months how well this works. A different direction is to have output-stationary matrix-vector multiplication, and for this I want to point to the paper we wrote with Ryan Hamerly and the other team members, which projects the activation functions together with the weight terms onto a detector array; through the interference of the activation function and the weight term, by homodyne detection, the multiplication is produced automatically: the interference term between two optical fields gives you the multiplication between them, and that is what this scheme makes use of. I want to talk a little more about that approach. We did a careful analysis in the PRX paper that was cited on the last page, and that analysis of the energy consumption shows that this device, in principle, can compute at an energy per multiply-accumulate that is below what you could theoretically do at room temperature using an irreversible computer, like the digital computers we use in everyday life. You can see that from this plot here: on the horizontal axis is the number of neurons per layer, and on the vertical axis is the energy per multiply-accumulate in joules. When we make use of the massive fan-out together with this photoelectric multiplication by coherent detection, we estimate that we are on this curve here. Since our energy consumption scales as N, whereas for a digital computer it scales as N squared, we gain more as you go to larger matrices. For the largest matrices, of scale 1,000 to 5,000, even with present-day technology we estimate that we would already reach a very low energy per multiply-accumulate. But if we imagine a photonic system that uses devices that have already been demonstrated individually, in research papers, but not yet packaged into a large system, we would be on this curve here, where you would very quickly dip underneath the Landauer limit, which corresponds to the thermodynamic limit for doing as many bit operations as you would have to do to compute the same depth of neural network as we do here. I should say that all of these numbers were computed for this simulated optical neural network having the equivalent error rate that a fully digital computer would have, so it is limited in its error by the model itself rather than by the imperfections of the devices; we benchmarked that on the MNIST dataset. That was a theoretical work that looked at the scaling limits and showed that there is great hope to gain tremendously in the energy per bit, but also in the overall latency and throughput. But you should not celebrate too early. You really have to do a careful system-level study comparing electronic approaches, which often have an analogous architecture, to the optical approaches. We did that as a first major step in this digital optical neural network study here, which was done together with an electronics designer who works on CMOS hardware specifically made for machine-learning acceleration, and Professor Joel Emer at MIT, who is also a fellow at NVIDIA. What we studied there in particular is: what if we replaced only the communication part with optics? We looked at getting the same equivalent error rates that you would have with an electronic computer, and that showed that this should give a benefit for large neural networks, because large neural networks require lots of communication that eventually does not fit on a single electronic chip anymore; at that point you have to go longer distances, and that is where the optical connections start to win out. For details, I would like to point to that system-level study. We are now applying more sophisticated studies like this, full system simulations, to our other optical neural networks to really see where the benefits are that we might have and where we can exploit them. Lastly, I want to ask: what if we had nonlinearities that were actually reversible, that were quantum coherent, in fact? We looked at that. Suppose you have the same architectural layout, but rather than having, say, saturable absorption, or photodetection and an electronic nonlinearity, which is what we have done so far, you have an all-optical nonlinearity based, for example, on a Kerr medium. Suppose we had a strong enough Kerr medium so that the output from one of these transformations can pass through it, acquire an intensity-dependent phase shift, and then pass into the next layer. What we did in this case is to say: suppose you have multiple layers of these interferometer meshes, just like the ones we had before, and you want to train this to do something. Suppose the training task is, for example, quantum optical state compression: you have a quantum optical state and you would like to see how much you can compress it while keeping the same quantum information in it. We trained it to discover an efficient algorithm for that. We also trained it for reinforcement learning, for black-box quantum simulation, and, what is perhaps particularly interesting, for one-way quantum repeaters. We said: if we have a communication network with these quantum optical neural networks stationed some distance apart, and you come in with an optically encoded pulse that encodes an optical qubit into many individual photons, how do I repair that multi-photon state so as to send the corrected optical state out the other side? This is a one-way error-correcting scheme. We did not know how to build it, but we posed it as a challenge to the neural network, and in simulation we trained the neural network how to apply the weights in the matrix transformations to perform that, answering an actual open challenge in the field of optical quantum networks. That gives us motivation to try to build these kinds of nonlinearities, and we have done a fair amount of work on this; you can see references five through seven. I have talked about these programmable photonics already; for the benchmark analysis and some of the other related work, please see Ryan's poster, where, as I mentioned, we have ongoing work in benchmarking optical computing as part of the NTT program with our collaborators. And I think the main thing I want to say here at the end is that the exciting thing, really, is that the physics tells us there are many orders of magnitude of efficiency gains to be had, if we can develop the technology to realize them. I was being conservative here with three orders of magnitude; this could be six orders of magnitude for the larger neural networks that we may have to use, and may want to use, in the future. So the physics tells us there is a tremendous gap between where we are and where we could be, and that, I think, makes this tremendously exciting and makes the NTT PHI projects very timely. With that, thank you for your attention, and I will be happy to talk about any of these topics.
ENTITIES
Entity | Category | Confidence |
---|---|---|
2017 | DATE | 0.99+ |
Joel | PERSON | 0.99+ |
Ryan | PERSON | 0.99+ |
Nick Harris | PERSON | 0.99+ |
Emily | PERSON | 0.99+ |
Maya | PERSON | 0.99+ |
Yamamoto | PERSON | 0.99+ |
two students | QUANTITY | 0.99+ |
NTT Research | ORGANIZATION | 0.99+ |
Hamad | PERSON | 0.99+ |
Alex | PERSON | 0.99+ |
first | QUANTITY | 0.99+ |
second layer | QUANTITY | 0.99+ |
20 gigahertz | QUANTITY | 0.99+ |
less than a nanosecond | QUANTITY | 0.99+ |
first layer | QUANTITY | 0.99+ |
64 | QUANTITY | 0.99+ |
first metrics | QUANTITY | 0.99+ |
Lee | PERSON | 0.99+ |
today | DATE | 0.99+ |
tens | QUANTITY | 0.98+ |
seven | QUANTITY | 0.98+ |
England | PERSON | 0.98+ |
six | QUANTITY | 0.98+ |
1,005,000 | QUANTITY | 0.98+ |
five | QUANTITY | 0.98+ |
two | QUANTITY | 0.98+ |
64 Promodes | QUANTITY | 0.98+ |
two transformations | QUANTITY | 0.98+ |
five projects | QUANTITY | 0.97+ |
each | QUANTITY | 0.97+ |
Leon Bernstein | PERSON | 0.97+ |
M I T. | ORGANIZATION | 0.96+ |
Garrity | PERSON | 0.96+ |
NTT | ORGANIZATION | 0.96+ |
PNG | ORGANIZATION | 0.96+ |
about two years | QUANTITY | 0.94+ |
one thing | QUANTITY | 0.94+ |
one way | QUANTITY | 0.94+ |
Thio | PERSON | 0.92+ |
Marine | PERSON | 0.92+ |
two optical fields | QUANTITY | 0.89+ |
first year | QUANTITY | 0.88+ |
64 rotations | QUANTITY | 0.87+ |
one vector | QUANTITY | 0.87+ |
three entity | QUANTITY | 0.85+ |
three orders | QUANTITY | 0.84+ |
one | QUANTITY | 0.84+ |
single | QUANTITY | 0.84+ |
Professor | PERSON | 0.83+ |
next couple of months | DATE | 0.81+ |
three | QUANTITY | 0.79+ |
tens of Tempted jewels | QUANTITY | 0.74+ |
M I t. | ORGANIZATION | 0.73+ |
Andi | PERSON | 0.71+ |
Dr | PERSON | 0.68+ |
first major step | QUANTITY | 0.67+ |
Seamus Electron ICS | ORGANIZATION | 0.63+ |
65 | QUANTITY | 0.62+ |
Who | PERSON | 0.59+ |
Q P | ORGANIZATION | 0.57+ |
P | TITLE | 0.48+ |
orders | QUANTITY | 0.47+ |
Mars | LOCATION | 0.43+ |
Maxine | PERSON | 0.41+ |
Tapping Vertica's Integration with TensorFlow for Advanced Machine Learning
>> Paige: Hello, everybody, and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled "Tapping Vertica's Integration with TensorFlow for Advanced Machine Learning." I'm Paige Roberts, Open Source Relations Manager at Vertica, and I'll be your host for this session. Joining me is Vertica Software Engineer, George Larionov. >> George: Hi. >> Paige: (chuckles) That's George. So, before we begin, I encourage you guys to submit questions or comments during the virtual session. You don't have to wait. Just type your question or comment in the question box below the slides and click submit. So, as soon as a question occurs to you, go ahead and type it in, and there will be a Q and A session at the end of the presentation. We'll answer as many questions as we're able to get to during that time. Any questions we don't get to, we'll do our best to answer offline. Now, alternatively, you can visit the Vertica Forum to post your questions there after the session. Our engineering team is planning to join the forums to keep the conversation going, so you can ask an engineer afterwards, just as if it were a regular conference in person. Also, a reminder: you can maximize your screen by clicking the double-arrow button in the lower right corner of the slides. And, before you ask, yes, this virtual session is being recorded, and it will be available to view by the end of this week. We'll send you a notification as soon as it's ready. Now, let's get started. Over to you, George. >> George: Thank you, Paige. So, I've been introduced. I'm a Software Engineer at Vertica, and today I'm going to be talking about a new feature, Vertica's integration with TensorFlow. So, first, I'm going to go over what TensorFlow is and what neural networks are. Then, I'm going to talk about why integrating with TensorFlow is a useful feature, and, finally, I am going to talk about the integration itself and give an example. So, as we get started here, what is TensorFlow? TensorFlow is an open-source machine learning library, developed by Google, and it's actually one of many such libraries. And the whole point of libraries like TensorFlow is to simplify the whole process of working with neural networks, such as creating, training, and using them, so that it's available to everyone, as opposed to just a small subset of researchers. So, neural networks are computing systems that allow us to solve various tasks. Traditionally, computing algorithms were designed completely from the ground up by engineers like me, and we had to manually sift through the data and decide which parts are important for the task and which are not. Neural networks aim to solve this problem, a little bit, by sifting through the data themselves, automatically, and finding traits and features which correlate to the right results. So, you can think of it as neural networks learning to solve a specific task by looking through the data, without having human beings sit and sift through the data themselves. So, there are a couple of necessary parts to getting a trained neural model, which is the final goal. By the way, a neural model is the same as a neural network. Those are synonymous. So, first, you need this light blue circle, an untrained neural model, which is pretty easy to get in TensorFlow, and, in addition to that, you need your training data. Now, this involves both training inputs and training labels, and I'll talk about exactly what those two things are on the next slide.
But, basically, you need to train your model with the training data, and, once it is trained, you can use your trained model to predict on just the purple circle, so new training inputs. And, it will predict the training labels for you. You don't have to label it anymore. So, a neural network can be thought of as... Training a neural network can be thought of as teaching a person how to do something. For example, if I want to learn to speak a new language, let's say French, I would probably hire some sort of tutor to help me with that task, and I would need a lot of practice constructing and saying sentences in French. And a lot of feedback from my tutor on whether my pronunciation or grammar, et cetera, is correct. And, so, that would take me some time, but, finally, hopefully, I would be able to learn the language and speak it without any sort of feedback, getting it right. So, in a very similar manner, a neural network needs to practice on, example, training data, first, and, along with that data, it needs labeled data. In this case, the labeled data is kind of analogous to the tutor. It is the correct answers, so that the network can learn what those look like. But, ultimately, the goal is to predict on unlabeled data which is analogous to me knowing how to speak French. So, I went over most of the bullets. A neural network needs a lot of practice. To do that, it needs a lot of good labeled data, and, finally, since a neural network needs to iterate over the training data many, many times, it needs a powerful machine which can do that in a reasonable amount of time. So, here's a quick checklist on what you need if you have a specific task that you want to solve with a neural network. So, the first thing you need is a powerful machine for training. We discussed why this is important. Then, you need TensorFlow installed on the machine, of course, and you need a dataset and labels for your dataset. Now, this dataset can be hundreds of examples, thousands, sometimes even millions. I won't go into that because the dataset size really depends on the task at hand, but if you have these four things, you can train a good neural network that will predict whatever result you want it to predict at the end. So, we've talked about neural networks and TensorFlow, but the question is if we already have a lot of built-in machine-learning algorithms in Vertica, then why do we need to use TensorFlow? And, to answer that question, let's look at this dataset. So, this is a pretty simple toy dataset with 20,000 points, but it shows, it simulates a more complex dataset with some sort of two different classes which are not related in a simple way. So, the existing machine-learning algorithms that Vertica already has, mostly fail on this pretty simple dataset. Linear models can't really draw a good line separating the two types of points. Naïve Bayes, also, performs pretty badly, and even the Random Forest algorithm, which is a pretty powerful algorithm, with 300 trees gets only 80% accuracy. However, a neural network with only two hidden layers gets 99% accuracy in about ten minutes of training. So, I hope that's a pretty compelling reason to use neural networks, at least sometimes. So, as an aside, there are plenty of tasks that do fit the existing machine-learning algorithms in Vertica. 
That's why they're there, and if one of your tasks that you want to solve fits one of the existing algorithms, well, then I would recommend using that algorithm, not TensorFlow, because, while neural networks have their place and are very powerful, it's often easier to use an existing algorithm, if possible. Okay, so, now that we've talked about why neural networks are needed, let's talk about integrating them with Vertica. So, neural networks are best trained using GPUs, which are Graphics Processing Units, and it's, basically, just a different processing unit than a CPU. GPUs are good for training neural networks because they excel at doing many, many simple operations at the same time, which is needed for a neural network to be able to iterate through the training data many times. However, Vertica runs on CPUs and cannot run on GPUs at all because that's not how it was designed. So, to train our neural networks, we have to go outside of Vertica, and exporting a small batch of training data is pretty simple. So, that's not really a problem, but, given this information, why do we even need Vertica? If we train outside, then why not do everything outside of Vertica? So, to answer that question, here is a slide that Philips was nice enough to let us use. This is an example of production system at Philips. So, it consists of two branches. On the left, we have a branch with historical device log data, and this can kind of be thought of as a bunch of training data. And, all that data goes through some data integration, data analysis. Basically, this is where you train your models, whether or not they are neural networks, but, for the purpose of this talk, this is where you would train your neural network. And, on the right, we have a branch which has live device log data coming in from various MRI machines, CAT scan machines, et cetera, and this is a ton of data. So, these machines are constantly running. They're constantly on, and there's a bunch of them. So, data just keeps streaming in, and, so, we don't want this data to have to take any unnecessary detours because that would greatly slow down the whole system. So, this data in the right branch goes through an already trained predictive model, which need to be pretty fast, and, finally, it allows Philips to do some maintenance on these machines before they actually break, which helps Philips, obviously, and definitely the medical industry as well. So, I hope this slide helped explain the complexity of a live production system and why it might not be reasonable to train your neural networks directly in the system with the live device log data. So, a quick summary on just the neural networks section. So, neural networks are powerful, but they need a lot of processing power to train which can't really be done well in a production pipeline. However, they are cheap and fast to predict with. Prediction with a neural network does not require GPU anymore. And, they can be very useful in production, so we do want them there. We just don't want to train them there. So, the question is, now, how do we get neural networks into production? So, we have, basically, two options. The first option is to take the data and export it to our machine with TensorFlow, our powerful GPU machine, or we can take our TensorFlow model and put it where the data is. In this case, let's say that that is Vertica. So, I'm going to go through some pros and cons of these two approaches. The first one is bringing the data to the analytics. 
The pros of this approach are that TensorFlow is already installed, running on this GPU machine, and we don't have to move the model at all. The cons, however, are that we have to transfer all the data to this machine and if that data is big, if it's, I don't know, gigabytes, terabytes, et cetera, then that becomes a huge bottleneck because you can only transfer in small quantities. Because GPU machines tend to not be that big. Furthermore, TensorFlow prediction doesn't actually need a GPU. So, you would end up paying for an expensive GPU for no reason. It's not parallelized because you just have one GPU machine. You can't put your production system on this GPU, as we discussed. And, so, you're left with good results, but not fast and not where you need them. So, now, let's look at the second option. So, the second option is bringing the analytics to the data. So, the pros of this approach are that we can integrate with our production system. It's low impact because prediction is not processor intensive. It's cheap, or, at least, it's pretty much as cheap as your system was before. It's parallelized because Vertica was always parallelized, which we'll talk about in the next slide. There's no extra data movement. You get the benefit from model management in Vertica, meaning, if you import multiple TensorFlow models, you can keep track of their various attributes, when they were imported, et cetera. And, the results are right where you need them, inside your production pipeline. So, two cons are that TensorFlow is limited to just prediction inside Vertica, and, if you want to retrain your model, you need to do that outside of Vertica and, then, reimport. So, just as a recap of parallelization. Everything in Vertica is parallelized and distributed, and TensorFlow is no exception. So, when you import your TensorFlow model to your Vertica cluster, it gets copied to all the nodes, automatically, and TensorFlow will run in fenced mode which means that it the TensorFlow process fails for whatever reason, even though it shouldn't, but if it does, Vertica itself will not crash, which is obviously important. And, finally, prediction happens on each node. There are multiple threads of TensorFlow processes running, processing different little bits of data, which is faster, much faster, than processing the data line by line because it happens all in a parallelized fashion. And, so, the result is fast prediction. So, here's an example which I hope is a little closer to what everyone is used to than the usual machine learning TensorFlow example. This is the Boston housing dataset, or, rather, a small subset of it. Now, on the left, we have the input data to go back to, I think, the first slide, and, on the right, is the training label. So, the input data consists of, each line is a plot of land in Boston, along with various attributes, such as the level of crime in that area, how much industry is in that area, whether it's on the Charles River, et cetera, and, on the right, we have as the labels the median house value in that plot of land. And, so, the goal is to put all this data into the neural network and, finally, get a model which can train... I don't know, which can predict on new incoming data and predict a good housing value for that data. Now, I'm going to go through, step by step, how to actually use TensorFlow models in Vertica. 
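Before those steps, here is a quick sketch of what the training-and-export side might look like for a Boston-housing-style regression model; the walkthrough then continues with the Vertica-specific steps. The column count matches the thirteen features mentioned above, but the random stand-in data, the layer sizes, and the Keras calls are illustrative assumptions, and producing the exact 'frozen graph' file the integration expects is a separate step described next and in the documentation.

```python
import numpy as np
import tensorflow as tf

# Stand-in for the Boston-housing-style table described above: 13 numeric
# features per plot of land, with the median house value as the label.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(506, 13)).astype("float32")
y_train = rng.normal(loc=22.0, scale=9.0, size=(506, 1)).astype("float32")

# A small regression network; the architecture is an illustrative assumption.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(13,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),  # predicted median house value
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)

# Persist the trained model. Converting this saved model into the
# 'frozen graph' file plus the JSON descriptor that the Vertica
# integration expects is a separate, documented step.
model.save("boston_housing_net")
```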
So, the first step I won't go into much detail on, because there are countless tutorials and resources online on how to use TensorFlow to train a neural network, so that's the first step. The second step is to save the model in TensorFlow's 'frozen graph' format. Again, this information is available online. The third step is to create a small, simple JSON file describing the inputs and outputs of the model, and what data type they are, et cetera. And this is needed for Vertica to be able to translate from TensorFlow land into Vertica SQL land, so that it can use a SQL table instead of the input set TensorFlow usually takes. So, once you have your model file and your JSON file, you want to put both of those files in a directory on a node, any node, in a Vertica cluster, and name that directory whatever you want your model to ultimately be called inside of Vertica. So, once you do that, you can go ahead and import that directory into Vertica. This import-models function already exists in Vertica; all we added was a new category to be able to import. So, what you need to do is specify the path to your neural network directory and specify that the category of the model is TensorFlow. Once you successfully import, in order to predict, you run this brand new predict-TensorFlow function. So, in this case, we're predicting on everything from the input table, which is what the star means. The model name is Boston housing net, which is the name of your directory, and then there's a little bit of boilerplate. The two names, ID and value, after the AS are just the names of the columns of your outputs, and, finally, the Boston housing data is whatever SQL table you want to predict on that fits the input type of your network. And this will output a bunch of predictions, in this case, values of houses that the network thinks are appropriate for all the input data. So, just a quick summary. We talked about what TensorFlow is and what neural networks are, and then we discussed that TensorFlow works best on GPUs because it needs very specific characteristics. That is, TensorFlow works best for training on GPUs, while Vertica is designed to use CPUs, and it's really good at storing and accessing a lot of data quickly, but it's not very well designed for having neural networks trained inside of it. Then we talked about how neural models are powerful and we want to use them in our production flow, and, since prediction is fast, we can go ahead and do that, but we just don't want to train them there. And, finally, I presented the Vertica TensorFlow integration, which allows importing a trained TensorFlow model into Vertica and predicting on all the data that is inside Vertica with a few simple lines of SQL. So, thank you for listening. I'm going to take some questions now.
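For reference, here is a hedged sketch of what the Vertica side of that workflow might look like when driven from Python with the vertica_python client. The connection details, directory path, and table name are placeholders, and the exact spelling of the import and predict parameters (for example num_passthru_cols and the OVER clause) is an assumption based on the description above, so confirm the precise syntax against the Vertica documentation for your version.

```python
import vertica_python

# Placeholder connection details -- substitute your own cluster's values.
conn_info = {"host": "vertica.example.com", "port": 5433, "user": "dbadmin",
             "password": "secret", "database": "analytics"}

# Directory containing the frozen-graph file plus its small JSON descriptor,
# named after the model -- here 'boston_housing_net'.
model_dir = "/home/dbadmin/tf_models/boston_housing_net"

import_sql = f"""
    SELECT IMPORT_MODELS('{model_dir}' USING PARAMETERS category='TENSORFLOW');
"""

# Predict over every row of the input table. The OVER clause and
# num_passthru_cols are the "boilerplate" mentioned in the talk; their exact
# form here is an assumption, not confirmed syntax.
predict_sql = """
    SELECT PREDICT_TENSORFLOW(*
               USING PARAMETERS model_name='boston_housing_net',
                                num_passthru_cols=1)
           OVER(PARTITION BEST)
           AS (id, value)
    FROM boston_housing_data;
"""

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    cur.execute(import_sql)
    cur.execute(predict_sql)
    for row in cur.fetchmany(5):
        print(row)  # (id, predicted median house value)
```

The key point, as described above, is that the import copies the model to every node, after which prediction runs in parallel, in fenced mode, wherever the data already lives.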
Wrap | Machine Learning Everywhere 2018
>> Narrator: Live from New York, it's theCUBE. Covering machine learning everywhere. Build your ladder to AI. Brought to you by IBM. >> Welcome back to IBM's Machine Learning Everywhere. Build your ladder to AI, along with Dave Vellante, John Walls here, wrapping up here in New York City. Just about done with the programming here in Midtown. Dave, let's just take a step back. We've heard a lot, seen a lot, talked to a lot of folks today. First off, tell me, AI. We've heard some optimistic outlooks, some, I wouldn't say pessimistic, but some folks saying, "Eh, hold off." Not as daunting as some might think. So just your take on the artificial intelligence conversation we've heard so far today. >> I think generally, John, that people don't realize what's coming. I think the industry, in general, our industry, technology industry, the consumers of technology, the businesses that are out there, they're steeped in the past, that's what they know. They know what they've done, they know the history and they're looking at that as past equals prologue. Everybody knows that's not the case, but I think it's hard for people to envision what's coming, and what the potential of AI is. Having said that, Jennifer Shin is a near-term pessimist on the potential for AI, and rightly so. There are a lot of implementation challenges. But as we said at the open, I'm very convinced that we are now entering a new era. The Hadoop big data industry is going to pale in comparison to what we're seeing. And we're already seeing very clear glimpses of it. The obvious things are Airbnb and Uber, and the disruptions that are going on with Netflix and over-the-top programming, and how Google has changed advertising, and how Amazon is changing and has changed retail. But what you can see, and again, the best examples are Apple getting into financial services, moving into healthcare, trying to solve that problem. Amazon buying a grocer. The rumor that I heard about Amazon potentially buying Nordstrom, which my wife said is a horrible idea. (John laughs) But think about the fact that they can do that is a function of, that they are a digital-first company. Are built around data, and they can take those data models and they can apply it to different places. Who would have thought, for example, that Alexa would be so successful? That Siri is not so great? >> Alexa's become our best friend. >> And it came out of the blue. And it seems like Google has a pretty competitive piece there, but I can almost guarantee that doing this with our thumbs is not the way in which we're going to communicate in the future. It's going to be some kind of natural language interface that's going to rely on artificial intelligence and machine learning and the like. And so, I think it's hard for people to envision what's coming, other than fast forward where machines take over the world and Stephen Hawking and Elon Musk say, "Hey, we should be concerned." Maybe they're right, not in the next 10 years. >> You mentioned Jennifer, we were talking about her and the influencer panel, and we've heard from others as well, it's a combination of human intelligence and artificial intelligence. That combination's more powerful than just artificial intelligence, and so, there is a human component to this. So, for those who might be on the edge of their seat a little bit, or looking at this from a slightly more concerning perspective, maybe not the case. Maybe not necessary, is what you're thinking. 
>> I guess at the end of the day, the question is, "Is the world going to be a better place with all this AI? "Are we going to be more prosperous, more productive, "healthier, safer on the roads?" I am an optimist, I come down on the side of yes. I would not want to go back to the days where I didn't have GPS. That's worth it to me. >> Can you imagine, right? If you did that now, you go back five years, just five years from where we are now, back to where we were. Waze was nowhere, right? >> All the downside of these things, I feel is offset by that. And I do think it's incumbent upon the industry to try to deal with the problem, especially with young people, the blue light problem. >> John: The addictive issue. >> That's right. But I feel like those downsides are manageable, and the upsides are of enough value that society is going to continue to move forward. And I do think that humans and machines are going to continue to coexist, at least in the near- to mid- reasonable long-term. But the question is, "What can machines "do that humans can't do?" And "What can humans do that machines can't do?" And the answer to that changes every year. It's like I said earlier, not too long ago, machines couldn't climb stairs. They can now, robots can climb stairs. Can they negotiate? Can they identify cats? Who would've imagined that all these cats on the Internet would've led to facial recognition technology. It's improving very, very rapidly. So, I guess my point is that that is changing very rapidly, and there's no question it's going to have an impact on society and an impact on jobs, and all those other negative things that people talk about. To me, the key is, how do we embrace that and turn it into an opportunity? And it's about education, it's about creativity, it's about having multi-talented disciplines that you can tap. So we talked about this earlier, not just being an expert in marketing, but being an expert in marketing with digital as an understanding in your toolbox. So it's that two-tool star that I think is going to emerge. And maybe it's more than two tools. So that's how I see it shaping up. And the last thing is disruption, we talked a lot about disruption. I don't think there's any industry that's safe. Colin was saying, "Well, certain industries "that are highly regulated-" In some respects, I can see those taking longer. But I see those as the most ripe for disruption. Financial services, healthcare. Can't we solve the HIPAA challenge? We can't get access to our own healthcare information. Well, things like artificial intelligence and blockchain, we were talking off-camera about blockchain, those things, I think, can help solve the challenge of, maybe I can carry around my health profile, my medical records. I don't have access to them, it's hard to get them. So can things like artificial intelligence improve our lives? I think there's no question about it. >> What about, on the other side of the coin, if you will, the misuse concerns? There are a lot of great applications. There are a lot of great services. As you pointed out, a lot of positive, a lot of upside here. But as opportunities become available and technology develops, that you run the risk of somebody crossing the line for nefarious means. And there's a lot more at stake now because there's a lot more of us out there, if you will. So, how do you balance that? >> There's no question that's going to happen. And it has to be managed. 
But even if you could stop it, I would say you shouldn't because the benefits are going to outweigh the risks. And again, the question we asked the panelists, "How far can we take machines? "How far can we go?" That's question number one, number two is, "How far should we go?" We're not even close to the "should we go" yet. We're still on the, "How far can we go?" Jennifer was pointing out, I can't get my password reset 'cause I got to call somebody. That problem will be solved. >> So, you're saying it's more of a practical consideration now than an ethical one, right now? >> Right now. Moreso, and there's certainly still ethical considerations, don't get me wrong, but I see light at the end of the privacy tunnel, I see artificial intelligence as, well, analytics is helping us solve credit card fraud and things of that nature. Autonomous vehicles are just fascinating, right? Both culturally, we talked about that, you know, we learned how to drive a stick shift. (both laugh) It's a funny story you told me. >> Not going to worry about that anymore, right? >> But it was an exciting time in our lives, so there's a cultural downside of that. I don't know what the highway death toll number is, but it's enormous. If cell phones caused that many deaths, we wouldn't be using them. So that's a problem that I think things like artificial intelligence and machine intelligence can solve. And then the other big thing that we talked about is, I see a huge gap between traditional companies and these born-in-the-cloud, born-data-oriented companies. We talked about the top five companies by market cap. Microsoft, Amazon, Facebook, Alphabet, which is Google, who am I missing? >> John: Apple. >> Apple, right. And those are pretty much very much data companies. Apple's got the data from the phones, Google, we know where they get their data, et cetera, et cetera. Traditional companies, however, their data resides in silos. Jennifer talked about this, Craig, as well as Colin. Data resides in silos, it's hard to get to. It's a very human-driven business and the data is bolted on. With the companies that we just talked about, it's a data-driven business, and the humans have expertise to exploit that data, which is very important. So there's a giant skills gap in existing companies. There's data silos. The other thing we touched on this is, where does innovation come from? Innovation drives value drives disruption. So the innovation comes from data. He or she who has the best data wins. It comes from artificial intelligence, and the ability to apply artificial intelligence and machine learning. And I think something that we take for granted a lot, but it's cloud economics. And it's more than just, and somebody, one of the folks mentioned this on the interview, it's more than just putting stuff in the cloud. It's certainly managed services, that's part of it. But it's also economies of scale. It's marginal economics that are essentially zero. It's speed, it's low latency. It's, and again, global scale. You combine those things, data, artificial intelligence, and cloud economics, that's where the innovation is going to come from. And if you think about what Uber's done, what Airbnb have done, where Waze came from, they were picking and choosing from the best digital services out there, and then developing their own software from this, what I say my colleague Dave Misheloff calls this matrix. And, just to repeat, that matrix is, the vertical matrix is industries. 
The horizontal part of the matrix is technology platforms: cloud, data, mobile, social, security, et cetera. They're building companies on top of that matrix. So, how you leverage the matrix is going to determine your future. Whether or not you get disrupted, whether you're the disruptor or the disruptee. It's not just about, we talked about this at the open, cloud, SaaS, mobile, social, big data. They're kind of yesterday's news. It's now artificial intelligence, machine intelligence, deep learning, machine learning, cognitive. We're still trying to figure out the parlance. You could feel the changes coming. I think this matrix idea is very powerful, and how that gets leveraged in organizations ultimately will determine the levels of disruption. But every single industry is at risk. Because every single industry is going digital, and digital allows you to traverse industries. We've said it many times today. Amazon went from bookseller to content producer to grocer- >> John: To grocer now, right? >> To maybe high-end retailer. Content company, Apple with Apple Pay and companies getting into healthcare, trying to solve healthcare problems. The future of warfare, you live in the Beltway. The future of warfare and cybersecurity are just coming together. One of the biggest issues I think we face as a country is we have fake news, we're seeing the weaponization of social media, as James Scott said on theCUBE. So, all these things are coming together that I think are going to make the last 10 years look tame.
Again, blockchain comes in and you say, "Why do I need a trusted third party called Uber? "Why can't I do this on the blockchain?" I predict you're going to see even those guys get disrupted. And I'll say something else, it's hard to imagine that a Google or a Facebook can be unseated. But I feel like we may be entering an era where this is their peak. Could be wrong, I'm an Apple customer. I don't know, I'm not as enthralled as I used to be. They got trillions in the bank. But is it possible that opensource and blockchain and the citizen developer, the weekend and nighttime developers, can actually attack that engine of growth for the last 10 years, 20 years, and really break that monopoly? The Internet has basically become an oligopoly where five companies, six companies, whatever, 10 companies kind of control things. Is it possible that opensource software, AI, cryptography, all this activity could challenge the status quo? Being in this business as long as I have, things never stay the same. Leaders come, leaders go. >> I just want to say, never say never. You don't know. >> So, it brings it back to IBM, which is interesting to me. It was funny, I was asking Rob Thomas a question about disruption, and I think he misinterpreted it. I think he was thinking that I was saying, "Hey, you're going to get disrupted by all these little guys." IBM's been getting disrupted for years. They know how to reinvent. A lot of people criticize IBM, how many quarters they haven't had growth, blah, blah, blah, but IBM's made some big, big bets on the future. People criticizing Watson, but it's going to be really interesting to see how all this investment that IBM has made is going to pay off. They were early on. People in the Valley like to say, "Well, the Facebooks, and even Amazon, "Google, they got the best AI. "IBM is not there with them." But think about what IBM is trying to do versus what Google is doing. They're very consumer-oriented, solving consumer problems. Consumers have really led the consumerization of IT, that's true, but none of those guys are trying to solve cancer. So IBM is talking about some big, hairy, audacious goals. And I'm not as pessimistic as some others you've seen in the trade press, it's popular to do. So, bringing it back to IBM, I saw IBM as trying to disrupt itself. The challenge IBM has, is it's got a lot of legacy software products that have purchased over the years. And it's got to figure out how to get through those. So, things like Queryplex allow them to create abstraction layers. Things like Bluemix allow them to bring together their hundreds and hundreds and hundreds of SaaS applications. That takes time, but I do see IBM making some big investments to disrupt themselves. They've got a huge analytics business. We've been covering them for quite some time now. They're a leader, if not the leader, in that business. So, their challenge is, "Okay, how do we now "apply all these technologies to help "our customers create innovation?" What I like about the IBM story is they're not out saying, "We're going to go disrupt industries." Silicon Valley has a bifurcated disruption agenda. On the one hand, they're trying to, cloud, and SaaS, and mobile, and social, very disruptive technologies. On the other hand, is Silicon Valley going to disrupt financial services, healthcare, government, education? I think they have plans to do so. Are they going to be able to execute that dual disruption agenda? 
Or are the consumers of AI and the doers of AI going to be the ones who actually do the disrupting? We'll see, I mean, Uber's obviously disrupted taxis, Silicon Valley company. Is that too much to ask Silicon Valley to do? That's going to be interesting to see. So, my point is, IBM is not trying to disrupt its customers' businesses, and it can point to Amazon trying to do that. Rather, it's saying, "We're going to enable you." So it could be really interesting to see what happens. You're down in DC, Jeff Bezos spent a lot of time there at the Washington Post. >> We just want the headquarters, that's all we want. We just want the headquarters. >> Well, to the point, if you've got such a growing company monopoly, maybe you should set up an HQ2 in DC. >> Three of the 20, right, for a DC base? >> Yeah, he was saying the other day that, maybe we should think about enhancing, he didn't call it social security, but the government, essentially, helping people plan for retirement and the like. I heard that and said, "Whoa, is he basically "telling us he's going to put us all out of jobs?" (both laugh) So, that, if I'm a customer of Amazon's, I'm kind of scary. So, one of the things they should absolutely do is spin out AWS, I think that helps solve that problem. But, back to IBM, Ginni Rometty was very clear at the World of Watson conference, the inaugural one, that we are not out trying to compete with our customers. I would think that resonates to a lot of people. >> Well, to be continued, right? Next month, back with IBM again? Right, three days? >> Yeah, I think third week in March. Monday, Tuesday, Wednesday, theCUBE's going to be there. Next week we're in the Bahamas. This week, actually. >> Not as a group taking vacation. Actually a working expedition. >> No, it's that blockchain conference. Actually, it's this week, what am I saying next week? >> Although I'm happy to volunteer to grip on that shoot, by the way. >> Flying out tomorrow, it's happening fast. >> Well, enjoyed this, always good to spend time with you. And good to spend time with you as well. So, you've been watching theCUBE, machine learning everywhere. Build your ladder to AI. Brought to you by IBM. Have a good one. (techno music)
Machine Learning Panel | Machine Learning Everywhere 2018
>> Announcer: Live from New York, it's theCUBE. Covering machine learning everywhere. Build your ladder to AI. Brought to you by IBM. Welcome back to New York City. Along with Dave Vellante, I'm John Walls. We continue our coverage here on theCUBE of machine learning everywhere. Build your ladder to AI, IBM our host here today. We put together, occasionally at these events, a panel of esteemed experts with deep perspectives on a particular subject. Today our influencer panel is comprised of three well-known and respected authorities in this space. Glad to have Colin Sumpter here with us. He's the man with the mic, by the way. He's going to talk first. But, Colin is an IT architect with CrowdMole. Thank you for being with us, Colin. Jennifer Shin, those of you on theCUBE, you're very familiar with Jennifer, a long time Cuber. Founded 8 Path Solutions, on the faculty at NYU and Cal Berkeley, and also with us is Craig Brown, a big data consultant. And a home game for all of you guys, right, more or less here we are in the city. So, thanks for having us, we appreciate the time. First off, let's just talk about the title of the event, Build Your Path... Or Your Ladder, excuse me, to AI. What are those steps on that ladder, Colin? The fundamental steps that you've got to jump on, or step on, in order to get to that true AI environment? >> In order to get to that true AI environment, John, is a matter of mastering or organizing your information well enough to perform analytics. That'll give you two choices to do either linear regression or supervised classification, and then you actually have enough organized data to talk to your team and organize your team around that data to begin that ladder to successively benefit from your data science program. >> Want to take a stab at it, Jennifer? >> So, I would say, compute, right? You need to have the right processing, or at least the ability to scale out to be able to process the algorithm fast enough to be able to find value in your data. I think the other thing is, of course, the data source itself. Do you have right data to answer the questions you want to answer? So, I think, without those two things, you'll either have a lot of great data that you can't process in time, or you'll have a great process or a great algorithm that has no real information, so your output is useless. I think those are the fundamental things you really do need to have any sort of AI solution built. >> I'll take a stab at it from the business side. They have to adopt it first. They have to believe that this is going to benefit them and that the effort that's necessary in order to build into the various aspects of algorithms and data subjects is there, so I think adopting the concept of machine learning and the development aspects that it takes to do that is a key component to building the ladder. >> So this just isn't toe in the water, right? You got to dive in the deep end, right? >> Craig: Right. >> It gets to culture. If you look at most organizations, not the big five market capped companies, but most organizations, data is not at their core. Humans are at their core, human expertise and data is sort of bolted on, but that has to change, or they're going to get disrupted. Data has to be at the core, maybe the human expertise leverages that data. What do you guys seeing with end customers in terms of their readiness for this transformation? >> What I'm seeing customers spending time right now is getting out of the silos. 
So, when you speak culture, that's primarily what the culture surrounds. They develop applications with functionality as a silo, and data specific to that functionality is the component in which they look at data. They have to get out of that mindset and look at the data holistically, and ultimately, in these events, looking at it as an asset. >> The data is a shared resource. >> Craig: Right, correct. >> Okay, and again, with the exception of the... Whether it's Google, Facebook, obviously, but the Ubers, the AirBNB's, etc... With the exception of those guys, most customers aren't there. Still, the data is in silos, they've got myriad infrastructure. Your thoughts, Jennifer? >> I'm also seeing sort of a disconnect between the operationalizing team, the team that runs these codes, or has a real business need for it, and sometimes you'll see corporations with research teams, and there's sort of a disconnect between what the researchers do and what these operations, or marketing, whatever domain it is, what they're doing in terms of a day to day operation. So, for instance, a researcher will look really deep into these algorithms, and may know a lot about deep learning in theory, in theoretical world, and might publish a paper that's really interesting. But, that application part where they're actually being used every day, there's this difference there, where you really shouldn't have that difference. There should be more alignment. I think actually aligning those resources... I think companies are struggling with that. >> So, Colin, we were talking off camera about RPA, Robotic Process Automation. Where's the play for machine intelligence and RPA? Maybe, first of all, you could explain RPA. >> David, RPA stands for Robotic Process Automation. That's going to enable you to grow and scale a digital workforce. Typically, it's done in the cloud. The way RPA and Robotic Process Automation plays into machine learning and data science, is that it allows you to outsource business processes to compensate for the lack of human expertise that's available in the marketplace, because you need competency to enable the technology to take advantage of these new benefits coming in the market. And, when you start automating some of these processes, you can keep pace with the innovation in the marketplace and allow the human expertise to gradually grow into these new data science technologies. >> So, I was mentioning some of the big guys before. Top five market capped companies: Google, Amazon, Apple, Facebook, Microsoft, all digital. Microsoft you can argue, but still, pretty digital, pretty data oriented. My question is about closing that gap. In your view, can companies close that gap? How can they close that gap? Are you guys helping companies close that gap? It's a wide chasm, it seems. Thoughts? >> The thought on closing the chasm is... presenting the technology to the decision-makers. What we've learned is that... you don't know what you don't know, so it's impossible to find the new technologies if you don't have the vocabulary to just begin a simple research of these new technologies. And, to close that gap, it really comes down to the awareness, events like theCUBE, webinars, different educational opportunities that are available to line of business owners, directors, VP's of systems and services, to begin that awareness process, finding consultants... begin that pipeline enablement to begin allowing the business to take advantage and harness data science, machine learning and what's coming. 
>> One of the things I've noticed is that there's a lot of information out there, like everyone a webinar, everyone has tutorials, but there's a lot of overlap. There aren't that many very sophisticated documents you can find about how to implement it in real world conditions. They all tend to use the same core data set, a lot of these machine learning tutorials you'll find, which is hilarious because the data set's actually very small. And I know where it comes from, just from having the expertise, but it's not something I'd ever use in the real world. The level of skill you need to be able to do any of these methodologies. But that's what's out there. So, there's a lot of information, but they're kind of at a rudimentary level. They're not really at that sophisticated level where you're going to learn enough to deploy in real world conditions. One of the things I'm noticing is, with the technical teams, with the data science team, machine learning teams, they're kind of using the same methodologies I used maybe 10 years ago. Because the management who manage these teams are not technical enough. They're business people, so they don't understand how to guide them, how to explain hey maybe you shouldn't do that with your code, because that's actually going to cause a problem. You should use parallel code, you should make sure everything is running in parallel so compute's faster. But, if these younger teams are actually learning for the first time, they make the same mistakes you made 10 years ago. So, I think, what I'm noticing is that lack of leadership is partly one of the reasons, and also the assumption that a non-technical person can lead the technical team. >> So, it's just not skillset on the worker level, if you will. It's also knowledge base on the decision-maker level. That's a bad place to be, right? So, how do you get into the door to a business like that? Obviously, and we've talked about this a little bit today, that some companies say, "We're not data companies, we're not digital companies, we sell widgets." Well, yeah but you sell widgets and you need this to sell more widgets. And so, how do you get into the door and talk about this problem that Jennifer just cited? You're signing the checks, man. You're going to have to get up to speed on this otherwise you're not going to have checks to sign in three to five years, you're done! >> I think that speaks to use cases. I think that, and what I'm actually saying at customers, is that there's a disconnect and an understanding from the executive teams and the low-level technical teams on what the use case actually means to the business. Some of the use cases are operational in nature. Some of the use cases are data in nature. There's no real conformity on what does the use case mean across the organization, and that understanding isn't there. And so, the CIO's, the CEO's, the CTO's think that, "Okay, we're going to achieve a certain level of capability if we do a variety of technological things," and the business is looking to effectively improve some or bring some efficiency to business processes. At each level within the organization, the understanding is at the level at which the discussions are being made. And so, I'm in these meetings with senior executives and we have lots of ideas on how we can bring efficiencies and some operational productivity with technology. And then we get in a meeting with the data stewards and "What are these guys talking about? 
They don't understand what's going on at the data level and what data we have." And then that's where the data quality challenges come into the conversation, so I think that, to close that cataclysm, we have to figure out who needs to be in the room to effectively help us build the right understanding around the use cases and then bring the technology to those use cases then actually see within the organization how we're affecting that. >> So, to change the questioning here... I want you guys to think about how capable can we make machines in the near term, let's talk next decade near term. Let's say next decade. How capable can we make machines and are there limits to what we should do? >> That's a tough one. Although you want to go next decade, we're still faced with some of the challenges today in terms of, again, that adoption, the use case scenarios, and then what my colleagues are saying here about the various data challenges and dev ops and things. So, there's a number of things that we have to overcome, but if we can get past those areas in the next decade, I don't think there's going to be much of a limit, in my opinion, as to what the technology can do and what we can ask the machines to produce for us. As Colin mentioned, with RPA, I think that the capability is there, right? But, can we also ultimately, as humans, leverage that capability effectively? >> I get this question a lot. People are really worried about AI and robots taking over, and all of that. And I go... Well, let's think about the example. We've all been online, probably over the weekend, maybe it's 3 or 4 AM, checking your bank account, and you get an error message your password is wrong. And we swear... And I've been there where I'm like, "No, no my password's right." And it keeps saying that the password is wrong. Of course, then I change it, and it's still wrong. Then, the next day when I login, I can login, same password, because they didn't put a great error message there. They just defaulted to wrong password when it's probably a server that's down. So, there are these basics or processes that we could be improving which no one's improving. So you think in that example, how many customer service reps are going to be contacted to try to address that? How many IT teams? So, for every one of these bad technologies that are out there, or technologies that are not being run efficiently or run in a way that makes sense, you actually have maybe three people that are going to be contacted to try to resolve an issue that actually maybe could have been avoided to begin with. I feel like it's optimistic to say that robots are going to take over, because you're probably going to need more people to put band-aids on bad technology and bad engineering, frankly. And I think that's the reality of it. If we had hoverboards, that would be great, you know? For a while, we thought we did, right? But we found out, oh it's not quite hoverboards. I feel like that might be what happens with AI. We might think we have it, and then go oh wait, it's not really what we thought it was. >> So there are real limits, certainly in the near to mid to maybe even long term, that are imposed. But you're an optimist. >> Yeah. Well, not so much with AI but everything else, sure. (laughing) AI, I'm a little bit like, "Well, it would be great, but I'd like basic things to be taken care of every day." So, I think the usefulness of technology is not something anyone's talking about. 
They're talking about this advancement, that advancement, things people don't understand, don't know even how to use in their life. Great, great is an idea. But, what about useful things we can actually use in our real life? >> So block and tackle first, and then put some reverses in later, if you will, to switch over to football. We were talking about it earlier, just about basics. Fundamentals, get your fundamentals right and then you can complement on that with supplementary technologies. Craig, Colin? >> Jen made some really good points and brought up some very good points, and so has... >> John: Craig. >> Craig, I'm sorry. (laughing) >> Craig: It's alright. >> 10 years out, Jen and Craig spoke to false positives. And false positives create a lot of inefficiency in businesses. So, when you start using machine learning and AI 10 years from now, maybe there's reduced false positives that have been scored in real time, allowing teams not to have their time consumed and their business resources consumed trying to resolve false positives. These false positives have a business value that, today, some businesses might not be able to record. In financial services, banks count money not lent. But, in every day business, a lot of businesses aren't counting the monetary consequences of false positives and the drag it has on their operational ability and capacity. >> I want to ask you guys about disruption. If you look at where the disruption, the digital disruptions, have taken place, obviously retail, certainly advertising, certainly content businesses... There are some industries that haven't been highly disrupted: financial services, insurance, we were talking earlier about aerospace, defense rather. Is any business, any industry, safe from digital disruption? >> There are. Certain industries are just highly regulated: healthcare, financial services, real estate, transactional law... These are extremely regulated technologies, or businesses, that are... I don't want to say susceptible to technology, but they can be disrupted at a basic level, operational efficiency, to make these things happen, these business processes happen more rapidly, more accurately. >> So you guys buy that? There's some... I'd like to get a little debate going here. >> So, I work with the government, and the government's trying to change things. I feel like that's kind of a sign because they tend to be a little bit slower than, say, other private industries, or private companies. They have data, they're trying to actually put it into a system, meaning like if they have files... I think that, at some point, I got contacted about putting files that they found, like birth records, right, marriage records, from 100-plus years ago and trying to put that into the system. By the way, I did look into it, there was no way to use AI for that, because there was no standardization across these files, so they have half a million files, but someone's probably going to manually have to enter that in. The reality is, I think because there's a demand for having things be digital, we aren't likely to see a decrease in that. We're not going to have one industry that goes, "Oh, your files aren't digital." Probably because they also want to be digital. The companies themselves, the employees themselves, want to see that change. So, I think there's going to be this continuous move toward it, but there's the question of, "Are we doing it better?" Is it better than, say, having it on paper sometimes? 
Because sometimes I just feel like it's easier on paper than to have to look through my phone, look through the app. There's so many apps now! >> (laughing) I got my index cards still, Jennifer! Dave's got his notebook! >> I'm not sure I want my ledger to be on paper... >> Right! So I think that's going to be an interesting thing when people take a step back and go like, "Is this really better? Is this actually an improvement?" Because I don't think all things are better digital. >> That's a great question. Will the world be a better, more prosperous place... Uncertain. Your thoughts? >> I think the competition is probably the driver as to who has to do this now, who's not safe. The organizations that are heavily regulated or compliance-driven can actually use that as the reasoning for not jumping into the barrel right now, and letting it happen in other areas first, watching the technology mature-- >> Dave: Let's wait. >> Yeah, let's wait, because that's traditionally how they-- >> Dave: Good strategy in your opinion? >> It depends on the entity but I think there's nothing wrong with being safe. There's nothing wrong with waiting for a variety of innovations to mature. What level of maturity, I think, is the perspective that probably is another discussion for another day, but I think that it's okay. I don't think that everyone should jump in. Get some lessons learned, watch how the other guys do it. I think that safety is in the eyes of the beholder, right? But some organizations are just fiercely competitive and they need a competitive edge and this is where they get it. >> When you say safety, do you mean safety in making decisions, or do you mean safety in protecting data? How are you defining safety? >> Safety in terms of when they need to launch, and look into these new technologies as a basis for change within the organization. >> What about the other side of that point? There's so much more data about it, so much more behavior about it, so many more attitudes, so on and so forth. And there are privacy issues and security issues and all that... Those are real challenges for any company, and becoming exponentially more important as more is at stake. So, how do companies address that? That's got to be absolutely part of their equation, as they decide what these future deployments are, because they're going to have great, vast reams of data, but that's a lot of vulnerability too, isn't it? >> It's as vulnerable as they... So, from an organizational standpoint, they're accustomed to these... These challenges aren't new, right? We still see data breaches. >> They're bigger now, right? >> They're bigger, but we still see occasionally data breaches in organizations where we don't expect to see them. I think that, from that perspective, it's the experiences of the organizations that determine the risks they want to take on, to a certain degree. And then, based on those risks, and how they handle adversity within those risks, from an experience standpoint they know ultimately how to handle it, and get themselves to a place where they can figure out what happened and then fix the issues. And then the others watch while these risk-takers take on these types of scenarios. >> I want to underscore this whole disruption thing and ask... We don't have much time, I know we're going a little over. I want to ask you to pull out your Hubble telescopes. Let's make a 20 to 30 year view, so we're safe, because we know we're going to be wrong. 
I want a sort of scale of 1 to 10, high likelihood being 10, low being 1. Maybe sort of rapid fire. Do you think large retail stores are going to mostly disappear? What do you guys think? >> I think the way that they are structured, the way that they interact with their customers might change, but you're still going to need them because there are going to be times where you need to buy something. >> So, six, seven, something like that? Is that kind of consensus, or do you feel differently Colin? >> I feel retail's going to be around, especially fashion because certain people, and myself included, I need to try my clothes on. So, you need a location to go to, a physical location to actually feel the material, experience the material. >> Alright, so we kind of have a consensus there. It's probably no. How about driving-- >> I was going to say, Amazon opened a book store. Just saying, it's kind of funny because they got... And they opened the book store, so you know, I think what happens is people forget over time, they go, "It's a new idea." It's not so much a new idea. >> I heard a rumor the other day that their next big acquisition was going to be, not Neiman Marcus. What's the other high end retailer? >> Nordstrom? >> Nordstrom, yeah. And my wife said, "Bad idea, they'll ruin it." Will driving and owning your own car become an exception? >> Driving and owning your own car... >> Dave: 30 years now, we're talking. >> 30 years... Sure, I think the concept is there. I think that we're looking at that. IOT is moving us in that direction. 5G is around the corner. So, I think the makings of it is there. So, since I can dare to be wrong, yeah I think-- >> We'll be on 10G by then anyway, so-- >> Automobiles really haven't been disrupted, the car industry. But you're forecasting, I would tend to agree. Do you guys agree or no, or do you think that culturally I want to drive my own car? >> Yeah, I think people, I think a couple of things. How well engineered is it? Because if it's badly engineered, people are not going to want to use it. For instance, there are people who could take public transportation. It's the same idea, right? Everything's autonomous, you'd have to follow in line. There's going to be some system, some order to it. And you might go-- >> Dave: Good example, yeah. >> You might go, "Oh, I want it to be faster. I don't want to be in line with that autonomous vehicle. I want to get there faster, get there sooner." And there are people who want to have that control over their lives, but they're not subject to things like schedules all the time and that's their constraint. So, I think if the engineering is bad, you're going to have more problems and people are probably going to go away from wanting to be autonomous. >> Alright, Colin, one for you. Will robots and maybe 3D printing, for example RPA, will it reverse the trend toward offshore manufacturing? >> 30 years from now, yes. I think robotic process engineering, eventually you're going to be at your cubicle or your desk, or whatever it is, and you're going to be able to print office supplies. >> Do you guys think machines will make better diagnoses than doctors? Ohhhhh. >> I'll take that one. >> Alright, alright. >> I think yes, to a certain degree, because if you look at the... 
problems with diagnosis, right now they miss it and I don't know how people, even 30 years from now, will be different from that perspective, where machines can look at quite a bit of data about a patient in split seconds and say, "Hey, the likelihood of this disease recurring for you is nil to none, because here's what I'm basing it on." I don't think doctors will be able to do that. Now, again, daring to be wrong! (laughing) >> Jennifer: Yeah so-- >> Don't tell your own doctor either. (laughing) >> That's true. If anything happens, we know, we all know. I think it depends. So maybe 80%, some middle percentage might be the case. I think extreme outliers, maybe not so much. You think about anything that's programmed into an algorithm, someone probably identified that disease, a human being identified that as a disease, made that connection, and then it gets put into the algorithm. I think what will happen is that, for the 20% that isn't being done well by machine, you'll have people who are more specialized being able to identify the outlier cases from, say, the standard. Normally, if you have certain symptoms, you have a cold, those are kind of standard ones. If you have this weird sort of thing where there's new variables, environmental variables for instance, your environment can actually lead to you having cancer. So, there's other factors other than just your body and your health that's going to actually be important to think about when diagnosing someone. >> John: Colin, go ahead. >> I think machines aren't going to out-decision doctors. I think doctors are going to work well with machine learning. For instance, there's a published document of Watson doing the research of a team of four in 10 minutes, when it normally takes a month. So, those doctors, to bring up Jen and Craig's point, are going to have more time to focus in on what the actual symptoms are, to resolve the outcome of patient care and patient services in a way that benefits humanity. >> I just wish that, Dave, that you would have picked a shorter horizon than that... 30 years, 20 I feel good about our chances of seeing that. 30 I'm just not so sure, I mean... For the two old guys on the panel here. >> The consensus is 20 years, not so much. But beyond 10 years, a lot's going to change. >> Well, thank you all for joining this. I always enjoy the discussions. Craig, Jennifer and Colin, thanks for being here with us here on theCUBE, we appreciate the time. Back with more here from New York right after this. You're watching theCUBE. (upbeat digital music)
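Jennifer's point earlier in the panel about guiding teams to make sure everything runs in parallel is concrete enough to sketch. Below is a minimal illustration, assuming a CPU-bound, per-record scoring step and using only the Python standard library; the function and data are hypothetical, not anything discussed on the panel.

```python
# A minimal sketch of parallelizing a per-record workload, assuming the work is
# CPU-bound; the scoring function and inputs are illustrative placeholders.
from concurrent.futures import ProcessPoolExecutor

def score_record(record):
    # Stand-in for an expensive feature-engineering or model-scoring step.
    return sum(value * value for value in record)

def score_all(records, workers=4):
    # Fan the work out across processes instead of looping serially,
    # so throughput scales with the available cores.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_record, records))

if __name__ == "__main__":
    batches = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    print(score_all(batches))
```

Swapping a serial loop for a process pool like this is usually the cheapest first win before reaching for distributed tooling.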
Garry Kasparov | Machine Learning Everywhere 2018
>> [Narrator] Live from New York, it's theCube, covering Machine Learning Everywhere. Build your ladder to AI, brought to you by IBM. >> Welcome back here to New York City as we continue at IBM's Machine Learning Everywhere, build your ladder to AI, along with Dave Vellante, I'm John Walls. It is now a great honor of ours to have, I think probably and arguably, the greatest chess player of all time. Garry Kasparov now joins us. He's currently the chairman of the Human Rights Foundation, and was a political activist in Russia as well some time ago. Thank you for joining us, we really appreciate the time, sir. >> Thank you for inviting me. >> We've been looking forward to this. Let's just, if you would, set the stage for us. Artificial Intelligence is obviously quite a hot topic. The maybe not conflict, the complementary nature of human intelligence. There are people on both sides of the camp. But you see them as being very complementary to one another. >> I think that's a natural development in this industry that will bring together humans and machines. Because this collaboration will produce the best results. Our abilities are complementary. The humans will bring creativity and intuition and other typical human qualities like human judgment and strategic vision while machines will add calculation, memory, and many other abilities that they have been acquiring quickly. >> So there's room for both, right? >> Yes, I think it's inevitable because no machine will ever reach 100% perfection. Machines will be coming closer and closer, 90%, 92, 94, 95. But there's still room for humans because at the end of the day even with this massive power you have to guide it. You have to evaluate the results and at the end of the day the machine will never understand when it reaches the territory of diminishing returns. It's very important for humans actually to identify. So what is the task? I think it's a mistake that is made by many pundits that they automatically transfer the machine's expertise for the closed systems into the open-ended systems. Because in every closed system, whether it's the game of chess, the game of Go, video games like Dota, or anything else where humans already define the parameters of the problem, machines will perform phenomenally. But if it's an open-ended system then the machine will never identify what is the right question to be asked. >> Don't hate me for this question, but it's been reported, now I don't know if it's true or not, that at one point you said that you would never lose to a machine. My question is how capable can we make machines? First of all, is that true? Did you maybe underestimate the power of computers? How capable do you think we can actually make machines? >> Look, in the 80s when the question was asked I was much more optimistic because we saw very little at that time from machines that could make me, world champion at the time, worry about machines' capability of defeating me in the real chess game. I underestimated the pace it was developing. I could see something was happening, was cooking, but I thought it would take longer for machines to catch up. As I said in my talk here, we should simply recognize the fact that everything we do while knowing how we do it, machines will do better. Any particular task that humans perform, machines will eventually surpass us. >> What I love about your story, I was telling you off-camera about when we had Erik Brynjolfsson and Andrew McAfee on, you're the opposite of Samuel P. Langley to me. 
You know who Samuel P. Langley is? >> No, please. >> Samuel P. Langley, do you know who Samuel P. Langley is? He was the gentleman that, you guys will love this, that the government paid. I think it was $50,000 at the time, to create a flying machine. But the Wright Brothers beat him to it, so what did Samuel P. Langley do after the Wright Brothers succeeded? He quit. But after you lost to the machine you said, you know what? I can beat the machine with other humans, and created what is now the best chess player in the world, is my understanding. It's not a machine, but it's a combination of machines and humans. Is that accurate? >> Yes, in chess actually, we could demonstrate how the collaboration can work. Now in many areas people rely on the lessons that have been revealed, learned from what I call advanced chess. That in this team, human plus machine, the most important element of success is not the strength of the human expert. It's not the speed of the machine, but it's a process. It's an interface, so how you actually make them work together. In the future I think that will be the key of success because we have very powerful machines, those AIs, intelligent algorithms. All of them will require very special treatment. That's why also I use this analogy with the right fuel for a Ferrari. We will have expert operators, I call them the shepherds, that will have to know exactly what are the requirements of this machine or that machine, or that group of algorithms, to guarantee that we'll be able by our human input to compensate for their deficiencies. Not the other way around. >> What led you to that response? Was it your competitiveness? Was it your vision of machines and humans working together? >> I thought I could last longer as the undefeated world champion. Ironically, in 1997, when you just look at the game and the quality of the game and try to evaluate Deep Blue's real strengths, I think I was objective, I was stronger. Because today you can analyze these games with much more powerful computers. I mean any chess app on your laptop. I mean you cannot really compare with Deep Blue. That's natural progress. But as I said, it's not about solving the game, it's not about objective strengths. It's about your ability to actually perform at the board. I just realized that while we could compete with machines for a few more years, and that's great, it did take place. I played two more matches in 2003 with a German program. Not as publicized as the IBM match. Both ended as a tie and I think they were probably stronger than Deep Blue, but I knew it would just be over, maybe a decade. How can we make chess relevant? For me it was very natural. I could see this immense power of calculations, brute force. On the other side I could see us having qualities that machines will never acquire. How about bringing them together and using chess as a laboratory to find the most productive ways for human-machine collaboration? >> What was the difference in, I guess, processing power basically, or processing capabilities? You played the match, this is 1997. You played the match on standard time controls which allow you or a player a certain amount of time. How much time did Deep Blue, did the machine take? Or did it take its full time to make considerations as opposed to what you exercised? >> Well it's the standard time control. I think you should explain to your audience at that time it was a seven-hour game. It's what we call classical chess. We have rapid chess that is under one hour. 
Then you have blitz chess which is five to ten minutes. That was a normal time control. It's worth mentioning that other computers were beating human players, myself included, in blitz chess, in the very fast chess. We still thought that with more time we could have sort of a bigger comfort zone just to contemplate the machine's plans and actually to create real problems that the machine would not be able to solve. Again, more time helps humans but at the end of the day it's still about your ability not to crack under pressure because there's so many things that could take you off your balance, and the machine doesn't care about it. At the end of the day the machine has a steady hand, and the steady hand wins. >> Emotion doesn't come into play. >> It's not about absolute strength, but it's about guaranteeing that it will play at a certain level for the entire game. While the human game maybe at one point could go a bit higher. But at the end of the day when you look at the average it's still lower. I played many world championship matches and I analyze the games, games played at the highest level. I can tell you that even the best games played by humans at the highest level, they include not necessarily big mistakes, but inaccuracies that are irrelevant when humans are facing humans because if I make a mistake, a tiny mistake, then I can expect you to return the favor. Against the machine, that's it. Humans cannot play at the same level throughout the whole game. The concentration, the vigilance are not required in the same way when humans face humans. Psychologically, when you have a strong machine, a machine good enough to play with a steady hand, the game's over. >> I want to point out too, just so we get the record straight for people who might not be intimately familiar with your record, you were ranked number one in the world from 1986 to 2005 for all but three months. Three months, that's three decades. >> Two decades. >> Well 80s, 90s, and noughts, I'll give you that. (laughing) That's unheard of, that's phenomenal. >> Just going back to your previous question about why I just looked for some new form of chess. It's one of the key lessons I learned from my childhood thanks to my mother, who spent her life just helping me to become who I am, who I was after my father died when I was seven. It's about always trying to make the difference. It's not just about winning, it's about making a difference. It led me to kind of a new motto in my professional life. That is, it's all about my own quality of the game. As long as I'm challenging my own excellence I will never be short of opponents. For me the defeat was just a kick, a push. So let's come up with something new. Let's find a new challenge. Let's find a way to turn this defeat, the lessons from this defeat, into something more practical. >> Love it, I mean I think in your book I think, was it John Henry, the famous example. (all men speaking at once) >> He won, but he lost. >> Motivation wasn't competition, it was advancing society and creativity, so I love it. Another thing I just want, a quick aside, you mentioned performing under pressure. I think it was in the 1980s, it might have been in the opening of your book. You talked about playing multiple computers. >> [Garry] Yeah, in 1985. >> In 1985, and you were winning all of them. There was one close match, but the computer's name was Kasparov and you said I've got to beat this one because people will think that it's rigged or I'm getting paid to do this. So well done. 
>> I always mention this exhibition I played in 1985 against 32 chess-playing computers because the importance of this event was not just that I won all the games, but that nobody was surprised. I have to admit that the fact that I could win all the games against these 32 chess-playing computers, which were only chess-playing machines that did nothing else, probably boosted my confidence that I would never be defeated even by more powerful machines. >> Well I love it, that's why I asked the question how far can we take machines? We don't know, like you said. >> Why should we bother? I see so many new challenges that we will be able to take on, and challenges that we abandoned, like space exploration or deep ocean exploration, because they were too risky. We couldn't actually calculate all the odds. Great, now we have AI. It's all about increasing our risk because we could actually measure against this phenomenal power of AI that will help us to find the right path. >> I want to follow up on some other commentary. Brynjolfsson and McAfee basically put forth the premise, look, machines have always replaced humans. But this is the first time in history that they have replaced humans in terms of cognitive tasks. They also posited, look, there's no question that it's affecting jobs. But they put forth the prescription, which I think as an optimist you would agree with, that it's about finding new opportunities. It's about bringing creativity in, complementing the machines and creating new value. As an optimist, I presume you would agree with that. >> Absolutely, I'm always saying jobs do not disappear, they evolve. It's an inevitable part of the technological progress. We come up with new ideas and every disruptive technology destroys some industries but creates new jobs. So basically we see jobs shifting from one industry to another. Like from agriculture to manufacturing, from manufacturing to other sectors, cognitive tasks. But now there will be something else. I think the market will change, the job market will change quite dramatically. Again I believe that we will have to look for riskier jobs. We will have to start doing things that we abandoned 30, 40 years ago because we thought they were too risky. >> Back to the book you were talking about, Deep Thinking, where machine intelligence ends and human intelligence begins, you talked about courage. We need fail safes in place, but you also need that human element of courage like you said, to accept risk and take risk. >> Now it probably will be easier, but also as I said the machines will force a lot of talent actually to move into other areas that were not as attractive because there were other opportunities. There's so many what I call raw cognitive tasks that are still financially attractive. I hope AI will close many loops. We'll see talent moving into areas where we just have to open new horizons. I think it's very important just to remember that with technological progress, especially when you're talking about disruptive technology, it's more about unintended consequences. The flight to the moon was, just psychologically, important, the Space Race, the Cold War. But it was also about GPS, about so many side effects that in the 60s were not yet appreciated but eventually created the world we have now. I don't know what the consequences of us flying to Mars will be. Maybe something will happen, maybe on one of the asteroids we'll just find sort of a new substance that will replace fossil fuel. 
What I know, it will happen because when you look at the human history there's all this great exploration. They ended up with unintended consequences as the main result. Not what was originally planned as the number one goal. >> We've been talking about where innovation comes from today. It's a combination of a by-product out there. A combination of data plus being able to apply artificial intelligence. And of course there's cloud economics as well. Essentially, well is that reasonable? I think about something you said, I believe, in the past that you didn't have the advantage of seeing Deep Blue's moves, but it had the advantage of studying your moves. You didn't have all the data, it had the data. How does data fit into the future? >> Data is vital, data is fuel. That's why I think we need to find some of the most effective ways of collaboration between humans and machines. Machines can mine the data. For instance, it's a breakthrough in instantly mining data and human language. Now we could see even more effective tools to help us to mine the data. But at the end of the day it's why are we doing that? What's the purpose? What does matter to us, so why do we want to mine this data? Why do we want to do here and not there? It seems at first sight that the human responsibilities are shrinking. I think it's the opposite. We don't have to move too much but by the tiny shift, just you know percentage of a degree of an angle could actually make huge difference when this bullet reaches the target. The same with AI. More power actually offers opportunities to start just making tiny adjustments that could have massive consequences. >> Open up a big, that's why you like augmented intelligence. >> I think artificial is sci-fi. >> What's artificial about it, I don't understand. >> Artificial, it's an easy sell because it's sci-fi. But augmented is what it is because our intelligent machines are making us smarter. Same way as the technology in the past made us stronger and faster. >> It's not artificial horsepower. >> It's created from something. >> Exactly, it's created from something. Even if the machines can adjust their own code, fine. It still will be confined within the parameters of the tasks. They cannot go beyond that because again they can only answer questions. They can only give you answers. We provide the questions so it's very important to recognize that it is we will be in the leading role. That's why I use the term shepherds. >> How do you spend your time these days? You're obviously writing, you're speaking. >> Writing, speaking, traveling around the world because I have to show up at many conferences. The AI now is a very hot topic. Also as you mentioned I'm the Chairman of Human Rights Foundation. My responsibilities to help people who are just dissidents around the world who are fighting for their principles and for freedom. Our organization runs the largest dissident gathering in the world. It's called the Freedom Forum. We have the tenth anniversary, tenth event this May. >> It has been a pleasure. Garry Kasparov, live on theCube. Back with more from New York City right after this. (lively instrumental music)
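Kasparov's claim that the human-plus-machine "process" matters more than raw strength can be made concrete with a toy decision loop. This is only an illustration of the general idea, not his actual advanced-chess workflow; the confidence threshold, names, and values below are assumptions.

```python
# A toy illustration of a human-plus-machine process: the machine proposes,
# and the human operator is only pulled in when the machine's own confidence
# is low. All names and numbers here are hypothetical.
from dataclasses import dataclass

@dataclass
class Proposal:
    action: str
    confidence: float  # 0.0 to 1.0, as reported by the model

def decide(proposal, ask_human, threshold=0.8):
    """Accept the machine's proposal outright when it is confident,
    otherwise defer to the human operator (the "shepherd")."""
    if proposal.confidence >= threshold:
        return proposal.action
    return ask_human(proposal)

def human_review(proposal):
    # Stand-in for the expert operator's judgment.
    return f"human override of '{proposal.action}'"

print(decide(Proposal("advance d-pawn", 0.95), human_review))   # machine's choice stands
print(decide(Proposal("sacrifice knight", 0.40), human_review))  # escalated to the human
```

The design intent is that the machine carries the steady, high-confidence work while the human operator, the shepherd in his terms, is reserved for the cases the machine itself flags as uncertain.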
Madhu Kochar, IBM | Machine Learning Everywhere 2018
>> Announcer: Live from New York, it's theCUBE covering Machine Learning Everywhere, Build Your Ladder To AI, brought to you by IBM. (techy music playing) >> Welcome back to New York City as we continue at IBM's Machine Learning Everywhere, Build Your Ladder To AI, bringing it to you here on theCUBE, of course the rights to the broadcast of SiliconANGLE Media, and Dave Vellante joins me here. Dave, good morning once again to you, sir. >> Hey, John, good to see you. >> And we're joined by Madhu Kochar, who is the Vice President of Analytics Development and Client Success at IBM, I like that, client success. Good to see you this morning, thanks for joining us. >> Yeah, thank you. >> Yeah, so let's bring up a four letter / ten letter word, governance, that some people just cringe at, right, right away, but that's very much in your wheelhouse. Let's talk about that in terms of what you're having to be aware of today with data and all of a sudden these great possibilities, right, but also on the other side, you've got to be careful, and I know there's some clouds over in Europe as well, but let's just talk about your perspective on governance and how it's important to get it all under one umbrella. >> Yeah, so I lead product development for IBM analytics, governance, and integration, and like you said, right, governance has... Every time you talk about that, people cringe and you think it's a dirty word, but it's not anymore, right. Especially when you want to tie your AI ladder story, right, there is no AI without information architecture, no AI without IA, and if you think about IA, what does that really mean? It means the foundation of that is data and analytics. Now, let's look deeper, what does that really mean, what is data analytics? Data is coming at us from everywhere, right, and there's records... The data shows there's about 2.5 quintillion bytes of data getting generated every single day, raw data from everywhere. How are we going to make sense out of it, right, and from that perspective it is just so important that you understand this type of data, what is the type of data, what the classification of it means in a business. You know, when you are running your business, there's a lot of cryptic fields out there, what are the business terms assigned to them and what's the lineage of it, where did it come from. If you do have to do any analytics, if data scientists have to do any analytics on it, they need to understand where it actually originated from, can I even trust this data. Trust is really, really important here, right, and is the data clean, what is the quality of this data. The data is coming at us in all raw formats from IOT sensors and such. What is the quality of this data? To me, that is the real definition of governance. Right, it's not just about what we used to think about compliance, yes, that's-- >> John: Like rolling a rag. >> Right, right. >> But it's all about being appropriate with all the data you have coming in. >> Exactly, I call it governance 2.0 or governance for insights, because that's what it needs to be all about. Right, compliance, yes indeed, with GDPR and other things coming at us it's important, but I think the most critical thing is that we have to change the term of governance into, like, this is that foundation for your AI ladder that is going to help us really drive the right insights, that's my perspective. >> I want to double click on that because you're right, I mean, it is kind of governance 2.0. 
It used to be, you know, Enron forced a lot of, you know, governance, and the Federal Rules of Civil Procedure forced a lot of sort of even some artificial governance, and then I think organizations, especially public companies and large organizations, said, "You know what, we can't just do this as a band-aid every time." You know, now GDPR, many companies are not ready for GDPR, we know that. Having said that, because they went through governance 1.0, many companies are not panicked. I mean, they're kind of panicking because May is coming, (laughs) but they've been through this before. >> Madhu: Mm-hm. >> Do you agree with that premise, that they've got at least the skillsets and the professionals to, if they focus, they can get there pretty quickly? >> Yeah, no, I agree with that, but I think our technology and tools need to change big time here, right, because regulations are coming at us from all different angles. Everybody's looking to cut costs, right? >> Dave: Right. >> You're not going to hire more people to sit there and classify the data and say, "Hey, is this data ready for GDPR," or for Basel or for POPI, like in South Africa. I mean, there's just >> Dave: Yeah. >> Tons of things, right, so I do think the technology needs to change, and that's why, you know, in our governance portfolio, in IBM information server, we have infused machine learning in it, right, >> Dave: Hm. >> Where automatically you have machine learning algorithms and models understanding your data, classifying the data. You know, you don't need humans to sit there and assign terms, the business terms, to it. We have compliance built into our... It's running actually on machine learning. You can feed in a taxonomy for GDPR. It would automatically tag your data in your catalog and say, "Hey, this is personal data, this is sensitive data, or this data is needed for these types of compliance," and that's the aspect which I think we need to go focus on >> Dave: Mm-hm. >> So the companies, to your point, don't shrug every time they hear regulations, that it's kind of built in-- >> Right. >> In the DNA, but technologies have to change, the tools have to change. >> So, to me that's good news, if you're saying the technology and the tools are the gap. You know, we always talk about people, process, and technology, the bromide is, but it's true, people and process are the really-- >> Madhu: Mm-hm. >> Hard pieces of it. >> Madhu: Mm-hm. >> Technology comes and goes >> Madhu: Mm-hm. >> And people kind of generally get used to that. So, I'm inferring from your comments that you feel as though governance, there's a value component of governance now >> Yeah, yeah. >> It's not just a negative risk avoidance. It can be a contributor to value. You mentioned the example of classification, which I presume is auto-classification >> Madhu: Yes. >> At the point of use or creation-- >> Madhu: Yes. >> Which has been a real nagging problem for decades, especially after FRCP, Federal Rules of Civil Procedure, where it was like, "Ugh, we can't figure this out, we'll do email archiving." >> Madhu: Mm-hm. >> You can't do this manually, it's just too much data-- >> Yeah. >> To your point, so I wonder if you could talk a little bit about governance and its contribution to value. >> Yeah, so this is a good question. I was just recently visiting some large banks, right, >> Dave: Mm-hm. >> And normally, the governance and compliance has always been an IT job, right? >> Dave: Right. 
>> And they figure out bunch of products, you know, you can download opensource and do other things to quickly deliver data or insights to their business groups, right, and for business to further figure out new business models and such, right. So, recently what has happened is by doing machine learning into governance, you're making your IT guys the heroes because now they can deliver stuff very quickly, and the business guys are starting to get those insights and their thoughts on data is changing, you know, and recently I was talking with these banks where they're like, "Can you come and talk to "our CFOs because I think the policies," the cultural change you referred to then, maybe the data needs to be owned by businesses. >> Dave: Hm. >> No longer an IT thing, right? So, governance I feel like, you know, governance and integration I feel like is a glue which is helping us drive that cultural change in the organizations, bringing IT and the business groups together to further drive the insights. >> So, for years we've been talking about information as a liability or an asset, and for decades it was really viewed as a liability, get rid of it if you can. You have to keep it for seven years, then get rid of it, you know. That started to change, you know, with the big data movement, >> Madhu: Yeah. >> But there was still sort of... It was hard, right, but what I'm hearing now is increasingly, especially of the businesses sort of owning the data, it's becoming viewed as an asset. >> Madhu: Yes. >> You've got to manage the liabilities, we got that, but now how do we use it to drive business value. >> Yeah, yeah, no, exactly, and that's where I think our focus in IBM analytics, with machine learning and automation, and truly driving that insights out of the data. I mean, you know, people... We've been saying data is a natural resource. >> Dave: Mm-hm. >> It's our bloodline, it's this and that. It truly is, you know, and talking to the large enterprises, everybody is in their mode of digital transformation or transforming, right? We in IBM are doing the same things. Right, we're eating our own, drinking our own champagne (laughs). >> John: Not the Kool-Aid. >> You know, yeah, yeah. >> John: Go right to the dog. >> Madhu: Yeah, exactly. >> Dave: No dog smoothie. (laughs) >> Drinking our own champagne, and truly we're seeing transformation in how we're running our own business as well. >> Now what, there are always surprises. There are always some, you know, accidents kind of waiting to happen, but in terms of the IOT, you know, have got these millions, right, of sensors-- >> Madhu: Mm-hm. >> You know, feeding data in, and what, from a governance perspective, is maybe a concern about, you know, an unexpected source or an unexpected problem or something where yeah, you have great capabilities, but with those capabilities might come a surprise or two in terms of protecting data and a machine might provide perhaps a little more insight than you might've expected. So, I mean, just looking down the road from your perspective, you know, is there anything along those lines that you're putting up flags for just to keep an eye on to see what new inputs might create new problems for you? >> Yeah, no, for sure, I mean, we're always looking at how do we further do innovation, how do we disrupt ourselves and make sure that data doesn't become our enemy, right, I mean it's... You know, as we are talking about AI, people are starting to ask a lot of questions about ethics and other things, too, right. 
So, very critical, so obviously when you focus on governance, the point of that is let's take the manual stuff out, make it much faster, but part of the governance is that we're protecting you, right. That's part of that security and understanding of the data, it's all about that you don't end up in jail. Right, that's the real focus in terms of our technology in terms of the way we're looking at. >> So, maybe help our audience a little bit. So, I described at our open AI is sort of the umbrella and machine learning is the math and the algorithms-- >> Madhu: Yeah. >> That you apply to train systems to do things maybe better than, maybe better than humans can do and then there's deep learning, which is, you know, neural nets and so forth, but am I understanding that you've essentially... First of all, is that sort of, I know it's rudimentary, but is it reasonable, and then it sounds like you've infused ML into your software. >> Madho: Yes. >> And so I wonder if you could comment on that and then describe from the client's standpoint what skills they need to take advantage of that, if any. >> Oh, yeah, no, so embedding ML into a software, like a packaged software which gets delivered to our client, people don't understand actually how powerful that is, because your data, your catalog, is learning. It's continuously learning from the system itself, from the data itself, right, and that's very exciting. The value to the clients really is it cuts them their cost big time. Let me give you an example, in a large organization today for example, if they have, like, maybe 22,000 some terms, normally it would take them close to six months for one application with a team of 20 to sit there and assign the terms, the right business glossary for their business to get data. (laughs) So, by now doing machine learning in our software, we can do this in days, even in hours, obviously depending on what's the quantity of the data in the organization. That's the value, so the value to the clients is cutting down that. They can take those folks and go focus on some, you know, bigger value add applications and others and take advantage of that data. >> The other huge value that I see is as the business changes, the machine can help you adapt. >> Madhu: Yeah. >> I mean, taxonomies are like cement in data classification, and while we can't, you know, move the business forward because we have this classification, can your machines adapt, you know, in real time and can they change at the speed of my business, is my question. >> Right, right, no, it is, right, and clients are not able to move on their transformation journey because they don't have data classified done right. >> Dave: Mm-hm. >> They don't, and you can't put humans to it. You're going to need the technology, you're going to need the machine learning algorithms and the AI built into your software to get that, and that will lead to, really, success of every kind. >> Broader question, one of the good things about things like GDPR is it forces, it puts a deadline on there and we all know, "Give me a deadline and I'll hit it," so it sort of forces action. >> Madhu: Mm-hm. >> And that's good, we've talked about the value that you can bring to an organization from a data perspective, but there's a whole non-governance component of data orientation. How do you see that going, can the governance initiatives catalyze sort of what I would call a... You know, people talk about a data driven organization. 
Most companies, they may say they are data driven but they're really not foundational. >> Mm-hm. >> Can governance initiatives catalyze that transformation to a data driven organization, and if so, how? >> Yeah, no, absolutely, right. So, the example I was sharing earlier with talking to some of the large financial institutes, where the business guys, you know, outside of IT are talking about how important it is for them to get the data really real time, right, and self-service. They don't want to be dependent on either opening a work ticket for somebody in IT to produce data for them and god forbid if somebody's out on vacation they can never get that. >> Dave: Right. >> We don't live in that world anymore, right. It's online, it's real time, it's all, you know, self-service type of aspects, which the business, the data scientists building new analytic models are looking for that. So, for that, data is the key, key core foundation in governance. The way I explained it earlier, it's not just about compliance. That is going to lead to that transformation for every client, it's the core. They will not be successful without that. >> And the attributes are changing. Not only is it self-service, it's pervasive-- >> Madhu: Yeah. >> It's embedded, it's aware, it's anticipatory. Am I overstating that? >> Madhu: No. >> I mean, is the data going to find me? >> Yeah, you know, (laughs) that's a good way to put it, you know, so no, you're at the, I think you got it. This is absolutely the right focus, and the companies and the enterprises who understand this and use the right technology to fix it that they'll win. >> So, if you have a partner that maybe, if it is contextual, I mean... >> Dave: Yeah. >> So, also make it relevant-- >> Madhu: Yes. >> To me and help me understand its relevance-- >> Madhu: Yes. >> Because maybe as a, I hate to say as a human-- >> Madhu: Yes. >> That maybe just don't have that kind of prism, but can that, does that happen as well, too? >> Madhu: Yeah, no. >> John: It can put up these white flags and say, "Yeah, this is what you need." >> Yeah, no, absolutely, so like the focus we have on our natural language processing, for example, right. If you're looking for something you don't have to always know what your SQL is going to be for a query to do it. You just type in, "Hey, I'm looking for "some customer retention data," you know, and it will go out and figure it out and say, "Hey, are you looking for churn analysis "or are you looking to do some more promotions?" It will learn, you know, and that's where this whole aspect of machine learning and natural language processing is going to give you that contextual aspect of it, because that's how the self-service models will work. >> Right, what about skills, John asked me at the open about skillsets and I want to ask a general question, but then specifically about governance. I would make the assertion that most employees don't have the multidimensional digital skills and domain expertise skills today. >> Yeah. >> Some companies they do, the big data companies, but in governance, because it's 2.0, do you feel like the skills are largely there to take advantage of the innovations that IBM is coming out with? >> I think I generally, my personal opinion is the way the technology's moving, the way we are getting driven by a lot of disruptions, which are happening around us, I think we don't have the right skills out there, right. We all have to retool, I'm sure all of us in our career have done this all the time. 
You know, so (laughs) to me, I don't think we have it. So, building the right tools, the right technologies and enabling the resources that the teams out there to retool themselves so they can actually focus on innovation in their own enterprises is going to be critical, and that's why I really think more burn I can take off from the IT groups, more we can make them smarter and have them do their work faster. It will help give that time to go see hey, what's their next big disruption in their organization. >> Is it fair to say that traditionally governance has been a very people-intensive activity? >> Mm-hm. >> Will governance, you know, in the next, let's say decade, become essentially automated? >> That's my desire, and with the product-- >> Dave: That's your job. >> That's my job, and I'm actually really proud of what we have done thus far and where we are heading. So, next time when we meet we will be talking maybe governance 3.0, I don't know, right. (laughs) Yeah, that's the thing, right? I mean, I think you hit it on the nail, that this is, we got to take a lot of human-intensive stuff out of our products and more automation we can do, more smarts we can build in. I coined this term like, hey, we've got to build smarter metadata, right? >> Dave: Right. >> Data needs to, metadata is all about data of your data, right? That needs to become smarter, think about having a universe where you don't have to sit there and connect the dots and say, "I want to move from here to there." System already knows it, they understand certain behaviors, they know what your applications is going to do and it kind of automatically does it for you. No more science fake, I think it can happen. (laughs) >> Do you think we'll ever have more metadata than data... (laughs) >> Actually, somebody did ask me that question, will we be figuring out here we're building data lakes, what do we do about metadata. No, I think we will not have that problem for a while, we'll make it smarter. >> Dave: Going too fast, right. >> You're right. >> But it is, it's like working within your workforce and you're telling people, you know, "You're a treasure hunter and we're going to give you a better map." >> Madhu: Yeah. >> So, governance is your better map, so trust me. >> Madhu: Hey, I like that, maybe I'll use it next time. >> Yeah, but it's true, it's like are you saying governance is your friend here-- >> Madhu: Yes. >> And we're going to fine-tune your search, we're going to make you a more efficient employee, we're going to make you a smarter person and you're going to be able to contribute in a much better way, but it's almost enforced, but let it be your friend, not your foe. >> Yes, yeah, be your differentiator, right. >> But my takeaway is it's fundamental, it's embedded. You know, you're doing this now with less thinking. Security's got to get to the same play, but for years security, "Ugh, it slows me down," but now people are like, "Help me," right, >> Madhu: Mm-hm. >> And I think the same dynamic is true here, embedded governance in my business. Not a bolt on, not an afterthought. It's fundamental and foundational to my organization. >> Madhu: Yeah, absolutely. >> Well, Madhu, thank you for the time. We mentioned on the outset by the interview if you want to say hi to your kids that's your camera right there. Do you want to say hi to your kids real quick? >> Yeah, hi Mohed, Kepa, I love you so much. (laughs) >> All right. >> Thank you. >> So, they know where mom is. 
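Below is a minimal sketch of the kind of automated classification Kochar describes, where a model suggests tags for cryptic field names instead of a team assigning business terms by hand. The column names, labels, and scikit-learn model are illustrative assumptions, not IBM Information Server's actual implementation.

```python
# A minimal sketch of ML-assisted data classification for governance, assuming
# a small hand-labeled sample of column names; everything here is hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny labeled sample: column names mapped to a coarse sensitivity class.
columns = ["customer_email", "home_address", "passport_no", "credit_card_num",
           "invoice_total", "warehouse_id", "date_of_birth", "order_qty"]
labels = ["personal", "personal", "sensitive", "sensitive",
          "other", "other", "personal", "other"]

# Character n-grams cope reasonably well with cryptic, abbreviated field names.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(columns, labels)

# New, unseen fields get a suggested tag instead of waiting on a manual pass.
for field in ["cust_email_addr", "po_number", "birth_dt"]:
    print(field, "->", model.predict([field])[0])
```

In practice the labeled sample would come from terms a data steward has already assigned, and low-confidence predictions would still be routed to a person for review.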
(laughs) New York City at IBM's Machine Learning Everywhere, Build Your Ladder To AI. Thank you for joining us, Madhu Kochar. >> Thank you, thank you. >> Back with more here from New York in just a bit, you're watching theCUBE. (techy music playing)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
John | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Europe | LOCATION | 0.99+ |
Madhu | PERSON | 0.99+ |
Mohed | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Enron | ORGANIZATION | 0.99+ |
Madhu Kochar | PERSON | 0.99+ |
South Africa | LOCATION | 0.99+ |
seven years | QUANTITY | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Kepa | PERSON | 0.99+ |
New York | LOCATION | 0.99+ |
New York City | LOCATION | 0.99+ |
Federal Rules of Civil Procedure | TITLE | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
Madho | PERSON | 0.99+ |
22,000 | QUANTITY | 0.99+ |
GDPR | TITLE | 0.99+ |
two | QUANTITY | 0.99+ |
ten letter | QUANTITY | 0.99+ |
one application | QUANTITY | 0.99+ |
FRCP | TITLE | 0.98+ |
today | DATE | 0.98+ |
four letter | QUANTITY | 0.96+ |
Kool-Aid | ORGANIZATION | 0.96+ |
about 2.5 quintillion bytes | QUANTITY | 0.94+ |
decades | QUANTITY | 0.92+ |
First | QUANTITY | 0.9+ |
millions | QUANTITY | 0.9+ |
20 | QUANTITY | 0.9+ |
one | QUANTITY | 0.89+ |
2018 | DATE | 0.87+ |
Machine | TITLE | 0.86+ |
six months | QUANTITY | 0.85+ |
Rob Thomas, IBM | Machine Learning Everywhere 2018
>> Announcer: Live from New York, it's theCUBE, covering Machine Learning Everywhere: Build Your Ladder to AI, brought to you by IBM. >> Welcome back to New York City. theCUBE continue our coverage here at IBM's event, Machine Learning Everywhere: Build Your Ladder to AI. And with us now is Rob Thomas, who is the vice president of, or general manager, rather, of IBM analytics. Sorry about that, Rob. Good to have you with us this morning. Good to see you, sir. >> Great to see you John. Dave, great to see you as well. >> Great to see you. >> Well let's just talk about the event first. Great lineup of guests. We're looking forward to visiting with several of them here on theCUBE today. But let's talk about, first off, general theme with what you're trying to communicate and where you sit in terms of that ladder to success in the AI world. >> So, maybe start by stepping back to, we saw you guys a few times last year. Once in Munich, I recall, another one in New York, and the theme of both of those events was, data science renaissance. We started to see data science picking up steam in organizations. We also talked about machine learning. The great news is that, in that timeframe, machine learning has really become a real thing in terms of actually being implemented into organizations, and changing how companies run. And that's what today is about, is basically showcasing a bunch of examples, not only from our clients, but also from within IBM, how we're using machine learning to run our own business. And the thing I always remind clients when I talk to them is, machine learning is not going to replace managers, but I think machine learning, managers that use machine learning will replace managers that do not. And what you see today is a bunch of examples of how that's true because it gives you superpowers. If you've automated a lot of the insight, data collection, decision making, it makes you a more powerful manager, and that's going to change a lot of enterprises. >> It seems like a no-brainer, right? I mean, or a must-have. >> I think there's a, there's always that, sometimes there's a fear factor. There is a culture piece that holds people back. We're trying to make it really simple in terms of how we talk about the day, and the examples that we show, to get people comfortable, to kind of take a step onto that ladder back to the company. >> It's conceptually a no-brainer, but it's a challenge. You wrote a blog and it was really interesting. It was, one of the clients said to you, "I'm so glad I'm not in the technology industry." And you went, "Uh, hello?" (laughs) "I've got news for you, you are in the technology industry." So a lot of customers that I talk to feel like, meh, you know, in our industry, it's really not getting disrupted. That's kind of taxis and retail. We're in banking and, you know, but, digital is disrupting every industry and every industry is going to have to adopt ML, AI, whatever you want to call it. Can traditional companies close that gap? What's your take? >> I think they can, but, I'll go back to the word I used before, it starts with culture. Am I accepting that I'm a technology company, even if traditionally I've made tractors, as an example? Or if traditionally I've just been you know, selling shirts and shoes, have I embraced the role, my role as a technology company? Because if you set that culture from the top, everything else flows from there. It can't be, IT is something that we do on the side. 
It has to be a culture of, it's fundamental to what we do as a company. There was an MIT study that said, data-driven cultures drive productivity gains of six to 10 percent better than their competition. You can't, that stuff compounds, too. So if your competitors are doing that and you're not, not only do you fall behind in the short term but you fall woefully behind in the medium term. And so, I think companies are starting to get there but it takes a constant push to get them focused on that. >> So if you're a tractor company, you've got human expertise around making tractors and messaging and marketing tractors, and then, and data is kind of there, sort of a bolt-on, because everybody's got to be data-driven, but if you look at the top companies by market cap, you know, we were talking about it earlier. Data is foundational. It's at their core, so, that seems to me to be the hard part, Rob, I'd like you to comment in terms of that cultural shift. How do you go from sort of data in silos and, you know, not having cloud economics and, that are fundamental, to having that dynamic, and how does IBM help? >> You know, I think, to give companies credit, I think most organizations have developed some type of data practice or discipline over the last, call it five years. But most of that's historical, meaning, yeah, we'll take snapshots of history. We'll use that to guide decision making. You fast-forward to what we're talking about today, just so we're on the same page, machine learning is about, you build a model, you train a model with data, and then as new data flows in, your model is constantly updating. So your ability to make decisions improves over time. That's very different from, we're doing historical reporting on data. And so I think it's encouraging that companies have kind of embraced that data discipline in the last five years, but what we're talking about today is a big next step and what we're trying to break it down to what I call the building blocks, so, back to the point on an AI ladder, what I mean by an AI ladder is, you can't do AI without machine learning. You can't do machine learning without analytics. You can't do analytics without the right data architecture. So those become the building blocks of how you get towards a future of AI. And so what I encourage companies is, if you're not ready for that AI leading edge use case, that's okay, but you can be preparing for that future now. That's what the building blocks are about. >> You know, I think we're, I know we're ahead of, you know, Jeremiah Owyang on a little bit later, but I was reading something that he had written about gut and instinct, from the C-Suite, and how, that's how companies were run, right? You had your CEO, your president, they made decisions based on their guts or their instincts. And now, you've got this whole new objective tool out there that's gold, and it's kind of taking some of the gut and instinct out of it, in a way, and maybe there are people who still can't quite grasp that, that maybe their guts and their instincts, you know, what their gut tells them, you know, is one thing, but there's pretty objective data that might indicate something else. >> Moneyball for business. >> A little bit of a clash, I mean, is there a little bit of a clash in that respect? >> I think you'd be surprise by how much decision making is still pure opinion. I mean, I see that everywhere. But we're heading more towards what you described for sure. 
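Thomas's description of machine learning here, train a model and then keep updating it as new data flows in so that decisions improve over time, is essentially incremental (online) learning. As a minimal, hypothetical sketch of that idea, not any IBM product API, the following Python snippet updates one model batch by batch; the features, labels, and "daily" data stream are all invented for illustration.

```python
# Illustrative only: a model that keeps learning as new records arrive,
# rather than being retrained once on a historical snapshot.
import numpy as np
from sklearn.linear_model import SGDClassifier

classes = np.array([0, 1])                 # hypothetical labels, e.g. "will churn" vs. "won't churn"
model = SGDClassifier(loss="log_loss")     # logistic regression trained by stochastic gradient descent

def new_batch(rng, n=200):
    """Stand-in for the latest slice of operational data (invented distribution)."""
    X = rng.normal(size=(n, 4))            # four made-up numeric features
    y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

rng = np.random.default_rng(42)
for day in range(10):                      # pretend each loop iteration is a new day of data
    X, y = new_batch(rng)
    model.partial_fit(X, y, classes=classes)   # update the existing model in place
    print(f"day {day}: accuracy on today's data = {model.score(X, y):.2f}")
```

The point of the sketch is the loop: the model is never "done", it simply absorbs each new batch, which is the contrast Thomas draws with purely historical reporting.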
One of the clients talking here today, AMC Networks, think it's a great example of a company that you wouldn't think of as a technology company, primarily a content producer, they make great shows, but they've kind of gone that extra step to say, we can integrate data sources from third parties, our own data about viewer habits, we can do that to change our relationship with advertisers. Like, that's a company that's really embraced this idea of being a technology company, and you can see it in their results, and so, results are not coincidence in this world anymore. It's about a practice applied to data, leveraging machine learning, on a path towards AI. If companies are doing that, they're going to be successful. >> And we're going to have the tally from AMC on, but so there's a situation where they have embraced it, that they've dealt with that culture, and data has become foundational. Now, I'm interested as to what their journey look like. What are you seeing with clients? How they break this down, the silos of data that have been built up over decades. >> I think, so they get almost like a maturity curve. You've got, and the rule I talk about is 40-40-20, where 40% of organizations are really using data just to optimize costs right now. That's okay, but that's on the lower end of the maturity curve. 40% are saying, all right, I'm starting to get into data science. I'm starting to think about how I extend to new products, new services, using data. And then 20% are on the leading edge. And that's where I'd put AMC Networks, by the way, because they've done unique things with integrating data sets and building models so that they've automated a lot of what used to be painstakingly long processes, internal processes to do it. So you've got this 40-40-20 of organizations in terms of their maturity on this. If you're not on that curve right now, you have a problem. But I'd say most are somewhere on that curve. If you're in the first 40% and you're, right now data for you is just about optimizing cost, you're going to be behind. If you're not right now, you're going to be behind in the next year, that's a problem. So I'd kind of encourage people to think about what it takes to be in the next 40%. Ultimately you want to be in the 20% that's actually leading this transformation. >> So change it to 40-20-40. That's where you want it to go, right? You want to flip that paradigm. >> I want to ask you a question. You've done a lot of M and A in the past. You spent a lot of time in Silicon Valley and Silicon Valley obviously very, very disruptive, you know, cultures and organizations and it's always been a sort of technology disruption. It seems like there's a ... another disruption going on, not just horizontal technologies, you know, cloud or mobile or social, whatever it is, but within industries. Some industries, as we've been talking, radically disrupted. Retail, taxis, certainly advertising, et cetera et cetera. Some have not yet, the client that you talked to. Do you see, technology companies generally, Silicon Valley companies specifically, as being able to pull off a sort of disruption of not only technologies but also industries and where does IBM play there? You've made a sort of, Ginni in particular has made a deal about, hey, we're not going to compete with our customers. 
So talking about this sort of dual disruption agenda, one on the technology side, one within industries that Apple's getting into financial services and, you know, Amazon getting into grocery, what's your take on that and where does IBM fit in that world? >> So, I mean, IBM has been in Silicon Valley for a long time, I would say probably longer than 99.9% of the companies in Silicon Valley, so, we've got a big lab there. We do a lot of innovation out of there. So love it, I mean, the culture of the valley is great for the world because it's all about being the challenger, it's about innovation, and that's tremendous. >> No fear. >> Yeah, absolutely. So, look, we work with a lot of different partners, some who are, you know, purely based in the valley. I think they challenge us. We can learn from them, and that's great. I think the one, the one misnomer that I see right now, is there's a undertone that innovation is happening in Silicon Valley and only in Silicon Valley. And I think that's a myth. Give you an example, we just, in December, we released something called Event Store which is basically our stab at reinventing the database business that's been pretty much the same for the last 30 to 40 years. And we're now ingesting millions of rows of data a second. We're doing it in a Parquet format using a Spark engine. Like, this is an amazing innovation that will change how any type of IOT use case can manage data. Now ... people don't think of IBM when they think about innovations like that because it's not the only thing we talk about. We don't have, the IBM website isn't dedicated to that single product because IBM is a much bigger company than that. But we're innovating like crazy. A lot of that is out of what we're doing in Silicon Valley and our labs around the world and so, I'm very optimistic on what we're doing in terms of innovation. >> Yeah, in fact, I think, rephrase my question. I was, you know, you're right. I mean people think of IBM as getting disrupted. I wasn't posing it, I think of you as a disruptor. I know that may sound weird to some people but in the sense that you guys made some huge bets with things like Watson on solving some of the biggest, world's problems. And so I see you as disrupting sort of, maybe yourselves. Okay, frame that. But I don't see IBM as saying, okay, we are going to now disrupt healthcare, disrupt financial services, rather we are going to help our, like some of your comp... I don't know if you'd call them competitors. Amazon, as they say, getting into content and buying grocery, you know, food stores. You guys seems to have a different philosophy. That's what I'm trying to get to is, we're going to disrupt ourselves, okay, fine. But we're not going to go hard into healthcare, hard into financial services, other than selling technology and services to those organizations, does that make sense? >> Yeah, I mean, look, our mission is to make our clients ... better at what they do. That's our mission, we want to be essential in terms of their journey to be successful in their industry. So frankly, I love it every time I see an announcement about Amazon entering another vertical space, because all of those companies just became my clients. Because they're not going to work with Amazon when they're competing with them head to head, day in, day out, so I love that. 
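Returning for a moment to the Event Store example mentioned above (ingesting millions of rows per second into a Parquet format using a Spark engine): the general pattern can be pictured with a small PySpark sketch. This is not IBM Event Store's actual API, just an illustrative structured-streaming job with made-up paths and rates that continuously lands a high-volume stream as Parquet files.

```python
# Illustrative PySpark sketch: continuously land a high-rate event stream as Parquet.
# Paths, rates, and the fabricated sensor_id column are invented for the example.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("parquet-ingest-sketch").getOrCreate()

events = (
    spark.readStream
    .format("rate")                       # built-in test source that emits rows at a fixed rate
    .option("rowsPerSecond", 100000)      # stand-in for a firehose of IoT events
    .load()
    .withColumn("sensor_id", F.col("value") % 1000)
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "/tmp/events_parquet")             # hypothetical landing zone
    .option("checkpointLocation", "/tmp/events_ckpt")  # required for fault tolerance
    .trigger(processingTime="10 seconds")
    .start()
)

query.awaitTermination()
```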
So us working with these companies to make them better through things like Watson Health, what we're doing in healthcare, it's about making companies who have built their business in healthcare more effective at how they perform, how they drive results, revenue, ROI for their investors. That's what we do, that's what IBM has always done. >> Yeah, so it's an interesting discussion. I mean, I tend to agree. I think Silicon Valley maybe should focus on those technology disruptions. I think that they'll have a hard time pulling off that dual disruption, and maybe if you broadly define Silicon Valley as Seattle and so forth, but, but it seems like that formula has worked for decades, and will continue to work. Other thoughts on sort of the progression of ML, how it gets into organizations. You know, where you see this going, again, I was saying earlier, the parlance is changing. Big data is kind of, you know, mm. Okay, Hadoop, well, that's fine. We seem to be entering this new world that's pervasive, it's embedded, it's intelligent, it's autonomous, it's self-healing, it's all these things that, you know, we aspire to. We're now back in the early innings. We're late innings of big data, that's kind of ... But early innings of this new era, what are your thoughts on that? >> You know, I'd say the biggest restriction right now I see, we talked before about how sometimes companies don't have the desire, so we have to help create the desire, create the culture to go do this. Even for the companies that have a burning desire, the issue quickly becomes a skill gap. And so we're doing a lot to try to help bridge that skill gap. Let's take data science as an example. There's two worlds of data science that I would describe. There's clickers, and there's coders. Clickers want to do drag and drop. They will use traditional tools like SPSS, which we're modernizing, that's great. We want to support them if that's how they want to work and build models and deploy models. There's also this world of coders. This is people that want to do all their data science in code, in Python, and Scala, and R, like, that's what they want to do. And so we're supporting them through things like Data Science Experience, which is built on Jupyter. It's all open source tooling, it's designed for coders. The reason I think that's important, it goes back to the point on skill sets. There is a skill gap in most companies. So if you walk in and you say, this is the only way to do this thing, you've kind of excluded half the companies because they say, I can't play in that world. So we are intentionally going after a strategy that says, there's a segmentation in skill types. In places there's a gap, we can help you fill that gap. That's how we're thinking about them. >> And who does that bode well for? If you say that you were trying to close a gap, does that bode well for, we talked about the Millennial crowd coming in and so they, you know, do they have a different approach or different mental outlook on this, or is it to the mid-range employee, you know, who is open minded, I mean, but, who is the net sweet spot, you think, that say, oh, this is a great opportunity right now? >> So just take data science as an example. The clicker coder comment I made, I would put the clicker audience as mostly people that are 20 years into their career. They've been around a while. The coder audience is all the Millennials. It's all the new audience.
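For the "coder" path Thomas describes above, data science done in notebooks with Python, Scala, or R rather than with drag-and-drop tools, the day-to-day workflow often looks something like the hypothetical snippet below. The CSV file, column names, and model choice are invented for illustration and are not tied to Data Science Experience or any specific product.

```python
# Illustrative notebook-style workflow for the "coder" persona:
# load data, build a model in code, evaluate it, keep iterating.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("subscribers.csv")        # hypothetical file of viewing and billing features
X = df.drop(columns=["churned"])           # "churned" is an invented label column
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))
```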
I think the greatest beneficiary is the people that find themselves kind of stuck in the middle, which is they're kind of interested in this ... >> That straddle both sides of the line yeah? >> But they've got the skill set and the desire to do some of the new tooling and new approaches. So I think this kind of creates an opportunity for that group in the middle to say, you know, what am I going to adopt as a platform for how I go forward and how I provide leadership in my company? >> So your advice, then, as you're talking to your clients, I mean you're also talking to their workforce. In a sense, then, your advice to them is, you know, join, jump in the wave, right? You've got your, you can't straddle, you've got to go. >> And you've got to experiment, you've got to try things. Ultimately, organizations are going to gravitate to things that they like using in terms of an approach or a methodology or a tool. But that comes with experimentation, so people need to get out there and try something. >> Maybe we could talk about developers a little bit. We were talking to Dinesh earlier and you guys of course have focused on data scientists, data engineers, obviously developers. And Dinesh was saying, look, many, if not most, of the 10 million Java developers out there, they're not, like, focused around the data. That's really the data scientist's job. But then, my colleague John Furrier says, hey, data is the new development kit. You know, somebody said recently, you know, Andreessen's comment, "software is eating the world." Well, data is eating software. So if Furrier is right and that comment is right, it seems like developers increasingly have to become more data aware, fundamentally. Blockchain developers clearly are more data focused. What's your take on the developer community, where they fit into this whole AI, machine learning space? >> I was just in Las Vegas yesterday and I did a session with a bunch of our business partners. ISVs, so software companies, mostly a developer audience, and the discussion I had with them was around, you're doing, you're building great products, you're building great applications. But your product is only as good as the data and the intelligence that you embed in your product. Because you're still putting too much of a burden on the user, as opposed to having everything happen magically, if you will. So that discussion was around, how do you embed data, embed AI, into your products and do that at the forefront versus, you deliver a product and the client has to say, all right, now I need to get my data out of this application and move it somewhere else so I can do the data science that I want to do. That's what I see happening with developers. It's kind of ... getting them to think about data as opposed to just thinking about the application development framework, because that's where most of them tend to focus. >> Mm, right. >> Well, we've talked about, well, earlier on about the governance, so just curious, with Madhu, which I'll, we'll have that interview in just a little bit here. I'm kind of curious about your take on that, is that it's a little kinder, gentler, friendlier than maybe some might look at it nowadays because of some organization that it causes, within your group and some value that's being derived from that, that more efficiency, more contextual information that's, you know, more relevant, whatever. 
When you talk to your clients about meeting rules, regs, GDPR, all these things, how do you get them to see that it's not a black veil of doom and gloom but it really is, really more of an opportunity for them to cash in? >> You know, my favorite question to ask when I go visit clients is I say, I say, just show of hands, how many people have all the data they need to do their job? To date, nobody has ever raised their hand. >> Not too many hands up. >> The reason I phrased it that way is, that's fundamentally a governance challenge. And so, when you think about governance, I think everybody immediately thinks about compliance, GDPR, types of things you mentioned, and that's great. But there's two use cases for governance. One is compliance, the other one is self service analytics. Because if you've done data governance, then you can make your data available to everybody in the organization because you know you've got the right rules, the right permissions set up. That will change how people do their jobs and I think sometimes governance gets painted into a compliance corner, when organizations need to think about it as, this is about making data accessible to my entire workforce. That's a big change. I don't think anybody has that today. Except for the clients that we're working with, where I think we've made good strides in that. >> What's your sort of number one, two, and three, or pick one, advice for those companies that as you blogged about, don't realize yet that they're in the software business and the technology business? For them to close the ... machine intelligence, machine learning, AI gap, where should they start? >> I do think it can be basic steps. And the reason I say that is, if you go to a company that hasn't really viewed themselves as a technology company, and you start talking about machine intelligence, AI, like, everybody like, runs away scared, like it's not interesting. So I bring it back to building blocks. For a client to be great in data, and to become a technology company, you really need three platforms for how you think about data. You need a platform for how you manage your data, so think of it as data management. You need a platform for unified governance and integration, and you need a platform for data science and business analytics. And to some extent, I don't care where you start, but you've got to start with one of those. And if you do that, you know, you'll start to create a flywheel of momentum where you'll get some small successes. Then you can go in the other area, and so I just encourage everybody, start down that path. Pick one of the three. Or you may already have something going in one of them, so then pick one where you don't have something going. Just start down the path, because, those building blocks, once you have those in place, you'll be able to scale AI and ML in the future in your organization. But without that, you're going to always be limited to kind of a use case at a time. >> Yeah, and I would add, this is, you talked about it a couple times today, is that cultural aspect, that realization that in order to be data driven, you know, buzzword, you have to embrace that and drive that through the culture. Right? >> That starts at the top, right? Which is, it's not, you know, it's not normal to have a culture of, we're going to experiment, we're going to try things, half of them may not work. And so, it starts at the top in terms of how you set the tone and set that culture. >> IBM Think, we're less than a month away. 
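Before moving on, it is worth making Thomas's two-sided view of governance concrete: once the rules and permissions are encoded in one place, the same machinery that satisfies compliance also lets you open data up for self-service. The toy sketch below, with an entirely invented policy table, roles, and masking rule, illustrates that idea in a few lines of Python.

```python
# Toy sketch: a governance layer that decides, per user role, which columns
# may be returned unmasked. Policies, roles, and data are invented.
import pandas as pd

POLICIES = {
    # column name -> roles allowed to see it unmasked
    "customer_id":   {"analyst", "data_steward"},
    "email":         {"data_steward"},            # PII: stewards only
    "monthly_spend": {"analyst", "data_steward"},
}

def governed_view(df: pd.DataFrame, role: str) -> pd.DataFrame:
    """Return a copy of df with columns masked according to POLICIES."""
    out = df.copy()
    for column, allowed_roles in POLICIES.items():
        if column in out.columns and role not in allowed_roles:
            out[column] = "***MASKED***"
    return out

customers = pd.DataFrame({
    "customer_id": [101, 102],
    "email": ["a@example.com", "b@example.com"],
    "monthly_spend": [42.0, 77.5],
})

print(governed_view(customers, role="analyst"))       # email comes back masked
print(governed_view(customers, role="data_steward"))  # full view for the steward
```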
CUBE is going to be there, very excited about that. First time that you guys have done Think. You've consolidated all your big, big events. What can we expect from you guys? >> I think it's going to be an amazing show. To your point, we thought about this for a while, consolidating to a single IBM event. There's no question just based on the response and the enrollment we have so far, that was the right answer. We'll have people from all over the world. A bunch of clients, we've got some great announcements that will come out that week. And for clients that are thinking about coming, honestly the best thing about it is all the education and training. We basically build a curriculum, and think of it as a curriculum around, how do we make our clients more effective at competing with the Amazons of the world, back to the other point. And so I think we build a great curriculum and it will be a great week. >> Well, if I've heard anything today, it's about, don't be afraid to dive in at the deep end, just dive, right? Get after it and, looking forward to the rest of the day. Rob, thank you for joining us here and we'll see you in about a month! >> Sounds great. >> Right around the corner. >> All right, Rob Thomas joining us here from IBM Analytics, the GM at IBM Analytics. Back with more here on theCUBE. (upbeat music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Amazon | ORGANIZATION | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
December | DATE | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
New York | LOCATION | 0.99+ |
Dinesh | PERSON | 0.99+ |
AMC Networks | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
Jeremiah Owyang | PERSON | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Rob | PERSON | 0.99+ |
20 years | QUANTITY | 0.99+ |
Dave | PERSON | 0.99+ |
Munich | LOCATION | 0.99+ |
IBM Analytics | ORGANIZATION | 0.99+ |
Las Vegas | LOCATION | 0.99+ |
MIT | ORGANIZATION | 0.99+ |
10 million | QUANTITY | 0.99+ |
Apple | ORGANIZATION | 0.99+ |
20% | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
Furrier | PERSON | 0.99+ |
AMC | ORGANIZATION | 0.99+ |
One | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
six | QUANTITY | 0.99+ |
New York City | LOCATION | 0.99+ |
GDPR | TITLE | 0.99+ |
40% | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
three | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
Seattle | LOCATION | 0.99+ |
Scala | TITLE | 0.99+ |
two use cases | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
Python | TITLE | 0.98+ |
Andreessen | PERSON | 0.98+ |
both sides | QUANTITY | 0.98+ |
two | QUANTITY | 0.98+ |
Watson Health | ORGANIZATION | 0.98+ |
millions of rows | QUANTITY | 0.98+ |
five years | QUANTITY | 0.97+ |
next year | DATE | 0.97+ |
less than a month | QUANTITY | 0.97+ |
Madhu | PERSON | 0.97+ |
Amazons | ORGANIZATION | 0.96+ |
Vitaly Tsivin, AMC | Machine Learning Everywhere 2018
>> Voiceover: Live from New York it's theCUBE, covering Machine Learning Everywhere: Build Your Ladder to AI. Brought to you by IBM. (upbeat techno music) >> Welcome back to New York City as theCUBE continues our coverage here at IBM's Machine Learning Everywhere: Build Your Ladder to AI. Along with Dave Vellante, I'm John Walls. We're now joined by Vitaly Tsivan who is Executive Vice President at AMC Networks. And Vitaly, thanks for joining us here this morning. >> Thank you. >> I don't know how this interview is going to go, frankly. Because we've got a die-hard Yankee fan in our guest, and a Red Sox fans who bleeds Red Sox Nation. Can you guys get along for about 15 minutes? >> Dave: Maybe about 15. >> I'm glad there's a bit of space between us. >> Dave: It's given us the off-season and the Yankees have done so well. I'll be humble. Okay? (John laughs) We'll wait and see. >> All right. Just in case, I'm ready to jump in if we have to separate here. But it is good to have you here with us this morning. Thanks for making the time. First off, talk about AMC Networks a little bit. So, five U.S. networks. You said multiple international networks and great presence there. But you've had to make this transition to becoming a data company, in essence. You have content and you're making this merger in the data. How has that gone for you? And how have you done that? >> First of all, you make me happy when you say that AMC Networks have made a transition to be a data company. So, we haven't. We are using data to help our primary business, which is obviously broadcasting our content to our viewers. But yes, we use data to help to tune our business, to follow the lead that viewers are giving us. As you can imagine, in the last so many years, viewers have actually dictating how they want to watch. Whether it's streaming video rather than just turning their satellite boxes or TV boxes on, and pretty much dictating what content they want to watch. So, we have to follow, we have to adjust and be at the cutting edge all for our business. And this is where data come into play. >> How did you get there? You must have done a lot of testing, right? I mean, I remember when binge watching didn't even exist, and then all of a sudden now everybody drops 10 episodes at once. Was that a lot of A-B testing? Just analyzing data? How does a company like yours come to that realization? Or is it just, wow, the competition is doing it, we should too. Explain how -- >> Vitaly: Interesting. So, when I speak to executives, I always tell them that business intelligence and data analytics for any company is almost like an iceberg. So, you can actually see the top of it, and you enjoy it very much but there's so much underwater. So, that's what you're referring to which is that in order to be able to deliver that premium thing that's the tip of the iceberg is that we have to have state of the art data management platforms. We have to curate our own first by data. We have to acquire meaningful third party data. We have to mingle it all together. We have to employ optimization predictive algorithms on top of that. We have to employ statistics, and arm business with data-driven decisions. And then it all comes to fruition. >> Now, your company's been around for awhile. You've got an application -- You're a developer. You're an application development executive. So, you've sort of made your personal journey. I'm curious as to how the company made its journey. 
How did you close that gap between the data platforms that we all know, the Googles, the Facebooks, etc., which data is the central part of their organization, to where you used to be? Which probably was, looking back, building a lot of business intelligence, decision support, and a lot of sort of asynchronous activities. How did you get from there to where you are today? >> Makes sense. So, I've been with AMC Networks for four years. Prior to that I'd been with Disney, ABC, ESPN for six years, doing roughly the same thing. So, number one, we're utilizing ever rapidly changing technologies to get us to the right place. Number two is during those four years with AMC, we've employed various tactics. Some of them are called data democratization. So, that's actually not only getting the right data sources, not only processing them correctly, but actually arming everyone in the company with immediate, easy access to this data. Because the entire business, data business, is all about insights. So, the insights -- And if you think of the business, if you for a minute separate business and business intelligence, then business doesn't want to know too much about business intelligence. What they want is insights on a silver plate that will tell them what to do next. Now, that's the hardest thing, you can imagine, right? And so the search and drive for those insights has to come from every business person in the organization. Now, obviously, you don't expect them to build their own statistical algorithms or employ machine learning themselves. But if you arm them with that data at the tip of their fingers, they'll make many better decisions on a daily basis, which means that they're actually coming up with their own small insights. So, there are small insights, big insights, and they're all extremely valuable. >> A big part of that is cultural as well, that mindset. Many companies that I work with, their data is very siloed. I don't know if that was the case with your firm, maybe less prior to your joining. I'd be curious as to how you've achieved that cultural mindset shift. Cause a lot of times, people try to keep their own data. They don't want to share it. They want to keep it in a silo, gain political power. How did you address that? >> Vitaly: Absolutely. One of my conversations with the president, we were discussing the fact that if we were to go make recordings of how people talk about data in their organization today and go back in time and show them what they will be doing three years from now, they would be shocked. They wouldn't believe that. So, absolutely. So, culturally, educationally, bringing everyone into the place where they can understand data. They can take advantage of the data. It's an undertaking. But we are successful in doing that. >> Help me out here. Maybe I just need a little translation here, or simplification. So, you think about AMC. You've got programming. You've got your line up. I come on, I click, I go, I watch a movie and I enjoy it or watch my program, whatever. So, now in this new world of viewer habits changing, my behaviors are changing. What have you done? What have you looked for in terms of data, and what is it telling you about me, that has now allowed you to modify your business and adapt to that? So, I mean, how should data drive that on a day-to-day basis in terms of how I access your programming? >> So, a good example of that would be something we call TV everywhere.
So, you said it yourself, obviously users or viewers are used to watching television as when the shows were provided via television. So, with new technologies, with streaming opportunities, today, they want to watch when they want to watch, and what they want to watch. So, one of the ways we accommodate them with that is that we don't just television, so we are on every available platform today and we are allowing viewers to watch our content on demand, digitally, when they want to watch it. So, that is one of the ways how we are reacting to it. And so, that puts us in the position as one of the B to C type of businesses, where we're now speaking directly to our consumers not via just the television. So, we're broadcasting, their watching which means that we understand how they watch and we try to react accordingly to that. Which is something that Netflix is bragging about is that they know the patterns, they actually kind of promote their business so we on that business too. >> Can you describe your innovation formula, if you will? How do you go about innovating? Obviously, there's data, there's technology. Presumably, there's infrastructure that scales. You have to be able to scale and have massive speed and infrastructure that heals itself. All those other things. But what's your innovation formula? How would you describe it? So, informally simple. It starts with business. I'm fortunate that business has desire to innovate. So, formulating goals is something that drives us to respond to it. So, we don't just walk around the thing, and look around and say, "Let's innovate." So, we follow the business goals with innovation. A good example is when we promote our shows. So, the major portion of our marketing campaigns falls on our own air. So, we promote our shows to our AMC viewers or WE tv viewers. When we do that, we try to optimize our campaigns to the highest level possible, to get the most out of ROI out of that. And so, we've succeeded and we managed today to get about 30% ROI on that and either just do better with our promotional campaigns or reallocate that time for other businesses. >> You were saying that after the first question, or during responding to the first question, about you saying we're really not ... We're a content company still. And we have incorporated data, but you really aren't, Dave and I have talked about this a lot, everybody's a data company now, in a way. Because you have to be. Cause you've got this hugely competitive landscape that you're operating in, right? In terms of getting more odd calls. >> That's right. >> So, it's got to be no longer just a part of what you do or a section of what you do. It's got to be embedded in what you do. Does it not? Oh, it absolutely is. I still think that it's a bit premature to call AMC Networks a data company. But to a degree, every company today is a data company. And with the culture change over the years, if I used to solicit requests and go about implementing them, today it's more of a prioritization of work because every department in the company got educated to the degree that they all want to get better. And they all want those insights from the data. They want their parts of the business to be improved. And we're venturing into new businesses. And it's quite a bit in demand. >> So, is it your aspiration to become a data company? Or is it more data-driven sort of TV network? How would you sort of view that? >> I'd like to say data-driven TV network. Of course. >> Dave: Okay. >> It's more in tune with reality. 
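The on-air promotion example Tsivin gives, deciding which promos run in which slots so that the campaign's ROI is as high as possible, can be pictured as a simple allocation problem. The sketch below is entirely made up: the show names, predicted lift numbers, and the greedy heuristic are illustrative stand-ins, not AMC's actual models, which would involve far richer predictions and constraints.

```python
# Toy sketch of promo-slot allocation: assign each available on-air slot to the
# show whose promotion is predicted to add the most value in that slot.
# All lift estimates and names are invented for illustration.
predicted_lift = {
    # (slot, show) -> expected incremental viewers if this promo airs in this slot
    ("prime_mon", "Show A"): 12000, ("prime_mon", "Show B"): 9000,
    ("late_tue",  "Show A"):  3000, ("late_tue",  "Show B"): 7000,
    ("prime_fri", "Show A"):  8000, ("prime_fri", "Show B"): 8500,
}
max_promos_per_show = 2

allocation, used = {}, {}
# Greedy heuristic: take the highest-lift (slot, show) pairs first.
for (slot, show), lift in sorted(predicted_lift.items(), key=lambda kv: -kv[1]):
    if slot not in allocation and used.get(show, 0) < max_promos_per_show:
        allocation[slot] = show
        used[show] = used.get(show, 0) + 1

print(allocation)   # e.g. {'prime_mon': 'Show A', 'prime_fri': 'Show B', 'late_tue': 'Show B'}
```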
>> And so, talk about aligning with the business goals. That's kind of your starting point. You were talking earlier about a gut feel. We were joking about baseball. Moneyball for business. So, you're a data person. The data doesn't lie, etc. But insights sometimes are hard. They don't just pop out. Is that true? Do you see that changing as the time to insight, from insight to decision going to compress? What do you see there? >> The search for insights will never stop. And the more dense we are in that journey the better we are going to be as a company. The data business is so much depends on technologies. So, that when technologies matures, and we manage to employ them in a timely basis, so we simply get better from that. So, good example is machine learning. There are a ton of optimizations, optimization algorithms, forecasting algorithms that we put in place. So, for awhile it was a pinnacle of our deliveries. Now, with machine learning maturing today. We are able or trying to be in tune with the audience that is changing their behavior. So, the patterns that we would be looking for manually in the past, machine is now looking for those patterns. So, that's the perfect example for our strength to catch up with the reality. What I'm hoping for, and that's where the future is, is that one day we won't be just reacting utilizing machine learning to the change in patterns in behavior. We are actually going to be ahead of those patterns and anticipate those changes to come, and react properly. >> I was going to say, yeah, what is the next step? Because you said that you are reacting. >> Vitaly: I was ahead of your question. >> Yeah, you were. (laughter) So, I'm going to go ahead and re-ask it. >> Dave: Data guy. (laughter) >> But you've got to get to that next step of not just anticipating but almost creating, right, in your way. Creating new opportunities, creating news data to develop these insights into almost shaping viewer behavior, right? >> Vitaly: Totally. So, like I said, optimization is one avenue that we pursue and continue to pursue. Forecasting is another. But I'm talking about true predictability. I mean, something goes beyond just to say how our show will do. Even beyond, which show would do better. >> John: Can you do that? Even to the point and say these are the elements that have been successful for this genre and for this size of audience, and therefore as we develop programming, whether it's in script and casting, whatever. I mean, take it all the way down to that micro-level to developing almost these ideals, these optimal programs that are going to be better received by your audience. >> Look, it's not a big secret. Every company that is in the content business is trying to get as many The Walking Deads as they can in their portfolio. Is there a direct path to success? Probably not, otherwise everyone would have been-- >> John: Over do it. >> Yeah, would be doing that. But yeah, so those are the most critical and difficult insights to get ahold of and we're working toward that. >> Are you finding that your predictive capabilities are getting meaningfully better? Maybe you could talk about that a little bit in terms of predicting those types of successes. Or is it still a lot of trial and error? >> I'd like to say they are meaningfully better. (laughter) Look, we do, there are obviously interesting findings. There are sometimes setbacks and we learn from it, and we move forward. >> Okay, as good as the weather or better? Or worse? 
(laughs) >> Depends on the morning and the season. (laughter) >> Vitaly, how have your success or have your success measurements changed as we enter this world of digital and machine learning and artificial intelligence? And if so, how? >> Well, they become more and more challenging and complex. Like, I gave an example for data democratization. It was such an interesting and telling company-wide initiative. And at the time, it felt as a true achievement when everybody get access to their data on their desktops and laptops. When we look back now a few years, it was a walk in the park to achieve. So, the more complex data and objectives we set in front of ourselves, the more educated people in the company become, the more challenging it is to deliver and take the next step. And we strive to do that. >> I wonder if I can ask you a question from a developers perspective. You obviously understand the developer mindset. We were talking to Dennis earlier. He's like, "Yeah, you know, it's really the data scientists that are loving the data, taking a bath in it. The data engineers and so forth." And I was kind of pushing on that saying, "Well, but eventually the developers have to be data-oriented. Data is the new development kit. What's your take? I mean, granted the 10 million Java developers most of them are not focused on the data per se. Will that change? Is that changing? >> So, first of all, I want separate the classical IT that you just referred to, which are developers. Because this discipline has been well established whether it's Waterfall or Agile. So, every company has those departments and they serve companies well. Business intelligence is a different animal. So, most of the work, if not all of the work we do is more of an R&D type of work. It is impossible to say, in three months I'll arrive with the model that will transform this business. So, we're driving there. That's the major distinction between the two. Is it the right path for some of the data-oriented developers to move on from, let's say, IT disciplines and into BI disciplines? I would highly encourage that because the job is so much more challenging, so interesting. There's very little routine as we said. It's actually challenge, challenge, and challenge. And, you know, you look at the news the way I do, and you see that data scientists becomes the number one desired job in America. I hope that there will be more and more people in that space because as every other department was struggling to find good people, right people for the space, and even within that space, you have as you mentioned, data engineers. You have data scientists or statisticians. And now it's maturing to the point that you have people who are above and beyond that. Those who actually can envision models not to execute on them. >> Are you investigating blockchain and playing around with that at all? Is there an application in your business? >> It hasn't matured fully yet in our hands but we're looking into it. >> And the reason I ask is that there seems to me that blockchain developers are data-oriented. And those two worlds, in my view, are coming together. But it's earlier days. >> Look, I mean, we are in R&D space. And like I said, we don't know exactly, we can't fully commit to a delivery. But it's always a balance between being practical and dreaming. So, if I were to say, you know, let me jump into a blockchain right now and be ahead of the game. Maybe. But then my commitments are going to be sort of farther ahead and I'm trying to be pragmatic. 
>> Before we let you go, I got to give you 30 seconds on your Yankees. How do you feel about the season coming up? >> As for with every season, I'm super-excited. And I can't wait until the season starts. >> We're always excited when pitchers and catchers show up. >> That's right. (laughter) >> If I were a Yankee fan, I'd be excited too. I must admit. >> Nobody's lost a game. >> That's right. >> Vitaly, thank you for being with us here. We appreciate it. And continued success at AMC Networks. Thank you for having me. >> Back with more on theCUBE right after this. (upbeat techno music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
AMC | ORGANIZATION | 0.99+ |
Dave | PERSON | 0.99+ |
Disney | ORGANIZATION | 0.99+ |
Vitaly | PERSON | 0.99+ |
Vitaly Tsivin | PERSON | 0.99+ |
Dennis | PERSON | 0.99+ |
AMC Networks | ORGANIZATION | 0.99+ |
Vitaly Tsivan | PERSON | 0.99+ |
ABC | ORGANIZATION | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
John Walls | PERSON | 0.99+ |
John | PERSON | 0.99+ |
America | LOCATION | 0.99+ |
10 episodes | QUANTITY | 0.99+ |
Netflix | ORGANIZATION | 0.99+ |
Red Sox | ORGANIZATION | 0.99+ |
ESPN | ORGANIZATION | 0.99+ |
first question | QUANTITY | 0.99+ |
four years | QUANTITY | 0.99+ |
30 seconds | QUANTITY | 0.99+ |
10 million | QUANTITY | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Yankees | ORGANIZATION | 0.99+ |
New York City | LOCATION | 0.99+ |
two | QUANTITY | 0.99+ |
Googles | ORGANIZATION | 0.99+ |
Facebooks | ORGANIZATION | 0.99+ |
Yankee | ORGANIZATION | 0.99+ |
today | DATE | 0.99+ |
six years | QUANTITY | 0.99+ |
five | QUANTITY | 0.99+ |
Red Sox Nation | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
One | QUANTITY | 0.98+ |
three months | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
two worlds | QUANTITY | 0.96+ |
about 15 minutes | QUANTITY | 0.96+ |
First | QUANTITY | 0.96+ |
The Walking Deads | TITLE | 0.96+ |
Machine Learning Everywhere: Build Your Ladder to AI | TITLE | 0.93+ |
this morning | DATE | 0.92+ |
four | QUANTITY | 0.91+ |
about 30% | QUANTITY | 0.91+ |
about 15 | QUANTITY | 0.9+ |
Number two | QUANTITY | 0.88+ |
Java | TITLE | 0.88+ |
2018 | DATE | 0.81+ |
one avenue | QUANTITY | 0.81+ |
Agile | TITLE | 0.81+ |
New York | LOCATION | 0.81+ |
Executive Vice President | PERSON | 0.79+ |
three years | QUANTITY | 0.73+ |
one of the ways | QUANTITY | 0.72+ |
U.S. | LOCATION | 0.67+ |
Machine Learning Everywhere | TITLE | 0.63+ |
number one | QUANTITY | 0.63+ |
theCUBE | TITLE | 0.59+ |
Voiceover | TITLE | 0.56+ |
theCUBE | ORGANIZATION | 0.43+ |
years | QUANTITY | 0.35+ |
Sam Lightstone, IBM | Machine Learning Everywhere 2018
>> Narrator: Live from New York, it's the Cube. Covering Machine Learning Everywhere: Build Your Ladder to AI. Brought to you by IBM. >> And welcome back here to New York City. We're at IBM's Machine Learning Everywhere: Build Your Ladder to AI, along with Dave Vellante, John Walls, and we're now joined by Sam Lightstone, who is an IBM fellow in analytics. And Sam, good morning. Thanks for joining us here once again on the Cube. >> Yeah, thanks a lot. Great to be back. >> Yeah, great. Yeah, good to have you here on kind of a moldy New York day here in late February. So we're talking, obviously data is the new norm, is what certainly, have heard a lot about here today and of late here from IBM. Talk to me about, in your terms, of just when you look at data and evolution and to where it's now become so central to what every enterprise is doing and must do. I mean, how do you do it? Give me a 30,000-foot level right now from your prism. >> Sure, I mean, from a super, if you just stand back, like way far back, and look at what data means to us today, it's really the thing that is separating companies one from the other. How much data do they have and can they make excellent use of it to achieve competitive advantage? And so many companies today are about data and only data. I mean, I'll give you some like really striking, disruptive examples of companies that are tremendously successful household names and it's all about the data. So the world's largest transportation company, or personal taxi, can't call it taxi, but (laughs) but, you know, Uber-- >> Yeah, right. >> Owns no cars, right? The world's largest accommodation company, Airbnb, owns no hotels, right? The world's largest distributor of motion pictures owns no movie theaters. So these companies are disrupting because they're focused on data, not on the material stuff. Material stuff is important, obviously. Somebody needs to own a car, somebody needs to own a way to view a motion picture, and so on. But data is what differentiates companies more than anything else today. And can they tap into the data, can they make sense of it for competitive advantage? And that's not only true for companies that are, you know, cloud companies. That's true for every company, whether you're a bricks and mortars organization or not. Now, one level of that data is to simply look at the data and ask questions of the data, the kinds of data that you already have in your mind. Generating reports, understanding who your customers are, and so on. That's sort of a fundamental level. But the deeper level, the exciting transformation that's going on right now, is the transformation from reporting and what we'll call business intelligence, the ability to take those reports and that insight on data and to visualize it in the way that human beings can understand it, and go much deeper into machine learning and AI, cognitive computing where we can start to learn from this data and learn at the pace of machines, and to drill into the data in a way that a human being cannot because we can't look at bajillions of bytes of data on our own, but machines can do that and they're very good at doing that. So it is a huge, that's one level. The other level is, there's so much more data now than there ever was because there's so many more devices that are now collecting data. And all of us, you know, every one of our phones is collecting data right now. Your cars are collecting data. 
I think there's something like 60 sensors on every car that rolls off the manufacturing line today. 60. So it's just a wild time and a very exciting time because there's so much untapped potential. And that's what we're here about today, you know. Machine learning, tapping into that unbelievable potential that's there in that data. >> So you're absolutely right on. I mean the data is foundational, or must be foundational in order to succeed in this sort of data-driven world. But it's not necessarily the center of the universe for a lot of companies. I mean, it is for the big data, you know, guys that we all know. You know, the top market cap companies. But so many organizations, they're sort of, human expertise is at the center of their universe, and data is sort of, oh yeah, bolt on, and like you say, reporting. >> Right. >> So how do they deal with that? Do they get one big giant DB2 instance and stuff all the data in there, and infuse it with MI? Is that even practical? How do they solve this problem? >> Yeah, that's a great question. And there's, again, there's a multi-layered answer to that. But let me start with the most, you know, one of the big changes, one of the massive shifts that's been going on over the last decade is the shift to cloud. And people think of the shift to cloud as, well, I don't have to own the server. Someone else will own the server. That's actually not the right way to look at it. I mean, that is one element of cloud computing, but it's not, for me, the most transformative. The big thing about the cloud is the introduction of fully-managed services. It's not just that you don't own the server. You don't have to install, configure, or tune anything. Now that's directly related to the topic that you just raised, because people have expertise, domains of expertise in their business. Maybe you're a manufacturer and you have expertise in manufacturing. If you're a bank, you have expertise in banking. You may not be a high-tech expert. You may not have deep skills in tech. So one of the great elements of the cloud is that now you can use these fully managed services and you don't have to be a database expert anymore. You don't have to be an expert in tuning SQL or JSON, or yadda yadda. Someone else takes care of that for you, and that's the elegance of a fully managed service, not just that someone else has got the hardware, but they're taking care of all the complexity. And that's huge. The other thing that I would say is, you know, the companies that are really like the big data houses, they've got lots of data, they've spent the last 20 years working so hard to converge their data into larger and larger data lakes. And some have been more successful than others. But everybody has found that that's quite hard to do. Data is coming in from many places, in many different repositories, and trying to consolidate, you know, rip the data out, constantly ripping it out and replicating into some data lake, or data warehouse where you can do your analytics, is complicated. And it means in some ways you're multiplying your costs because you have the data in its original location and now you're copying it into yet another location. You've got to pay for that, too. So you're multiplying costs. So one of the things I'm very excited about at IBM is we've been working on this new technology that we've now branded as IBM Queryplex. And that gives us the ability to query data across all of these myriad sources as if they are in one place.
As if they are a single consolidated data lake, and make it all look like (snaps) one repository. And not only does it appear to the application as one repository, but it actually taps into the processing power of every one of those data sources. So if you have 1,000 of them, we'll bring to bear the power of 1,000 data sources and all that computing and all that memory on these analytics problems. >> Well, give me an example why that matters, of what would be a real-world application of that. >> Oh, sure, so there, you know, there's a couple of examples. I'll give you two extremes, two different extremes. One extreme would be what I'll call enterprise data consolidation or virtualization, where you're a large institution and you have several of these repositories. Maybe you got some IBM repositories like DB2. Maybe you've got a little bit of Oracle and a little bit of SQL Server. Maybe you've got some open source stuff like Postgres or MySQL. You got a bunch of these and different departments use different things, and it develops over decades and to some extent you can't even control it, (laughs) right? And now you just want to get analytics on that. You just want to know, what's this data telling me? And as long as all that data is sitting in these, you know, dozens or hundreds of different repositories, you can't tell, unless you copy it all out into a big data lake, which is expensive and complicated. So Queryplex will solve that problem. >> So it's sort of a virtual data store. >> Yeah, and one of the terms, many different terms that are used, but one of the terms that's used in the industry is data virtualization. So that would be a suitable terminology here as well. To make all that data in hundreds, thousands, even millions of possible data sources appear as one thing, it has to tap into the processing power of all of them at once. Now, that's one extreme. Let's take another extreme, which is even more extreme, which is the IoT scenario, Internet of Things, right? Internet of Things. Imagine you have devices, you know, shipping containers and smart meters on buildings. You could literally have 100,000 of these or a million of these things. They're usually small; they don't usually have a lot of data on them. But they can store, usually, a couple of months of data. And what's fascinating about that is that most analytics today are really on the most recent, you know, 48 hours or four weeks, maybe. And that time is getting shorter and shorter, because people are doing analytics more regularly and they're interested in, just tell me what's going on recently. >> I got to geek out here, for a second. >> Please, well thanks for the warning. (laughs) >> And I know you know things, but I'm not a technical person, but I've been around a long time. A lot of questions on data virtualization, but let me start with Queryplex. The name is really interesting to me. When I, and you're a database expert, so I'm going to tap your expertise. When I read the Google Spanner paper, I called up my colleague David Floyer, who's an ex-IBMer, and I said, "This is like global Sysplex. "It's a global distributed thing," and he goes, "Yeah, kind of." And I got very excited. And then my eyes started bleeding when I read the paper, but the name, Queryplex, is it a play on Sysplex? Is there-- >> It's actually, there's a long story. I don't think I can say the story on-air, but suffice it to say we wanted to get a name that was legally usable and also descriptive. >> Dave: Okay.
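Before the conversation moves on, the data-virtualization idea raised a moment ago — one query that reaches into many separate repositories and returns a single answer, without first copying everything into a data lake — can be sketched in a few lines. This is only a rough illustration in Python, not Queryplex's actual interface; the source layouts, table names, and helper functions below are invented purely for the example.

```python
import csv
import io
import sqlite3

# Two "stores" standing in for separate systems (say, a warehouse table and a
# flat file); the sales/region/amount layout is invented for this illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [("east", 120.0), ("west", 75.5)])

csv_file = io.StringIO("region,amount\neast,40.0\nsouth,10.0\n")

def rows_from_sqlite(conn):
    # The data stays where it lives; we only read it when a query arrives.
    return conn.execute("SELECT region, amount FROM sales").fetchall()

def rows_from_csv(handle):
    handle.seek(0)
    return [(r["region"], float(r["amount"])) for r in csv.DictReader(handle)]

def federated_total_by_region(sources):
    # A thin virtualization layer: one logical query fans out to every source
    # and merges the partial results, so the caller sees a single data set.
    totals = {}
    for fetch in sources:
        for region, amount in fetch():
            totals[region] = totals.get(region, 0.0) + amount
    return totals

sources = [lambda: rows_from_sqlite(db), lambda: rows_from_csv(csv_file)]
print(federated_total_by_region(sources))
# e.g. {'east': 160.0, 'west': 75.5, 'south': 10.0}
```

The point of the sketch is simply that the federation layer, not the caller, knows how to reach each store; a real system would also push work such as filtering and aggregation down to the sources rather than pulling raw rows back.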
>> And we went through literally hundreds and hundreds of permutations of words and we finally landed on Queryplex. But, you know, you mentioned Google Spanner. I probably should spend a moment to differentiate how what we're doing is-- >> Great, if you would. >> A different kind of thing. You know, on Google Spanner, you put data into Google Spanner. With Queryplex, you don't put data into it. >> Dave: Don't have to move it. >> You don't have to move it. You leave it where it is. You can have your data in DB2, you can have it in Oracle, you can have it in a flat file, you can have an Excel spreadsheet, and you know, think about that. An Excel spreadsheet, a collection of text files, comma delimited text files, SQL Server, Oracle, DB2, Netezza, all these things suddenly appear as one database. So that's the transformation. It's not about we'll take your data and copy it into our system, this is about leave your data where it is, and we're going to tap into your (snaps) existing systems for you and help you see them in a unified way. So it's a very different paradigm than what others have done. Part of the reason why we're so excited about it is we're, as far as we know, nobody else is really doing anything quite like this. >> And is that what gets people to the 21st century, basically, is that they have all these legacy systems and yet the conversion is much simpler, much more economical for them? >> Yeah, exactly. It's economical, it's fast. (snaps) You can deploy this in, you know, a very small amount of time. And we're here today talking about machine learning and it's a very good segue to point out in order to get to high-quality AI, you need to have a really strong foundation of an information architecture. And for the industry to show up, as some have done over the past decade, and keep telling people to re-architect their data infrastructure, keep modifying their databases and creating new databases and data lakes and warehouses, you know, it's just not realistic. And so we want to provide a different path. A path that says we're going to make it possible for you to have superb machine learning, cognitive computing, artificial intelligence, and you don't have to rebuild your information architecture. We're going to make it possible for you to leverage what you have and do something special. >> This is exciting. I wasn't aware of this capability. And we were talking earlier about the cloud and the managed service component of that as a major driver of lowering cost and complexity. There's another factor here, which is, we talked about moving data-- >> Right. >> And that's one of the most expensive components of any infrastructure. If I got to move data and the transmission costs and the latency, it's virtually impossible. Speed of light's still up. I know you guys are working on speed of light, but (Sam laughs) you'll eventually get there. >> Right. >> Maybe. But the other thing about cloud economics, and this relates to sort of Queryplex. There's this API economy. You've got virtually zero marginal costs. When you were talking, I was writing these down. You got global scale, it's never down, you've got this network effect working for you. Are you able to, are the standards there? Are you able to replicate those sort of cloud economics the APIs, the standards, that scale, even though you're not in control of this, there's not a single point of control? Can you explain sort of how that magic works? >> Yeah, well I think the API economy is for real and it's very important for us. 
And it's very important that, you know, we talk about API standards. There's a beautiful quote I once heard. The beautiful thing about standards is there's so many to choose from. (All laugh) And the reality is that, you know, you have standards that are official standards, and then you have the de facto standards because something just catches on and nobody blessed it. It just got popular. So that's a big part of what we're doing at IBM is being at the forefront of adopting the standards that matter. We made a big, a big investment in being Spark compatible, and, in fact, even with Queryplex. You can issue Spark SQL against Queryplex even though it's not a Spark engine, per se, but we make it look and feel like it can be Spark SQL. Another critical point here, when we talk about the API economy, and the speed of light, and movement to the cloud, and these topics you just raised, the friction of the Internet is an unbelievable friction. (John laughs) It's unbelievable. I mean, you know, when you go and watch a movie over the Internet, your home connection is just barely keeping up. I mean, you're pushing it, man. So a gigabyte, you know, a gigabyte an hour or something like that, right? Okay, and if you're a big company, maybe you have a fatter pipe. But not a lot fatter. I mean, not orders of, you're talking incredible friction. And what that means is that it is difficult for people, for companies, to en masse, move everything to the cloud. It's just not happening overnight. And, again, in the interest of doing the best possible service to our customers, that's why we've made it a fundamental element of our strategy in IBM to be a hybrid, what we call hybrid data management company, so that the APIs that we use on the cloud, they are compatible with the APIs that we use on premises. And whether that's software or private cloud. You've got software, you've got private cloud, you've got public cloud. And our APIs are going to be consistent across, and applications that you code for one will run on the other. And you can, that makes it a lot easier to migrate at your leisure when you're ready. >> Makes a lot of sense. That way you can bring cloud economics and the cloud operating model to your data, wherever the data exists. Listening to you speak, Sam, it reminds me, do you remember when Bob Metcalfe who I used to work with at IDG, predicted the collapse of the Internet? He predicted that year after year after year, in speech after speech, that it was so fragile, and you're bringing back that point of, guys, it's still, you know, a lot of friction. So that's very interesting, (laughs) as an architect. >> You think Bob's going to be happy that you brought up that he predicted the Internet was going to be its own demise? (Sam laughs) >> Well, he did it in-- >> I'm just saying. >> I'm staying out of it, man. >> He did it as a lightning rod. >> As a talking-- >> To get the industry to respond, and he had a big enough voice so he could do that. >> That it worked, right. But so I want to get back to Queryplex and the secret sauce. Somehow you're creating this data virtualization capability. What's the secret sauce behind it? >> Yeah, so I think, we're not the first to try, by the way. Actually this problem-- >> Hard problem. >> Of all these data sources all over the place, you try to make them look like one thing. People have been trying to figure out how to do that since like the '70s, okay, so, but-- >> Dave: Really hasn't worked. >> And it hasn't worked. 
And really, the reason why it hasn't worked is that there have been two fundamental strategies. One strategy is, you have a central coordinator that tries to speak to each of these data sources. So I've got, let's say, 10,000 data sources. I want to have one coordinator tap into each of them and have a dialogue. And what happens is that that coordinator, a server, an agent somewhere, becomes a network bottleneck. You were talking about the friction of the Internet. This is a great example of friction. One coordinator trying to speak to, you know, all the collaborators becomes a point of friction. And it also becomes a point of friction not only in the Internet, but also in the computation, because it ends up doing too much of the work. There's too many things that cannot be done at these edge repositories, aggregations, and joins, and so on. So all the aggregations and joins get done by this one sucker who can't keep up. >> Dave: The queue. >> Yeah, so there's a big queue, right. So that's one strategy that didn't work. The other strategy that people tried was sort of an n-squared topology where every data source tries to speak to every other data source. And that doesn't scale either. So what we've done in Queryplex is something that we think is unique and much more organic, where we try to organize the universe or constellation of these data sources so that every data source speaks to a small number of peers but not a large number of peers. And that way no single source is a bottleneck, either in network or in computation. That's one trick. And the second trick is we've designed algorithms that can truly be distributed. So you can do joins in a distributed manner. You can do aggregation in a distributed manner. These are things, you know, when I say aggregation, I'm talking about simple things like a sum or an average or a median. These are super popular in analytic queries. Everybody wants to do a sum or an average or a median, right? But in the past, those things were hard to do in a distributed manner, getting all the participants in this universe to do some small incremental piece of the computation. So it's really these two things. Number one, this organic, dynamically forming constellation of devices. Dynamically forming in a way that is latency aware. So if I represent a data source that's joining this universe or constellation, I'm going to try to find peers who I have a fast connection with. If all the universe of peers were out there, I'll try to find ones that are fast. And the second is having algorithms that we can all collaborate on. Those two things change the game. >> We're getting the two minute sign, and this is fascinating stuff. But so, how do you deal with the data consistency problem? You hear about eventual consistency and people using atomic clocks and-- >> Right, so Queryplex, you know, there's a reason we call it Queryplex not Dataplex. Queryplex is really a read-only operation. >> Dave: Oh, there you go. >> You've got all these-- >> Problem solved. (laughs) >> Problem solved. You've got all these data sources. They're already doing their thing, they already have data coming in how it's coming in. >> Dave: Simple and brilliant. >> Right, and we're not changing any of that. All we're saying is, if you want to query them as one, you can query them as one. I should say a few words about the machine learning that we're doing here at the conference.
We've talked about the importance of an information architecture and how that lays a foundation for machine learning. But one of the things that we're showing and demonstrating at the conference today, or at the showcase today, is how we're actually putting machine learning into the database. Create databases that learn and improve over time, learn from experience. In 1952, Arthur Samuel was a researcher at IBM who first had one of the most fundamental breakthroughs in machine learning, when he created a machine learning algorithm that would play checkers. And he programmed this checker playing game of his so it would learn over time. And then he had a great idea. He programmed it so it would play itself, thousands and thousands and thousands of times over, so it would actually learn from its own mistakes. And, you know, the evolution since then. Deep Blue playing chess and so on. The Watson Jeopardy game. We've seen tremendous potential in machine learning. We're putting it into the database so databases can be smarter, faster, more consistent, and really just out of the box (snaps) performing. >> I'm glad you brought that up. I was going to ask you, because the legend Steve Mills once said to me, I had asked him a question about in-memory databases. He said ever since databases have been around, in-memory databases have been around. But ML-infused databases are new. >> Sam: That's right, something totally new. >> Dave: Yeah, great. >> Well, you mentioned Deep Blue. Looking forward to having Garry Kasparov on a little bit later on here. And I know he's speaking as well. But fascinating stuff that you've covered here, Sam. We appreciate the time here. >> Thank you, thanks for having me. >> And wish you continued success, as well. >> Thank you very much. >> Sam Lightstone, IBM fellow joining us here live on the Cube. We're back with more here from New York City right after this. (electronic music)
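The constellation approach Lightstone describes — data sources organized into small peer groups, each group reducing its own data to a tiny partial result that is cheap to combine — can be illustrated with a toy sketch. The grouping, the group size, and the readings below are made up; the interview does not spell out Queryplex's actual algorithms, so this shows only the general shape of distributed aggregation.

```python
from itertools import islice

# Each "source" is just a list of numeric readings here; in reality each would
# be a separate database or device holding its own data.
sources = [[3, 5, 8], [2, 2], [10, 1, 1, 4], [7], [6, 6, 9]]

def peer_groups(items, group_size=2):
    # Organize sources into small peer groups instead of one giant fan-in,
    # so no single node has to talk to everyone.
    it = iter(items)
    while True:
        group = list(islice(it, group_size))
        if not group:
            return
        yield group

def partial_aggregate(group):
    # Each peer group reduces its members to a tiny (sum, count) pair.
    total = sum(sum(src) for src in group)
    count = sum(len(src) for src in group)
    return total, count

# First level: groups aggregate locally. Second level: the small partials are
# cheap to combine anywhere, so no coordinator ever touches the raw rows.
partials = [partial_aggregate(g) for g in peer_groups(sources)]
grand_total = sum(t for t, _ in partials)
grand_count = sum(c for _, c in partials)
print("average =", grand_total / grand_count)  # average over all 13 readings
```

A sum, a count, or an average decomposes cleanly into partials like this; an exact median does not, which is part of why distributed aggregation is described in the conversation as historically hard.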
Dinesh Nirmal, IBM | Machine Learning Everywhere 2018
>> Announcer: Live from New York, it's theCUBE, covering Machine Learning Everywhere: Build Your Ladder to AI. Brought to you by IBM. >> Welcome back to Midtown, New York. We are at Machine Learning Everywhere: Build Your Ladder to AI being put on by IBM here in late February in the Big Apple. Along with Dave Vellante, I'm John Walls. We're now joined by Dinesh Nirmal, who is the Vice President of Analytics Development and Site Executive at the IBM Silicon Valley lab, soon. Dinesh, good to see you, this morning, sir. >> Thank you, John. >> Fresh from California. You look great. >> Thanks. >> Alright, you've talked about this, and it's really your world: data, the new normal. Explain that. When you say it's the new normal, what exactly... How is it transforming, and what are people having to adjust to in terms of the new normal. >> So, if you look at data, I would say each and every one of us has become a living data set. Our age, our race, our salary. What our likes or dislikes, every business is collecting every second. I mean, every time you use your phone, that data is transmitted somewhere, stored somewhere. And, airlines for example, is looking, you know, what do you like? Do you like an aisle seat? Do you like to get home early? You know, all those data. >> All of the above, right? >> And petabytes and zettabytes of data is being generated. So now, businesses' challenge is that how do you take that data and make insights out of it to serve you as a better customer. That's where I've come to, but the biggest challenge is that, how do you deal with this tremendous amount of data? That is the challenge. And creating sites out of it. >> That's interesting. I mean, that means the definition of identity is really... For decades it's been the same, and what you just described is a whole new persona, identity of an individual. >> And now, you take the data, and you add some compliance or provisioning like GDPR on top of it, all of a sudden how do-- >> John: What is GDPR? For those who might not be familiar with it. >> Dinesh: That's the regulatory term that's used by EU to make sure that, >> In the EU. >> If me as a customer come to an enterprise, say, I don't want any of my data stored, it's up to you to go delete that data completely, right? That's the term that's being used. And that goes into effect in May. How do you make sure that that data gets completely deleted by that time the customer has... How do you get that consent from the customer to go do all those... So there's a whole lot of challenges, as data multiplies, how do you deal with the data, how do you create insights to the data, how do you create consent on the data, how do you be compliant on that data, how do you create the policies that's needed to generate that data? All those things need to be... Those are the challenges that enterprises face. >> You bring up GDPR, which, for those who are not familiar with it, actually went into effect last year but the fines go into effect this year, and the fines are onerous, like 4% of turnover, I mean it's just hideous, and the question I have for you is, this is really scary for companies because they've been trying to catch up to the big data world, and so they're just throwing big data projects all over the place, which is collecting data, oftentimes private information, and now the EU is coming down and saying, "Hey you have to be able to, if requested, delete that." A lot of times they don't even know where it is, so big challenge. Are you guys, can you help? 
>> Yeah, I mean, today if you look at it, the data exists all over the place. I mean, whether it's in your relational database or in your Hadoop, unstructured data, whereas you know, optics store, it exists everywhere. And you have to have a way to say where the data is and the customer has to consent given to go, for you to look at the data, for you to delete the data, all those things. We have tools that we have built and we have been in the business for a very long time for example our governance catalog where you can see all the data sources, the policies that's associated with it, the compliance, all those things. So for you to look through that catalog, and you can see which data is GDPR compliant, which data is not, which data you can delete, which data you cannot. >> We were just talking in the open, Dave made the point that many companies, you need all-stars, not just somebody who has a specialty in one particular area, but maybe somebody who's in a particular regiment and they've got to wear about five different hats. So how do you democratize data to the point that you can make these all-stars? Across all kinds of different business units or different focuses within a company, because all of a sudden people have access to great reams of information. I've never had to worry about this before. But now, you've got to spread that wealth out and make everybody valuable. >> Right, really good question. Like I said, the data is existing everywhere, and most enterprises don't want to move the data. Because it's a tremendous effort to move from an existing place to another one and make sure the applications work and all those things. We are building a data virtualization layer, a federation layer, whereby which if you are, let's say you're a business unit. You want to get access to that data. Now you can use that federational data virtualization layer without moving data, to go and grab that small piece of data, if you're a data scientist, let's say, you want only a very small piece of data that exists in your enterprise. You can go after, without moving the data, just pick that data, do your work, and build a model, for example, based on that data. So that data virtualization layer really helps because it's basically an SQL statement, if I were to simplify it. That can go after the data that exists, whether it's at relational or non-relational place, and then bring it back, have your work done, and then put that data back into work. >> I don't want to be a pessimist, because I am an optimist, but it's scary times for companies. If they're a 20th century organization, they're really built around human expertise. How to make something, how to transact something, or how to serve somebody, or consult, whatever it is. The 21st century organization, data is foundational. It's at the core, and if my data is all over the place, I wasn't born data-driven, born in the cloud, all those buzzwords, how do traditional organizations catch up? What's the starting point for them? >> Most, if not all, enterprises are moving into a data-driven economy, because it's all going to be driven by data. Now it's not just data, you have to change your applications also. Because your applications are the ones that's accessing the data. One, how do you make an application adaptable to the amount of data that's coming in? How do you make accuracy? I mean, if you're building a model, having an accurate model, generating accuracy, is key. How do you make it performant, or govern and self-secure? 
That's another challenge. How do you make it measurable, monitor all those things? If you take three or four core tenets, that's what the 21st century's going to be about, because as we augment our humans, or developers, with AI and machine learning, it becomes more and more critical how do you bring these three or four core tenets into the data so that, as the data grows, the applications can also scale. >> Big task. If you look at the industries that have been disrupted, taxis, hotels, books, advertising. >> Dinesh: Retail. >> Retail, thank you. Maybe less now, and you haven't seen that disruption yet in banks, insurance companies, certainly parts of government, defense, you haven't seen a big disruption yet, but it's coming. If you've got the data all over the place, you said earlier that virtually every company has to be data-driven, but a lot of companies that I talk to say, "Well, our industry is kind of insulated," or "Yeah, we're going to wait and see." That seems to me to be the recipe for disaster, what are your thoughts on that? >> I think the disruption will come from three angles. One, AI. Definitely that will change the way, blockchain, another one. When you say, we haven't seen in the financial side, blockchain is going to change that. Third is quantum computing. The way we do compute is completely going to change by quantum computing. So I think the disruption is coming. Those are the three, if I have to predict into the 21st century, that will change the way we work. I mean, AI is already doing a tremendous amount of work. Now a machine can basically look at an image and say what it is, right? We have Watson for cancer oncology, we have 400 to 500,000 patients being treated by Watson. So AI is changing, not just from an enterprise perspective, but from a socio-economic perspective and a from a human perspective, so Watson is a great example for that. But yeah, disruption is happening as we speak. >> And do you agree that foundational to AI is the data? >> Oh yeah. >> And so, with your clients, like you said, you described it, they've got data all over the place, it's all in silos, not all, but much of it is in silos. How does IBM help them be a silo-buster? >> Few things, right? One, data exists everywhere. How do you make sure you get access to the data without moving the data, that's one. But if you look at the whole lifecycle, it's about ingesting the data, bringing the data, cleaning the data, because like you said, data becomes the core. Garbage in, garbage out. You cannot get good models unless the data is clean. So there's that whole process, I would say if you're a data scientist, probably 70% of your time is spent on cleaning the data, making the data ready for building a model or for a model to consume. And then once you build that model, how do you make sure that the model gets retrained on a regular basis, how do you monitor the model, how do you govern the model, so that whole aspect goes in. And then the last piece is visualizational reporting. How do you make sure, once the model or the application is built, how do you create a report that you want to generate or you want to visualize that data. The data becomes the base layer, but then there's a whole lifecycle on top of it based on that data. >> So the formula for future innovation, then, starts with data. You add in AI, I would think that cloud economics, however we define that, is also a part of that. My sense is most companies aren't ready, what's your take? >> For the cloud, or? 
>> I'm talking about innovation. If we agree that innovation comes from the data plus AI plus you've got to have... By cloud economics I mean it's an API economy, you've got massive scale, those kinds of, to compete. If you look at the disruptions in taxis and retail, it's got cloud economics underneath it. So most customers don't really have... They haven't yet even mastered cloud economics, yet alone the data and the AI component. So there's a big gap. >> It's a huge challenge. How do we take the data and create insights out of the data? And not just existing data, right? The data is multiplying by the second. Every second, petabytes or zettabytes of data are being generated. So you're not thinking about the data that exists within your enterprise right now, but now the data is coming from several different places. Unstructured data, structured data, semi-structured data, how do you make sense of all of that? That is the challenge the customers face, and, if you have existing data, on top of the newcoming data, how do you predict what do you want to come out of that. >> It's really a pretty tough conundrum that some companies are in, because if you're behind the curve right now, you got a lot of catching up to do. So you think that we have to be in this space, but keeping up with this space, because the change happens so quickly, is really hard, so we have to pedal twice as fast just to get in the game. So talk about the challenge, how do you address it? How do you get somebody there to say, "Yep, here's your roadmap. "I know it's going to be hard, "but once you get there you're going to be okay, "or at least you're going to be on a level playing field." >> I look at the three D's. There's the data, there's the development of the models or the applications, and then the deployment of those models or applications into your existing enterprise infrastructure. Not only the data is changing, but that development of the models, the tools that you use to develop are also changing. If you look at just the predictive piece, I mean look from the 80's to now. You look at vanilla machine learning versus deep learning, I mean there's so many tools available. How do you bring it all together to make sense which one would you use? I think, Dave, you mentioned Hadoop was the term from a decade ago, now it's about object store and how do you make sure that data is there or JSON and all those things. Everything is changing, so how do you bring, as an enterprise, you keep up, afloat, on not only the data piece, but all the core infrastructure piece, the applications piece, the development of those models piece, and then the biggest challenge comes when you have to deploy it. Because now you have a model that you have to take and deploy in your current infrastructure, which is not easy. Because you're infusing machine learning into your legacy applications, your third-party software, software that was written in the 60's and 70's, it's not an easy task. I was at a major bank in Europe, and the CTO mentioned to me that, "Dinesh, we built our model in three weeks. "It has been 11 months, we still haven't deployed it." And that's the reality. >> There's a cultural aspect too, I think. I think it was Rob Thomas, I was reading a blog that he wrote, and he said that he was talking to a customer saying, "Thank god I'm not in the technology industry, "things change so fast I could never, "so glad I'm not a software company." And Rob's reaction was, "Uh, hang on. 
(laughs) "You are in the technology business, "you are a software company." And so there's that cultural mindset. And you saw it with GE, Jeffrey Immelt said, "I went to bed an industrial giant, "woke up a software company." But look at the challenges that industrial giant has had transforming, so... They need partners, they need people that have done this before, they need expertise and obviously technology, but it's people and process that always hold it up. >> I mean technology is one piece, and that's where I think companies like IBM make a huge difference. You understand enterprise. Because you bring that wealth of knowledge of working with them for decades and they understand your infrastructure, and that is a core element, like I said the last piece is the deployment piece, how do you bring that model into your existing infrastructure and make sure that it fits into that architecture. And that involves a tremendous amount of work, skills, and knowledge. >> Job security. (all laugh) >> Dinesh, thanks for being with us this morning, we appreciate that and good luck with the rest of the event, here in New York City. Back with more here on theCUBE, right after this. (calming techno music)
Kickoff John Walls and Dave Vellante | Machine Learning Everywhere 2018
>> Announcer: Live from New York, it's theCUBE! Covering Machine Learning Everywhere: Build Your Ladder To AI. Brought to you by IBM. >> Well, good morning! Welcome here on theCUBE. Along with Dave Vellante, I'm John Walls. We're in Midtown New York for IBM's Machine Learning Everywhere: Build Your Ladder To AI. Great lineup of guests we have for you today, looking forward to bringing them to you, including world champion chess master Garry Kasparov a little bit later on. It's going to be fascinating. Dave, glad you're here. Dave, good to see you, sir. >> John, always a pleasure. >> How you been? >> Up from DC, you know, I was in your area last week doing some stuff with John Furrier, but I've been great. >> Stopped by the White House, drop in? >> You know, I didn't this time. No? >> No. >> Dave: My son, as you know, goes to school down there, so when I go by my hotel, I always walk by the White House, I wave. >> Just in case, right? >> No reciprocity. >> Same deal, we're in the same boat. Let's talk about what we have coming up here today. We're talking about this digital transformation that's going on within multiple industries. But you have an interesting take on it that it's a different wave, and it's a bigger wave, and it's an exciting wave right now, that digital is creating. >> Look at me, I've been around for a long time. I think we're entering a new era. You know, the great thing about theCUBE is you go to all these events, you hear the innovations, and we started theCUBE in 2010. The Big Data theme was just coming in, and it appeared, everybody was very excited. Still excited, obviously, about the data-driven concept. But we're now entering a new era. It's like every 10 years, the parlance in our industry changes. It was cloud, Big Data, SaaS, mobile, social. It just feels like, okay, we're here. We're doing that now. That's sort of a daily ritual. We used to talk about how it's early innings. It's not anymore. It's the late innings for those. I think the industry is changing. The describers of what we're entering are autonomous, pervasive, self-healing, intelligent. When you infuse artificial intelligence, I'm not crazy about that name, but when you infuse that throughout the landscape, things start to change. Data is at the center of it, but I think, John, we're going to see the parlance change. IBM, for example, uses cognitive. People use artificial intelligence. I like machine intelligence. We're trying to still figure out the names. To me, it's an indicator that things are changing. It's early innings now. What we're seeing is a whole new set of opportunities emerging, and if you think about it, it's based on this notion of digital services, where data is at the center. That's something that I want to poke at with the folks at IBM and our guests today. How are people going to build new companies? You're certainly seeing it with the likes of Uber, Airbnb, Waze. It's built on these existing cloud and security, off-the-shelf, if you will, horizontal technologies. How are new companies going to be built, what industries are going to be disruptive? Hint, every industry. But really, the key is, how will existing companies keep pace? That's what I really want to understand. >> You said, every industry's going to be disrupted, which is certainly, I think, an exciting prospect in some respects, but a little scary to some, too, right? Because they think, "No, we're fat and happy "and things are going well right now in our space, "and we know our space better than anybody." 
Some of those leaders might be thinking that. But as you point out, digital technology has transformed to the extent now that there's nobody safe, because you just slap this application in, you put this technology in, and I'm going to change your business overnight. >> That's right. Digital means data, data is at the center of this transformation. A colleague of mine, David Moschella, has come up with this concept of the matrix, and what the matrix is is a set of horizontal technology services. Think about cloud, or SaaS, or security, or mobile, social, all the way up the stack through data services. But when you look at the companies like Airbnb and Uber and, certainly, what Google is doing, and Facebook, and others, they're building services on top of this matrix. The matrix is comprised of vertical slices by industry and horizontal slices of technology. Disruptors are cobbling together through software and data new sets of services that are disrupting industries. The key to this, John, in my view, anyway, is that, historically, within healthcare or financial services, or insurance, or manufacturing, or education, those were very siloed. But digital and data allows companies and disruptors to traverse silos like never before. Think about it. Amazon buying Whole Foods. Apple getting into healthcare and financial services. You're seeing these big giants disrupt all of these different industries, and even smaller guys, there's certainly room for startups. But it's all around the data and the digital transformation. >> You spoke about traditional companies needing to convert, right? Needing to get caught up, perhaps, or to catch up with what's going on in that space. What do you do with your workforce in that case? You've got a bunch of great, hardworking people, embedded legacy. You feel good about where you are. And now you're coming to that workforce and saying, "Here's a new hat." >> I think that's a great question. I think the concern that one would have for traditional companies is, data is not foundational for most companies. It's not at their core. The vast majority of companies, the core are the people. You hear it all the time. "The people are our greatest asset." That, I hate to say it, but it's somewhat changing. If you look at the top five companies by market cap, their greatest asset is their data, and the people are surrounding that data. They're very, very important because they know how to leverage that data. But if you look at most traditional companies, people are at their core. Data is kind of, "Oh, we got this bolt-on," or it's in a bunch of different silos. The big question is, how do they close that gap? You're absolutely right. The key is skillsets, and the skills have to be, you know, we talk about five-tool baseball players. You're a baseball fan, as am I. Well, you need multi-tool players, those that understand not only the domain of whether it's marketing or sales or operational expertise or finance, but they also require digital expertise. They know, for example, if you're a marketing professional, they know how to do hypertargeting. They know how to leverage social. They know how to do SEO, all these digital skills, and they know how to get information that's relevant and messaging out into the marketplace and permeate that. And so, we're entering, again, this whole new world that's highly scalable, highly intelligent, pervasive, autonomous. 
We're going to talk about that today with a lot of their guests, with a lot of our guests, that really are kind of futurists and have thought through, I think, the changes that are coming. >> You can't have a DH anymore, right, that's what you're saying? You need a guy that can play the field. >> Not only play the field, not only a utility player, but somebody who's a utility player, but great. Best of breed at all these different skillsets. >> Machine learning, we haven't talked much about that, and another term, right, that certainly has different definitions, but certainly real specific applications to what's going on today. We'll talk a lot about ML today. Your thoughts about that, and how that squares into the artificial intelligence picture, and what we're doing with all those machines out there that are churning 24/7. >> Yeah, so, real quick, I know we're tight on time here. Artificial intelligence to me is the umbrella. Machine learning is the application of math and algorithms to solve a particular problem or answer a particular question. And then there's deep learning, which is highly focused neural networks that go deeper and deeper and deeper, and become auto-didactic, self-learning, in a manner. Those are just the very quick and rudimentary description. Machine learning to me is the starting point, and that's really where organizations really want to start to learn and begin to close the gap. >> A lot of ground to cover, and we're going to do that for you right here on theCUBE as we continue our coverage of Machine Learning Everywhere: Your Ladder To AI, coming up here, IBM hosting us in Midtown, New York. Back with more here on theCUBE in just a bit. (fast electronic music)
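As a concrete anchor for the working definitions in that kickoff — machine learning as math and algorithms applied to data to answer a question, with deep learning as the more specialized neural-network end of the spectrum — here is about the smallest possible example. The numbers are made up and the choice of a decision tree is arbitrary; it is meant only to show the shape of the workflow, not any particular product.

```python
# Requires scikit-learn; the numbers are made up. The algorithm infers the
# rule from labeled examples instead of a programmer writing the rule by hand.
from sklearn.tree import DecisionTreeClassifier

hours_used = [[1], [2], [8], [9], [10]]   # feature: device hours per day
failed = [0, 0, 1, 1, 1]                  # label: did the device fail?

model = DecisionTreeClassifier().fit(hours_used, failed)
print(model.predict([[3], [12]]))         # -> [0 1] on this toy data
```

Deep learning swaps the single tree for many stacked layers of learned features and far more data, but the basic loop — data in, fitted model out, predictions on new cases — has the same shape.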
Sharad Singhal, The Machine & Michael Woodacre, HPE | HPE Discover Madrid 2017
>> Man: Live from Madrid, Spain, it's the Cube! Covering HPE Discover Madrid, 2017. Brought to you by: Hewlett Packard Enterprise. >> Welcome back to Madrid, everybody, this is The Cube, the leader in live tech coverage. My name is Dave Vellante, I'm here with my co-host, Peter Burris, and this is our second day of coverage of HPE's Madrid Conference, HPE Discover. Sharad Singhal is back, Director of Machine Software and Applications, HPE and Corps and Labs >> Good to be back. And Mike Woodacre is here, a distinguished engineer from Mission Critical Solutions at Hewlett-Packard Enterprise. Gentlemen, welcome to the Cube, welcome back. Good to see you, Mike. >> Good to be here. >> Superdome Flex is all the rage here! (laughs) At this show. You guys are happy about that? You were explaining off-camera that is the first jointly-engineered product from SGI and HPE, so you hit a milestone. >> Yeah, and I came into Hewett Packard Enterprise just over a year ago with the SGI Acquisition. We're already working on our next generation in memory computing platform. We basically hit the ground running, integrated the engineering teams immediately that we closed the acquisition so we could drive through the finish line and with the product announcement just recently, we're really excited to get that out into the market. Really represent the leading in memory, computing system in the industry. >> Sharad, a high performance computer, you've always been big data, needing big memories, lots of performance... How has, or has, the acquisition of SGI shaped your agenda in any way or your thinking, or advanced some of the innovations that you guys are coming up with? >> Actually, it was truly like a meeting of the minds when these guys came into HPE. We had been talking about memory-driven computing, the machine prototype, for the last two years. Some of us were aware of it, but a lot of us were not aware of it. These guys had been working essentially in parallel on similar concepts. Some of the work we had done, we were thinking in terms of our road maps and they were looking at the same things. Their road maps were looking incredibly similar to what we were talking about. As the engineering teams came about, we brought both the Superdome X technology and The UV300 technology together into this new product that Mike can talk a lot more about. From my side, I was talking about the machine and the machine research project. When I first met Mike and I started talking to him about what they were doing, my immediate reaction was, "Oh wow wait a minute, this is exactly what I need!" I was talking about something where I could take the machine concepts and deliver products to customers in the 2020 time frame. With the help of Mike and his team, we are able to now do essentially something where we can take the benefits we are describing in the machine program and- make those ideas available to customers right now. I think to me that was the fun part of this journey here. >> So what are the key problems that your team is attacking with this new offering? >> The primary use case for the Superdome Flex is really high-performance in memory database applications, typically SAP Hana is sort of the industry leading solution in that space right now. One of the key things with the Superdome Flex, you know, Flex is the active word, it's the flexibility. You can start with a small building block of four socket, three terabyte building block, and then you just connect these boxes together. 
The memory footprint just grows linearly. The latency across our fabric just stays constant as you add these modules together. We can deliver up to 32 processes, 48 terabytes of in-memory data in a single rack. So it's really the flexibility, sort of a pay as you grow model. As their needs grow, they don't have to throw out the infrastructure. They can add to it. >> So when you take a look ultimately at the combination, we talked a little bit about some of the new types of problems that can be addressed, but let's bring it practical to the average enterprise. What can the enterprise do today, as a consequence of this machine, that they couldn't do just a few weeks ago? >> So it sort of builds on the modularity, as Lance explained. If you ask a CEO today, "what's my database requirement going to be in two or three years?" they're like, "I hope my business is successful, I hope I'm gonna grow my needs," but I really don't know where that side is going to grow, so the flexibility to just add modules and scale up the capacity of memory to bring that- so the whole concept of in-memory databases is basically bringing your online transaction processing and your data-analytics processing together. So then you can do this in real time and instead of your data going to a data warehouse and looking at how the business is operating days or weeks or months ago, I can see how it's acting right now with the latest updates of transactions. >> So this is important. You mentioned two different things. Number one is you mentioned you can envision- or three things. You can start using modern technology immediately on an extremely modern platform. Number two, you can grow this and scale this as needs follow, because Hana in memory is not gonna have the same scaling limitations that you know, Oracle on a bunch of spinning discs had. >> Mike: Exactly. >> So, you still have the flexibility to learn and then very importantly, you can start adding new functions, including automation, because now you can put the analytics and the transaction processing together, close that loop so you can bring transactions, analytics, boom, into a piece of automation, and scale that in unprecedented ways. That's kind of three things that the business can now think about. Have I got that right? >> Yeah, that's exactly right. It lets people really understand how their business is operating in real time, look for trends, look for new signatures in how the business is operating. They can basically build on their success and basically having this sort of technology gives them a competitive advantage over their competitors so they can out-compute or out-compete and get ahead of the competition. >> But it also presumably leads to new kinds of efficiencies because you can converge, that converge word that we've heard so much. You can not just converge the hardware and converge the system software management, but you can now increasingly converge tasks. Bring those tasks in the system, but also at a business level, down onto the same platform. >> Exactly, and so moving in memory is really about bringing real time to the problem instead of batch mode processing, you bring in the real-time aspect. Humans, we're interactive, we like to ask a question, get an answer, get on to the next question in real time. When processes move from batch mode to real time, you just get a step change in the innovation that can occur. We think with this foundation, we're really enabling the industry to step forward. >> So let's create a practical example here. 
Let's apply this platform to a sizeable system that's looking at customer behavior patterns. Then let's imagine how we can take the e-commerce system that's actually handling order, bill, fulfillment and all those other things. We can bring those two things together not just in a way that might work, if we have someone online for five minutes, but right now. Is that kind of one of those examples that we're looking at? >> Absolutely, you can basically- you have a history of the customers you're working with. In retail when you go in a store, the store will know your history of transactions with them. They can decide if they want to offer you real time discounts on particular items. They'll also be taking in other data, weather conditions to drive their business. Suddenly there's going to be a heat wave, I want more ice cream in the store, or it's gonna be freezing next week, I'm gonna order in more coats and mittens for everyone to buy. So taking in lots of transactional data, not just the actual business transaction, but environmental data, you can accelerate your ability to provide consumers with the things they will need. >> Okay, so I remember when you guys launched Apollo. Antonio Neri was running the server division, you might have had networking to him. He did a little reveal on the floor. Antonio's actually in the house over there. >> Mike: (laughs) Next door. There was an astronaut at the reveal. We covered it on the Cube. He's always been very focused on this part of the business of the high-performance computing, and obviously the machine has been a huge project. How has the leadership been? We had a lot of skeptics early on that said you were crazy. What was the conversation like with Meg and Antonio? Were they continuously supportive, were they sometimes skeptical too? What was that like? >> So if you think about the total amount of effort we've put in the machine program, and truly speaking, that kind of effort would not be possible if the senior leadership was not behind us inside this company. Right? A lot of us in HP labs were working on it. It was not just a labs project, it was a project where our business partners were working on it. We brought together engineering teams from the business groups who understood how projects were put together. We had software people working with us who were working inside the business, we had researchers from labs working, we had supply chain partners working with us inside this project. A project of this scale and scope does not succeed if it's a handful of researchers doing this work. We had enormous support from the business side and from our leadership team. I give enormous thanks to our leadership team to allow us to do this, because it's an industry thing, not just an HP Enterprise thing. At the same time, with this kind of investment, there's clearly an expectation that we will make it real. It's taken us three years to go from, "here is a vague idea from a group of crazy people in labs," to something which actually works and is real. Frankly, the conversation in the last six months has been, "okay, so how do we actually take it to customers?" That's where the partnership with Mike and his team has become so valuable. At this point in time, we have a shared vision of where we need to take the thing. We have something where we can on-board customers right now. We have something where, frankly, even I'm working on the examples we were talking about earlier today. Not everybody can afford a 16-socket, giant machine. 
The Superdome Flex allows my customer, or anybody who is playing with an application, to start small, something that is reasonably affordable, and try that application out. If that application is working, they have the ability to scale up. This is what makes the Superdome Flex such a nice environment to work in for the types of applications I'm worrying about, because when we had started this program, people would ask us, "when will the machine product be available?" From day one, we said, "the machine product will be something that might become available to you in some form or another by the end of the decade." Well, suddenly with Mike, I think I can make it happen right now. It's not quite the end of the decade yet, right? So I think that's what excited me about this partnership we have with the Superdome Flex team. The fact that they had the same vision and the same aspirations that we do. It's a platform that allows my current customers with their current applications, like Mike described within the context of say, SAP HANA, a scalable platform they can operate now. It's also something that allows them to evolve towards the future and start putting on new applications that they haven't even thought about today. Those were the kinds of applications we were talking about. It makes it possible for them to move into this journey today. >> So what is the availability of Superdome Flex? Can I buy it today? >> Mike: You can buy it today. Actually, I had the pleasure of installing the first early-access system in the UK last week. We've been delivering large memory platforms to Stephen Hawking's team at Cambridge University for the last twenty years, because they really like the in-memory capability to allow them, as they say, to be scientists, not computer scientists, in working through their algorithms and data. Yeah, it's ready for sale today. >> What's going on with Hawking's team? I don't know if this is fake news or not, but I saw something come across that said he says the world's gonna blow up in 600 years. (laughter) I was like, uh-oh, what's Hawking got going now? (laughs) That's gotta be fun working with those guys. >> Yeah, I know, it's been fun working with that team. Actually, what I would say, following up on Sharad's comment, it's been really fun this last year, because I'd sort of been following the machine from outside when the announcements were made a couple of years ago. Immediately when the acquisition closed, I was like, "tell me about the software you've been developing, tell me about the photonics and all these technologies," because boy, I can now accelerate where I want to go with the technology we've been developing. Superdome Flex is really the first step on the path. It's a better product than either company could have delivered on their own. Now over time, we can integrate other learnings and technologies from the machine research program. It's a really exciting time. >> Excellent. Gentlemen, I always loved the SGI acquisition. Thought it made a lot of sense. Great brand, kind of put SGI back on the map in a lot of ways. Gentlemen, thanks very much for coming on the Cube. >> Thank you again. >> We appreciate you. >> Mike: Thank you. >> Thanks for coming on. All right everybody, we'll be back with our next guest right after this short break. This is the Cube, live from HPE Discover Madrid. Be right back. (energetic synth)
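(A small illustrative sketch, not from the interview: the pay-as-you-grow model Mike describes above starts from a four-socket building block and scales toward 32 processors and 48 terabytes in a rack. The per-chassis memory figure below is an assumption chosen so the top end matches the 48-terabyte number quoted; actual configurations vary.)

    # Hypothetical "pay as you grow" sizing for a modular scale-up system.
    # sockets_per_chassis and tb_per_chassis are assumptions for illustration only.
    def flex_config(num_chassis, sockets_per_chassis=4, tb_per_chassis=6):
        return {
            "chassis": num_chassis,
            "sockets": num_chassis * sockets_per_chassis,
            "memory_tb": num_chassis * tb_per_chassis,
        }

    # Grow from the entry configuration toward a full rack without replacing anything.
    for n in (1, 2, 4, 8):
        print(flex_config(n))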
Sharad Singhal, The Machine & Matthias Becker, University of Bonn | HPE Discover Madrid 2017
>> Announcer: Live from Madrid, Spain, it's theCUBE, covering HPE Discover Madrid 2017, brought to you by Hewlett Packard Enterprise. >> Welcome back to Madrid, everybody, this is theCUBE, the leader in live tech coverage and my name is Dave Vellante, and I'm here with Peter Burris, this is day two of HPE Hewlett Packard Enterprise Discover in Madrid, this is their European version of a show that we also cover in Las Vegas, kind of six month cadence of innovation and organizational evolution of HPE that we've been tracking now for several years. Sharad Singal is here, he covers software architecture for the machine at Hewlett Packard Enterprise, and Matthias Becker, who's a postdoctoral researcher at the University of Bonn. Gentlemen, thanks so much for coming in theCUBE. >> Thank you. >> No problem. >> You know, we talk a lot on theCUBE about how technology helps people make money or save money, but now we're talking about, you know, something just more important, right? We're talking about lives and the human condition and >> Peter: Hard problems to solve. >> Specifically, yeah, hard problems like Alzheimer's. So Sharad, why don't we start with you, maybe talk a little bit about what this initiative is all about, what the partnership is all about, what you guys are doing. >> So we started on a project called the Machine Project about three, three and a half years ago and frankly at that time, the response we got from a lot of my colleagues in the IT industry was "You guys are crazy", (Dave laughs) right. We said we are looking at an enormous amount of data coming at us, we are looking at real time requirements on larger and larger processing coming up in front of us, and there is no way that the current architectures of the computing environments we create today are going to keep up with this huge flood of data, and we have to rethink how we do computing, and the real question for those of us who are in research in Hewlett Packard Labs, was if we were to design a computer today, knowing what we do today, as opposed to what we knew 50 years ago, how would we design the computer? And this computer should not be something which solves problems for the past, this should be a computer which deals with problems in the future. So we are looking for something which would take us for the next 50 years, in terms of computing architectures and what we will do there. In the last three years we have gone from ideas and paper study, paper designs, and things which were made out of plastic, to a real working system. We have around Las Vegas time, we'd basically announced that we had the entire system working with actual applications running on it, 160 terabytes of memory all addressable from any processing core in 40 computing nodes around it. And the reason is, although we call it memory-driven computing, it's really thinking in terms of data-driven computing. The reason is that the data is now at the center of this computing architecture, as opposed to the processor, and any processor can return to any part of the data directly as if it was doing, addressing in local memory. This provides us with a degree of flexibility and freedom in compute that we never had before, and as a software person, I work in software, as a software person, when we started looking at this architecture, our answer was, well, we didn't know we could do this. 
Now if, given now that I can do this and I assume that I can do this, all of us in the programmers started thinking differently, writing code differently, and we suddenly had essentially a toy to play with, if you will, as programmers, where we said, you know, this algorithm I had written off decades ago because it didn't work, but now I have enough memory that if I were to think about this algorithm today, I would do it differently. And all of a sudden, a new set of algorithms, a new set of programming possibilities opened up. We worked with a number of applications, ranging from just Spark on this kind of an environment, to how do you do large scale simulations, Monte Carlo simulations. And people talk about improvements in performance from something in the order of, oh I can get you a 30% improvement. We are saying in the example applications we saw anywhere from five, 10, 15 times better to something which where we are looking at financial analysis, risk management problems, which we can do 10,000 times faster. >> So many orders of magnitude. >> Many, many orders >> When you don't have to wait for the horrible storage stack. (laughs) >> That's right, right. And these kinds of results gave us the hope that as we look forward, all of us in these new computing architectures that we are thinking through right now, will take us through this data mountain, data tsunami that we are all facing, in terms of bringing all of the data back and essentially doing real-time work on those. >> Matthias, maybe you could describe the work that you're doing at the University of Bonn, specifically as it relates to Alzheimer's and how this technology gives you possible hope to solve some problems. >> So at the University of Bonn, we work very closely with the German Center for Neurodegenerative Diseases, and in their mission they are facing all diseases like Alzheimer's, Parkinson's, Multiple Sclerosis, and so on. And in particular Alzheimer's is a really serious disease and for many diseases like cancer, for example, the mortality rates improve, but for Alzheimer's, there's no improvement in sight. So there's a large population that is affected by it. There is really not much we currently can do, so the DZNE is focusing on their research efforts together with the German government in this direction, and one thing about Alzheimer's is that if you show the first symptoms, the disease has already been present for at least a decade. So if you really want to identify sources or biomarkers that will point you in this direction, once you see the first symptoms, it's already too late. So at the DZNE they have started on a cohort study. In the area around Bonn, they are now collecting the data from 30,000 volunteers. They are planning to follow them for 30 years, and in this process we generate a lot of data, so of course we do the usual surveys to learn a bit about them, we learn about their environments. 
But we also do very more detailed analysis, so we take blood samples and we analyze the complete genome, and also we acquire imaging data from the brain, so we do an MRI at an extremely high resolution with some very advanced machines we have, and all this data is accumulated because we do not only have to do this once, but we try to do that repeatedly for every one of the participants in the study, so that we can later analyze the time series when in 10 years someone develops Alzheimer's we can go back through the data and see, maybe there's something interesting in there, maybe there was one biomarker that we are looking for so that we can predict the disease better in advance. And with this pile of data that we are collecting, basically we need something new to analyze this data, and to deal with this, and when we heard about the machine, we though immediately this is a system that we would need. >> Let me see if I can put this in a little bit of context. So Dave lives in Massachusetts, I used to live there, in Framingham, Massachusetts, >> Dave: I was actually born in Framingham. >> You were born in Framingham. And one of the more famous studies is the Framingham Heart Study, which tracked people over many years and discovered things about heart disease and relationship between smoking and cancer, and other really interesting problems. But they used a paper-based study with an interview base, so for each of those kind of people, they might have collected, you know, maybe a megabyte, maybe a megabyte and a half of data. You just described a couple of gigabytes of data per person, 30,000, multiple years. So we're talking about being able to find patterns in data about individuals that would number in the petabytes over a period of time. Very rich detail that's possible, but if you don't have something that can help you do it, you've just collected a bunch of data that's just sitting there. So is that basically what you're trying to do with the machine is the ability to capture all this data, to then do something with it, so you can generate those important inferences. >> Exactly, so with all these large amounts of data we do not only compare the data sets for a single person, but once we find something interesting, we have also to compare the whole population that we have captured with each other. So there's really a lot of things we have to parse and compare. >> This brings together the idea that it's not just the volume of data. I also have to do analytics and cross all of that data together, right, so every time a scientist, one of the people who is doing biology studies or informatic studies asks a question, and they say, I have a hypothesis which this might be a reason for this particular evolution of the disease or occurrence of the disease, they then want to go through all of that data, and analyze it as as they are asking the question. Now if the amount of compute it takes to actually answer their questions takes me three days, I have lost my train of thought. But if I can get that answer in real time, then I get into this flow where I'm asking a question, seeing the answer, making a different hypothesis, seeing a different answer, and this is what my colleagues here were looking for. >> But if I think about, again, going back to the Framingham Heart Study, you know, I might do a query on a couple of related questions, and use a small amount of data. 
The technology to do that's been around, but when we start looking for patterns across brain scans with time series, we're not talking about a small problem, we're talking about an enormous sum of data that can be looked at in a lot of different ways. I got one other question for you related to this, because I gotta presume that there's the quid pro quo for getting people into the study, is that, you know, 30,000 people, is that you'll be able to help them and provide prescriptive advice about how to improve their health as you discover more about what's going on, have I got that right? >> So, we're trying to do that, but also there are limits to this, of course. >> Of course. >> For us it's basically collecting the data and people are really willing to donate everything they can from their health data to allow these large studies. >> To help future generations. >> So that's not necessarily quid pro quo. >> Okay, there isn't, okay. But still, the knowledge is enough for them. >> Yeah, their incentive is they're gonna help people who have this disease down the road. >> I mean if it is not me, if it helps society in general, people are willing to do a lot. >> Yeah of course. >> Oh sure. >> Now the machine is not a product yet that's shipping, right, so how do you get access to it, or is this sort of futures, or... >> When we started talking to one another about this, we actually did not have the prototype with us. But remember that when we started down this journey for the machine three years ago, we know back then that we would have hardware somewhere in the future, but as part of my responsibility, I had to deal with the fact that software has to be ready for this hardware. It does me no good to build hardware when there is no software to run on it. So we have actually been working at the software stack, how to think about applications on that software stack, using emulation and simulation environments, where we have some simulators with essentially instruction level simulator for what the machine does, or what that prototype would have done, and we were running code on top of those simulators. We also had performance simulators, where we'd say, if we write the application this way, this is how much we think we would gain in terms of performance, and all of those applications on all of that code we were writing was actually on our large memory machines, Superdome X to be precise. So by the time we started talking to them, we had these emulation environments available, we had experience using these emulation environments on our Superdome X platform. So when they came to us and started working with us, we took their software that they brought to us, and started working within those emulation environments to see how fast we could make those problems, even within those emulation environments. So that's how we started down this track, and most of the results we have shown in the study are all measured results that we are quoting inside this forum on the Superdome X platform. So even in that emulated environment, which is emulating the machine now, on course in the emulation Superdome X, for example, I can only hold 24 terabytes of data in memory. I say only 24 terabytes >> Only! because I'm looking at much larger systems, but an enormously large number of workloads fit very comfortably inside the 24 terabytes. 
And for those particular workloads, the programming techniques we are developing work at that scale, right, they won't scale beyond the 24 terabytes, but they'll certainly work at that scale. So between us we then started looking for problems, and I'll let Matthias comment on the problems that they brought to us, and then we can talk about how we actually solved those problems. >> So we work a lot with genomics data, and usually what we do is we have a pipeline so we connect multiple tools, and we thought, okay, this architecture sounds really interesting to us, but if we want to get started with this, we should pose them a challenge. So if they can convince us, we went through the literature, we took a tool that was advertised as the new optimal solution. So prior work was taking up to six days for processing, they were able to cut it to 22 minutes, and we thought, okay, this is a perfect challenge for our collaboration, and we went ahead and we took this tool, we put it on the Superdome X that was already running and stepped five minutes instead of just 22, and then we started modifying the code and in the end we were able to shrink the time down to just 30 seconds, so that's two magnitudes faster. >> We took something which was... They were able to run in 22 minutes, and that was already had been optimized by people in the field to say "I want this answer fast", and then when we moved it to our Superdome X platform, the platform is extremely capable. Hardware-wise it compares really well to other platforms which are out there. That time came down to five minutes, but that was just the beginning. And then as we modified the software based on the emulation results we were seeing underneath, we brought that time down to 13 seconds, which is a hundred times faster. We started this work with them in December of last year. It takes time to set up all of this environment, so the serious coding was starting in around March. By June we had 9X improvement, which is already a factor of 10, and since June up to now, we have gotten another factor of 10 on that application. So I'm now at a 100X faster than what the application was able to do before. >> Dave: Two orders of magnitude in a year? >> Sharad: In a year. >> Okay, we're out of time, but where do you see this going? What is the ultimate outcome that you're hoping for? >> For us, we're really aiming to analyze our data in real time. Oftentimes when we have biological questions that we address, we analyze our data set, and then in a discussion a new question comes up, and we have to say, "Sorry, we have to process the data, "come back in a week", and our idea is to be able to generate these answers instantaneously from our data. >> And those answers will lead to what? Just better care for individuals with Alzheimer's, or potentially, as you said, making Alzheimer's a memory. >> So the idea is to identify Alzheimer long before the first symptoms are shown, because then you can start an effective treatment and you can have the biggest impact. Once the first symptoms are present, it's not getting any better. >> Well thank you for your great work, gentlemen, and best of luck on behalf of society, >> Thank you very much >> really appreciate you coming on theCUBE and sharing your story. You're welcome. All right, keep it right there, buddy. Peter and I will be back with our next guest right after this short break. This is theCUBE, you're watching live from Madrid, HPE Discover 2017. We'll be right back.
James Kobielus, Wikibon | The Skinny on Machine Intelligence
>> Announcer: From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now here's your host, Dave Vellante. >> In the early days of big data and Hadoop, the focus was really on operational efficiency where ROI was largely centered on reduction of investment. Fast forward 10 years and you're seeing a plethora of activity around machine learning, and deep learning, and artificial intelligence, and deeper business integration as a function of machine intelligence. Welcome to this Cube conversation, The Skinny on Machine Intelligence. I'm Dave Vellante and I'm excited to have Jim Kobielus here up from the District area. Jim, great to see you. Thanks for coming into the office today. >> Thanks a lot, Dave, yes great to be here in beautiful Marlboro, Massachusetts. >> Yes, so you know Jim, when you think about all the buzz words in this big data business, I have to ask you, is this just sort of same wine, new bottle when we talk about all this AI and machine intelligence stuff? >> It's actually new wine. But of course there's various bottles and they have different vintages, and much of that wine is still quite tasty, and let me just break it out for you, the skinny on machine intelligence. AI as a buzzword and as a set of practices really goes back of course to the early post-World War II era, as we know Alan Turing and the Imitation Game and so forth. There are other developers, theorists, academics in the '40s and the '50s and '60s that pioneered in this field. So we don't want to give Alan Turing too much credit, but he was clearly a mathematician who laid down the theoretical framework for much of what we now call Artificial Intelligence. But when you look at Artificial Intelligence as a ever-evolving set of practices, where it began was in an area that focused on deterministic rules, rule-driven expert systems, and that was really the state of the art of AI for a long, long time. And so you had expert systems in a variety of areas that became useful or used in business, and science, and government and so forth. Cut ahead to the turn of the millennium, we are now in the 21st century, and what's different, the new wine, is big data, larger and larger data sets that can reveal great insights, patterns, correlations that might be highly useful if you have the right statistical modeling tools and approaches to be able to surface up these patterns in an automated or semi-automated fashion. So one of the core areas is what we now call machine learning, which really is using statistical models to infer correlations, anomalies, trends, and so forth in the data itself, and machine learning, the core approach for machine learning is something called Artificial Neural Networks, which is essentially modeling a statistical model along the lines of how, at a very high level, the nervous system is made up, with neurons connected by synapses, and so forth. It's an analog in statistical modeling called a perceptron. The whole theoretical framework of perceptrons actually got started in the 1950s with the first flush of AI, but didn't become a practical reality until after the turn of this millennium, really after the turn of this particular decade, 2010, when we started to see not only very large big data sets emerge and new approaches for managing it all, like Hadoop, come to the fore. 
But we've seen artificial neural nets get more sophisticated in terms of their capabilities, and a new approach for doing machine learning, artificial neural networks, with deeper layers of perceptrons, neurons, called deep learning has come to the fore. With deep learning, you have new algorithms like convolutional neural networks, recurrent neural networks, generative adversarial neural networks. These are different ways of surfacing up higher level abstractions in the data, for example for face recognition and object recognition, voice recognition and so forth. These all depend on this new state of the art for machine learning called deep learning. So what we have now in the year 2017 is we have quite a mania for all things AI, much of it is focused on deep learning, much of it is focused on tools that your average data scientist or your average developer increasingly can use and get very productive with and build these models and train and test them, and deploy them into working applications like going forward, things like autonomous vehicles would be impossible without this. >> Right, and we'll get some of that. But so you're saying that machine learning is essentially math that infers patterns from data. And math, it's new math, math that's been around for awhile or. >> Yeah, and inferring patterns from data has been done for a long time with software, and we have some established approaches that in many ways predate the current vogue for neural networks. We have support vector machines, and decision trees, and Bayesian logic. These are different ways of approaches statistical for inferring patterns, correlations in the data. They haven't gone away, they're a big part of the overall AI space, but it's a growing area that I've only skimmed the surface of. >> And they've been around for many many years, like SVM for example. Okay, now describe further, add some color to deep learning. You sort of painted a picture of this sort of deep layers of these machine learning algorithms and this network with some depth to it, but help us better understand the difference between machine learning and deep learning, and then ultimately AI. >> Yeah, well with machine learning generally, you know, inferring patterns from data that I said, artificial neural networks of which the deep learning networks are one subset. Artificial neural networks can be two or more layers of perceptrons or neurons, they have relationship to each other in terms of their activation according to various mathematical functions. So when you look at an artificial neural network, it basically does very complex math equations through a combination of what they call scalar functions, like multiplication and so forth, and then you have these non-linear functions, like cosine and so forth, tangent, all that kind of math playing together in these deep structures that are triggered by data, data input that's processed according to activation functions that set weights and reset the weights among all the various neural processing elements, that ultimately output something, the insight or the intelligence that you're looking for, like a yes or no, is this a face or not a face, that these incoming bits are presenting. Or it might present output in terms of categories. What category of face is this, a man, a woman, a child, or whatever. 
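(A minimal sketch of the machinery Jim is describing: layers of perceptrons whose weighted inputs pass through non-linear activation functions, with the weights adjusted against labelled data. It is illustrative only, written in plain NumPy, and not tied to any framework mentioned in the conversation.)

    # Tiny two-layer neural network: weighted sums plus a non-linear activation,
    # trained by nudging the weights to reduce prediction error on labelled data.
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy data: 4-dimensional inputs with a binary label (e.g. "face" / "not a face").
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float).reshape(-1, 1)

    W1, b1 = rng.normal(scale=0.5, size=(4, 8)), np.zeros(8)   # hidden layer of 8 perceptrons
    W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)   # single output neuron

    lr = 0.5
    for epoch in range(500):
        h = sigmoid(X @ W1 + b1)             # forward pass through the hidden layer
        p = sigmoid(h @ W2 + b2)             # output: probability of the positive class

        err = p - y                          # how far off the guesses were
        err_h = (err @ W2.T) * h * (1 - h)   # error pushed back to the hidden layer
        W2 -= lr * h.T @ err / len(X)
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * X.T @ err_h / len(X)
        b1 -= lr * err_h.mean(axis=0)

    print("training accuracy:", ((p > 0.5) == y).mean())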
What I'm getting at is that so deep learning is more layers of these neural processing elements that are specialized to various functions to be able to abstract higher level phenomena from the data, it's not just, "Is this a face," but if it's a scene recognition deep learning network, it might recognize that this is a face that corresponds to a person named Dave who also happens to be the father in the particular family scene, and by the way this is a family scene that this deep learning network is able to ascertain. What I'm getting at is those are the higher level abstractions that deep learning algorithms of various sorts are built to identify in an automated way. >> Okay, and these in your view all fit under the umbrella of artificial intelligence, or is that sort of an uber field that we should be thinking of. >> Yeah, artificial intelligence as the broad envelope essentially refers to any number of approaches that help machines to think like humans, essentially. When you say, "Think like humans," what does that mean actually? To do predictions like humans, to look for anomalies or outliers like a human might, you know separate figure from ground for example in a scene, to identify the correlations or trends in a given scene. Like I said, to do categorization or classification based on what they're seeing in a given frame or what they're hearing in a given speech sample. So all these cognitive processes just skim the surface, or what AI is all about, automating to a great degree. When I say cognitive, but I'm also referring to affective like emotion detection, that's another set of processes that goes on in our heads or our hearts, that AI based on deep learning and so forth is able to do depending on different types of artificial neural networks are specialized particular functions, and they can only perform these functions if A, they've been built and optimized for those functions, and B, they have been trained with actual data from the phenomenon of interest. Training the algorithms with the actual data to determine how effective the algorithms are is the key linchpin of the process, 'cause without training the algorithms you don't know if the algorithm is effective for its intended purpose, so in Wikibon what we're doing is in the whole development process, DevOps cycle, for all things AI, training the models through a process called supervised learning is absolutely an essential component of ascertaining the quality of the network that you've built. >> So that's the calibration and the iteration to increase the accuracy, and like I say, the quality of the outcome. Okay, what are some of the practical applications that you're seeing for AI, and ML, and DL. >> Well, chat bots, you know voice recognition in general, Siri and Alexa, and so forth. Without machine learning, without deep learning to do speech recognition, those can't work, right? Pretty much in every field, now for example, IT service management tools of all sorts. When you have a large network that's logging data at the server level, at the application level and so forth, those data logs are too large and too complex and changing too fast for humans to be able to identify the patterns related to issues and faults and incidents. So AI, machine learning, deep learning is being used to fathom those anomalies and so forth in an automated fashion to be able to alert a human to take action, like an IT administrator, or to be able to trigger a response work flow, either human or automated. 
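(A toy sketch of the IT service management use case just mentioned: score each minute of log activity against a rolling baseline and flag the anomalies for a human or an automated workflow. The synthetic data and the simple z-score rule are stand-ins for the far richer models a real AIOps product would use.)

    # Illustrative anomaly flagging over per-minute error counts from a server log.
    import numpy as np

    rng = np.random.default_rng(7)
    errors_per_minute = rng.poisson(5, size=1440).astype(float)  # one day of normal traffic
    errors_per_minute[850:860] += 40                             # an injected incident

    window, flagged = 60, []
    for t in range(window, len(errors_per_minute)):
        baseline = errors_per_minute[t - window:t]
        z = (errors_per_minute[t] - baseline.mean()) / (baseline.std() + 1e-9)
        if z > 4:                                                # alert, or trigger a runbook
            flagged.append(t)

    print("anomalous minutes:", flagged)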
So AI within IT service management, hot hot topic, and we're seeing a lot of vendors incorporate that capability into their tools. Like I said, in the broad world we live in in terms of face recognition and Facebook, the fact is when I load a new picture of myself or my family or even with some friends or brothers in it, Facebook knows lickity-split whether it's my brother Tom or it's my wife or whoever, because of face recognition which obviously depends, well it's not obvious to everybody, depends on deep learning algorithms running inside Facebook's big data network, big data infrastructure. They're able to immediately know this. We see this all around us now, speech recognition, face recognition, and we just take it for granted that it's done, but it's done through the magic of AI. >> I want to get to the development angle scenario that you specialize in. Part of the reason why you came to Wikibon is to really focus on that whole application development angle. But before we get there, I want to follow the data for a bit 'cause you mentioned that was really the catalyst for the resurgence in AI, and last week at the Wikibon research meeting we talked about this three-tiered model. Edge, as edge piece, and then something in the middle which is this aggregation point for all this edge data, and then cloud which is where I guess all the deep modeling occurs, so sort of a three-tier model for the data flow. >> John: Yes. >> So I wonder if you could comment on that in the context of AI, it means more data, more I guess opportunities for machine learning and digital twins, and all this other cool stuff that's going on. But I'm really interested in how that is going to affect the application development and the programming model. John Farrier has a phrase that he says that, "Data is the new development kit." Well, if you got all this data that's distributed all over the place, that changes the application development model, at least you think it does. So I wonder if you could comment on that edge explosion, the data explosion as a result, and what it means for application development. >> Right, so more and more deep learning algorithms are being pushed to edge devices, by that I mean smartphones, and smart appliances like the ones that incorporate Alexa and so forth. And so what we're talking about is the algorithms themselves are being put into CPUs and FPGAs and ASICs and GPUs. All that stuff's getting embedded in everything that we're using, everything's that got autonomous, more and more devices have the ability if not to be autonomous in terms of making decisions, independent of us, or simply to serve as augmentation vehicles for our own whatever we happen to be doing thanks to the power of deep learning at the client. Okay, so when deep learning algorithms are embedded in say an internet of things edge device, what the deep learning algorithms are doing is A, they're ingesting the data through the sensors of that device, B, they're making inferences, deep learning algorithmic-driven inferences, based on that data. It might be speech recognition, face recognition, environmental sensing and being able to sense geospatially where you are and whether you're in a hospitable climate for whatever. And then the inferences might drive what we call actuation. Now in the autonomous vehicle scenario, the autonomous vehicle is equipped with all manner of sensors in terms of LiDAR and sonar and GPS and so forth, and it's taking readings all the time. 
It's doing inferences that either autonomously or in conjunction with inferences that are being made through deep learning and machine learning algorithms that are executing in those intermediary hubs like you described, or back in the cloud, or in a combination of all of that. But ultimately, the results of all those analytics, all those deep learning models, feed the what we call actuation of the car itself. Should it stop, should it put on the brakes 'cause it's about to hit a wall, should it turn right, should it turn left, should it slow down because it happened to have entered a new speed zone or whatever. All of the decisions, the actions that the edge device, like a car would be an edge device in this scenario, are being driven by evermore complex algorithms that are trained by data. Now, let's stay with the autonomous vehicle because that's an extreme case of a very powerful edge device. To train an autonomous vehicle you need of course lots and lots of data that's acquired from possibly a prototype that you, a Google or a Tesla, or whoever you might be, have deployed into the field or your customers are using, B, proving grounds like there's one out by my stomping ground out in Ann Arbor, a proving ground for the auto industry for self-driving vehicles and gaining enough real training data based on the operation of these vehicles in various simulated scenarios, and so forth. This data is used to build and iterate and refine the algorithms, the deep learning models that are doing the various operations of not only the vehicles in isolation but the vehicles operating as a fleet within an entire end to end transportation system. So what I'm getting at, is if you look at that three-tier model, then the edge device is the car, it's running under its own algorithms, the middle tier the hub might be a hub that's controlling a particular zone within a traffic system, like in my neck of the woods it might be a hub that's controlling congestion management among self-driving vehicles in eastern Fairfax County, Virginia. And then the cloud itself might be managing an entire fleet of vehicles, let's say you might have an entire fleet of vehicles under the control of say an Uber, or whatever is managing its own cars from a cloud-based center. So when you look at the tiering model that analytics, deep learning analytics is being performed, increasingly it will be for various, not just self-driving vehicles, through this tiered model, because the edge device needs to make decisions based on local data. The hub needs to make decisions based on a wider view of data across a wider range of edge entities. And then the cloud itself has responsibility or visibility for making deep learning driven determinations for some larger swath. And the cloud might be managing both the deep learning driven edge devices, as well as monitoring other related systems that self-driving network needs to coordinate with, like the government or whatever, or police. >> So envisioning that three-tier model then, how does the programming paradigm change and evolve as a result of that. 
>> Yeah, the programming paradigm is the modeling itself: the building and the training and the iterating of the models generally will stay centralized, meaning to do all these functions, I mean to do modeling and training and iteration of these models, you need teams of data scientists and other developers who are both adept at statistical modeling, who are adept at acquiring the training data, at labeling it, labeling is an important function there, and who are adept at basically developing and deploying one model after another in an iterative fashion through DevOps, through a standard release pipeline with version controls and so forth built in, the governance built in. And that really needs to be a centralized function, and it's also very compute and data intensive, so you need storage resources, you need large clouds full of high performance computing, and so forth, to be able to handle these functions over and over. Now the edge devices themselves will feed in the data, just the data that is fed into the centralized platform where the training and the modeling is done. So what we're going to see is more and more centralized modeling and training, with decentralized execution of the actual inferences that are driven by those models; that is the way it works in this distributed environment. >> It's the Holy Grail. All right, Jim, we're out of time but thanks very much for helping us unpack and giving us the skinny on machine learning. >> John: It's a fat stack. >> Great to have you in the office and to be continued. Thanks again. >> John: Sure. >> All right, thanks for watching everybody. This is Dave Vellante with Jim Kobielus, and you're watching theCUBE at the Marlboro offices. See ya next time. (upbeat music)
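(A minimal sketch of the split Jim describes: build and train centrally, then ship only the learned weights to edge devices, which run inference locally. The model, data, and serialization format here are all illustrative assumptions.)

    # Centralized training, decentralized inference (toy logistic model in plain NumPy).
    import json
    import numpy as np

    rng = np.random.default_rng(0)

    # --- central side: train on pooled data in the cloud or a hub ---
    X = rng.normal(size=(5000, 3))                       # e.g. three sensor readings
    y = (X @ np.array([1.5, -2.0, 0.7]) > 0).astype(float)
    w, b = np.zeros(3), 0.0
    for _ in range(300):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= 0.1 * X.T @ (p - y) / len(X)
        b -= 0.1 * (p - y).mean()

    model_blob = json.dumps({"w": w.tolist(), "b": float(b)})   # what gets pushed to the edge

    # --- edge side: load the weights and decide locally, with no round trip ---
    params = json.loads(model_blob)
    def edge_infer(reading):
        z = float(np.dot(params["w"], reading) + params["b"])
        return "actuate" if 1 / (1 + np.exp(-z)) > 0.5 else "ignore"

    print(edge_infer([0.9, -1.2, 0.3]))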
Eng Lim Goh, HPE & Tuomas Sandholm, Strategic Machine Inc. - HPE Discover 2017
>> Announcer: Live from Las Vegas, it's theCUBE covering HPE Discover 2017, brought to you by Hewlett Packard Enterprise. >> Okay, welcome back everyone. We're live here in Las Vegas for SiliconANGLE's CUBE coverage of HPE Discover 2017. This is our seventh year of covering HPE Discover Now. HPE Discover in its second year. I'm John Furrier, my co-host Dave Vellante. We've got two great guests, two doctors, PhD's in the house here. So Eng Lim Goh, VP and SGI CTO, PhD, and Tuomas Sandholm, Professor at Carnegie Mellon University of Computer Science and also runs the marketplace lab over there, welcome to theCube guys, doctors. >> Thank you. >> Thank you. >> So the patient is on the table, it's called machine learning, AI, cloud computing. We're living in a really amazing place. I call it open bar and open source. There's so many new things being contributed to open source, so much new hardware coming on with HPE that there's a lot of innovation happening. So want to get your thoughts first on how you guys are looking at this big trend where all this new software is coming in and these new capabilities, what's the vibe, how do you look at this. You must be, Carnegie Mellon, oh this is an amazing time, thoughts. >> Yeah, it is an amazing time and I'm seeing it both on the academic side and the startup side that you know, you don't have to invest into your own custom hardware. We are using HPE with the Pittsburgh Supercomputing Center in academia, using cloud in the startups. So it really makes entry both for academic research and startups easier, and also the high end on the academic research, you don't have to worry about maintaining and staying up to speed with all of the latest hardware and networking and all that. You know it kind of. >> Focus on your research. >> Focus on the research, focus on the algorithms, focus on the AI, and the rest is taken care of. >> John: Eng talk about the supercomputer world that's now there, if you look at the abundant computer intelligent edge we're seeing genome sequencing done in minutes, the prices are dropping. I mean high performance computing used to be this magical, special thing, that you had to get a lot of money to pay for or access to. Democratization is pretty amazing can I just hear your thoughts on what you see happening. >> Yes, Yes democratization in the traditional HPC approach the goal is to prediction and forecasts. Whether the engine will stay productive, or financial forecasts, whether you should buy or sell or hold, let's use the weather as an example. In traditional HPC for the last 30 years what we do to predict tomorrows weather, what we do first is to write all the equations that models the weather. Measure today's weather and feed that in and then we apply supercomputing power in the hopes that it will predict tomorrows weather faster than tomorrow is coming. So that has been the traditional approach, but things have changed. Two big things changed in the last few years. We got these scientists that think perhaps there is a new way of doing it. Instead of calculating your prediction can you not use data intensive method to do an educated guess at your prediction and this is what you do. Instead of feeding today's weather information into the machine learning system they feed 30 years everyday, 10 thousand days. Everyday they feed the data in, the machine learning system guess at whether it will rain tomorrow. If it gets it wrong, it's okay, it just goes back to the weights that control the inputs and adjust them. 
Then you take the next day and feed it in again after 10 thousand tries, what started out as a wild guess becomes an educated guess, and this is how the new way of doing data intensive computing is starting to emerge using machine learning. >> Democratization is a theme I threw that out because I think it truly is happening. But let's get specific now, I mean a lot of science has been, well is climate change real, I mean this is something that is in the news. We see that in today's news cycle around climate change things of that as you mentioned weather. So there's other things, there's other financial models there's other in healthcare, in disease and there's new ways to get at things that were kind of hocus pocus maybe some science, some modeling, forecasting. What are you seeing that's right low hanging fruit right now that's going to impact lives? What key things will HPC impact besides weather? Is healthcare there, where is everyone getting excited? >> I think health and safety immediately right. Health and safety, you mentioned gene sequencing, drug designs, and you also mentioned in gene sequencing and drug design there is also safety in designing of automobiles and aircrafts. These methods have been traditionally using simulation, but more and more now they are thinking while these engines for example, are flying can you collect more data so you can predict when this engine will fail. And also predict say, when will the aircraft lands what sort of maintenance you should be applying on the engine without having to spend some time on the ground, which is unproductive time, that time on the ground diagnosing the problems. You start to see application of data intensive methods increased in order to improve safety and health. >> I think that's good and I agree with that. You could also kind of look at some of the technology perspective as to what kind of AI is going to be next and if you look back over the last five to seven years, deep learning has become a very hot part of machine learning and machine learning is part of AI. So that's really lifted that up. But what's next there is not just classification or prediction, but decision making on top of that. So we'll see AI move up the chain to actual decision making on top of just the basic machine learning. So optimization, things like that. Another category is what we call strategic reasoning. Traditionally in games like chess, or checkers and now Go, people have fallen to AI and now we did this in January in poker as well, after 14 years of research. So now we can actually take real strategic reasoning under imperfect information settings and apply it to various settings like business strategy optimization, automated negotiation, certain areas of finance, cyber security, and so forth. >> Go ahead. >> I'd like to interject, so we are very on it and impressed right. If we look back years ago IBM beat the worlds top chess player right. And that was an expert system and more recently Google Alpha Go beat even a more complex game, Go, and beat humans in that. But what the Professor has done recently is develop an even more complex game in a sense that it is incomplete information, it is poker. You don't know the other party's cards, unlike in the board game you would know right. This is very much real life in business negotiation in auctions, you don't quite know what the other party' thinking. 
So I believe now you are looking at ways, I hope, to take that poker-playing AI software that can handle incomplete information, not knowing the other party's hand but still able to play expertly, and apply it in business. >> I want to double down on that, I know Dave's got a question but I want to just follow this thread through. So the AI, in this case augmented intelligence, not so much artificial, because you're augmenting without the perfect information. It's interesting because one of the debates in the big data world has been that the streaming of all this data is so high-velocity and so high-volume that we don't know what we're missing. Everyone's been trying to get at the perfect information in the streaming of the data. And this is where the machine learning, if I get your point here, can do this meta-reasoning, this reasoning on top of it, to try to use that and say, hey, let's not try to solve the world's problems and boil the ocean to understand it all; let's use that as a variable for AI. Did I get that right? >> Kind of, kind of, I would say, in that it's not just a technical barrier to getting the big data, it's also a strategic barrier. Companies, even if I could tell you all of my strategic information, I wouldn't want to. So you have to worry not just about not having all the information, but about whether other guys are explicitly hiding information or misrepresenting, and vice versa, you are taking strategic action as well. Unlike in games like Go or chess, where it's perfect information, you need totally different kinds of algorithms to deal with these imperfect-information games, like negotiation or strategic pricing, where you have to think about the opponent's responses. >> It's your hairy window. >> In advance. >> John: Knowing what you don't know. >> To your point about huge amounts of data, we are talking about looking for a needle in a haystack. But when the data gets so big and the needles get so many, you end up with a haystack of needles. So you need some augmentation to help you deal with it, because humans would be inundated by the needles themselves. >> So is HPC sort of enabling AI, or is AI driving HPC? >> I think it's both. >> Both, yeah. >> Eng: Yeah, that's right, both together. In fact AI is driving HPC, because it is a new way of using that supercomputing power: not just doing compute-intensive calculation, but also doing data-intensive AI, machine learning. And we are also driving AI, because our customers are now asking the same question: how do I transition from a compute-intensive approach to a data-intensive one as well? This is where we come in. >> What are your thoughts on how this affects society, individuals, particularly students coming in? You mentioned Garry Kasparov losing to the IBM supercomputer. But he didn't stop there; he said, I'm going to beat the supercomputer, and he got supercomputers and humans together and now holds a contest every year. So everybody talks about the impact of machines replacing humans, and that's always happened. But what do you guys see? Where's the future of work, of creativity for young people, and the future of the economy? What does this all mean? >> You want to go first or second? >> You go ahead first. (Eng and Tuomas laughing) >> They love the fighting. >> This is a fun topic, yeah. There's a lot of worry about AI, of course. But I think of AI as a tool, much like a hammer or a saw. So it's going to make human lives better, and it's already making human lives better.
A lot of people don't even understand all the things that already have AI in them that are helping them out. There's this worry that there's going to be a super-species of AI that's going to take over humans. I don't think so; I don't think there's any demand for a super-species of AI. Like a hammer and a saw: a hammer and a saw are better than a hammersaw. So I actually think of AI as being better as separate tools for separate applications, and that is very important for mankind, and also for nations and the world in the future. One example is our work on kidney exchange. We run the nationwide kidney exchange for the United Network for Organ Sharing, which saves hundreds of lives. This is an example that not only saves lives but also makes better decisions than humans can. >> In terms of kidney candidates, timing, and all of that? >> That's a long story, but basically, when you have willing but incompatible live donors, incompatible with the patient, they can swap their donors. Pair A gives to pair B, pair B gives to pair C, and pair C gives to pair A, for example. And we also co-invented this idea of chains, where an altruist donor creates a whole chain through our network, and then the question is which combination of cycles and chains is the best solution. >> John: And no manual involvement, your machines take over the heavy lifting? >> It's hard, because the number of possible solutions is bigger than the number of atoms in the universe. So you have to have optimization AI actually make the decisions. So now our AI makes these decisions twice a week for 66% of the transplant centers in the country. >> Dr. Goh, would you add anything to the societal impact of AI? >> Yes, absolutely, on the point about the saw and hammer. That's why these AI systems today are very specific. That's why some call them artificial specific intelligence, not general intelligence. Now, whether a hundred years from now you take a hundred of these specific intelligences and combine them, and whether you get an emergent property of general intelligence, that's something else. But for now, what they do is help the analyst, the human, the decision maker, and more and more you will see that as you train these models they start to make a lot of correct decisions. But ultimately there's a difference between a correct decision and, I believe, a right decision. Therefore, there always needs to be a human supervisor there to ultimately make the right decision. Of course, he will listen to the machine learning algorithm suggesting the correct answer, but ultimately the human values have to be applied to decide whether society accepts this decision. >> All models are wrong, some are useful. >> So on this theme there are two benefits of AI. One is that it saves time and effort, which is labor savings, automation. The other is better decision making. We're seeing the better decision making now become a more important part, instead of just labor savings or what have you. We're seeing that in the kidney exchange, and now with strategic reasoning, for the first time we can do better strategic reasoning than the best humans in imperfect-information settings. Now it becomes almost a competitive need. You have to have what I call strategic augmentation as a business to be competitive.
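As a rough illustration of the clearing problem behind the kidney exchange Sandholm describes, the sketch below builds a tiny hypothetical donor-patient compatibility graph, enumerates 2- and 3-way swap cycles, and brute-forces the disjoint combination covering the most pairs. The data and the exhaustive search are toy simplifications; the real exchange also handles altruist-donor chains and uses far more scalable optimization.

```python
from itertools import combinations

# Hypothetical compatibility graph: pair X -> pairs whose patient
# can receive a kidney from pair X's donor.
compatible = {
    "A": {"B"},
    "B": {"C", "D"},
    "C": {"A"},
    "D": {"A"},
}

def cycles():
    """Enumerate simple 2- and 3-way donation cycles."""
    found = set()
    for a in compatible:
        for b in compatible[a]:
            if a in compatible.get(b, set()):                    # 2-way swap
                found.add(frozenset((a, b)))
            for c in compatible.get(b, set()):
                if c != a and a in compatible.get(c, set()):     # 3-way swap
                    found.add(frozenset((a, b, c)))
    return list(found)

def best_matching(all_cycles):
    """Brute force: pick disjoint cycles covering the most pairs."""
    best, best_covered = [], 0
    for r in range(len(all_cycles) + 1):
        for combo in combinations(all_cycles, r):
            pairs = [p for cyc in combo for p in cyc]
            if len(pairs) == len(set(pairs)) and len(pairs) > best_covered:
                best, best_covered = list(combo), len(pairs)
    return best, best_covered

chosen, transplants = best_matching(cycles())
print(f"{transplants} transplants via cycles: {[sorted(c) for c in chosen]}")
```

In the real setting the number of candidate cycles and chains explodes combinatorially, which is why, as Sandholm notes, the decisions have to be made by optimization software rather than by hand.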
>> I want to get your final thoughts before we end the segment; this is more of a sharing component. A lot of young folks are coming into computer science and related sciences, and they don't need to be computer science majors per se, but they have all the benefits of this goodness we're talking about here. Your advice, if both of you could share your opinions and thoughts on the trend. The question we get all the time is: what should young people be thinking about if they're going to be modeling and simulating? A lot of new data scientists are coming in; some are more practitioner-oriented, some are more hard-core. As this evolution of simulation and modeling that we're talking about changes at scale, what should they know, and what should the best practices be for learning and applying it? Thoughts? >> For me, the key thing is to be comfortable using tools. And for that, I think the young people of the world, as they come out of school, are very comfortable with that, so I'm actually less worried. It will be a new set of tools, these intelligent tools: leverage them. If you look at the entire world as a single system, what we need to do is move our leveraging of tools up to a level where we become an even more productive society, rather than only worrying, of course we must worry and then adapt, about jobs going to AI. Rather, we should move ourselves up to leverage AI to be an even more productive world, and then hopefully distribute that wealth so the entire human race becomes more comfortable with the AI. >> Tuomas, your thoughts? >> I think that people should be ready for the unknown, so you've got to be flexible in your education. Get the basics right, because those basics don't change. You know, math, science, get that stuff solid, and then be ready to think not just about I'm going to be this in my career, but I'm going to be this first and then maybe something else, I don't even know yet. >> John: Don't memorize for the test you don't know you're going to take yet, be more adaptive. >> Yes, creativity is very important, and adaptability, and people should start thinking about that at a young age. >> Doctors, thank you so much for sharing your input. What a great world we live in right now. A lot of opportunities, and a lot of challenges that are opportunities to solve with high performance computing, AI and whatnot. Thanks so much for sharing. This is theCUBE bringing you all the best coverage from HPE Discover. I'm John Furrier with Dave Vellante, we'll be back with more live coverage after this short break. Three days of wall to wall live coverage. We'll be right back. >> Thanks for having us.
Wrap Up - IBM Machine Learning Launch - #IBMML - #theCUBE
(jazzy intro music) [Narrator] Live from New York, it's the Cube! Covering the IBM Machine Learning Launch Event, brought to you by IBM. Now, here are your hosts: Dave Vellante and Stu Miniman. >> Welcome back to New York City, everybody. This is theCUBE, the leader in live tech coverage. We've been covering, all morning, the IBM Machine Learning announcement. Essentially what IBM did is they brought Machine Learning to the z platform. My co-host and I, Stu Miniman, have been talking to a number of guests, and we're going to do a quick wrap here. You know, Stu, my take is, when we first heard about this, and the world first heard about this, we were like, "Eh, okay, that's nice, that's interesting." But what it underscores is IBM's relentless effort to continue to keep z relevant. We saw it with the early Linux stuff, we're now seeing it with all the OpenSource and Spark tooling. You're seeing IBM make big positioning efforts to bring analytics and transactions together, and the simple point is, a lot of the world's really important data runs on mainframes. You were just quoting some stats, which were pretty interesting. >> Yeah, I mean, Dave, you know, one of the biggest challenges we know in IT is migrating. Moving from one thing to another is really tough. I love the comment from Barry Baker. Well, if I need to change my platform, by the time I've moved it, that whole digital transformation, we've missed that window. It's there. We know how long that takes: months, quarters. I was actually watching Twitter, and it looks like Chris Maddern is here. Chris was the architect of Venmo, which my younger sisters, all the millennials that I know, everybody uses Venmo. He's here, and he was like, "Almost all the banks, airlines, and retailers "still run on mainframes in 2017, and it's growing. "Who knew?" You've got a guy here that's developing really cool apps that was finding this interesting, and that's an angle I've been looking at today, Dave, is how do you make it easy for developers to leverage these platforms that are already there? The developers aren't going to need to care whether it's a mainframe or a cloud or x86 underneath. IBM is giving you the options, and as a number of our guests said, they're not looking to solve all the problems here. Here's taking this really great, new type of application using Machine Learning and making it available on that platform that so many of their customers already use. >> Right, so we heard a little bit of roadmap here: the ML for z goes GA in Q1, and then we don't have specific timeframes, but we're going to see Power platform pick this up. We heard from Jean-Francois Puget that they'll have an x86 version, and then obviously a cloud version. It's unclear what that hybrid cloud will look like. It's a little fuzzy right now, but that's something that we're watching. Obviously a lot of the model development and training is going to live in the cloud, but the scoring is going to be done locally is how the data scientists like to think about these things. So again, Stu, more mainframe relevance. We've got another cycle coming soon for the mainframe. We're two years into the z13. When IBM has mainframe cycles, it tends to give a little bump to earnings. Now, granted, a smaller and smaller portion of the company's business is mainframe, but still, mainframe drags a lot of other software with it, so it remains a strategic component. So one of the questions we get a lot is what's IBM doing in so-called hardware? 
Of course, IBM says it's all software, but we know they're still selling boxes, right? So, all the hardware guys: EMC, Dell, IBM, HPE, et cetera. A lot of software content, but it's still a hardware business. So there are really two platforms there: there's the z and there's the Power. Those are both strategic to IBM. It sold its x86 business because it didn't see it as strategic. They just put Bob Picciano in charge of the Power business, so there are obviously real commitments to those platforms. Will they make a dent in the market share numbers? Unclear. It looks like it's steady as she goes, not a dramatic increase in share. >> Yeah, and Dave, I didn't hear anybody come in here and say, well, let me dump x86 and go buy a mainframe because of this offering. That's not the target that I heard here. I would have loved to hear a little bit more as to where this fits into the broader IoT strategy. We talked a little bit in the intro, Dave: there are a lot of reasons why data is going to stick at the edge when we look at the numbers. For all the huge growth of public cloud, the amount of data in public cloud hasn't caught up to what sits in data centers. What I mean by that is, we usually spend, say, 30% on average on storage costs inside a data center. If we look at public cloud, it's more around 10%. So, at AWS re:Invent, I talked to a number of the ecosystem partners and started to see things like data lakes appearing in the cloud. This solution isn't in the data lake family, but it sits with the analytics and everything that's happening with streaming and machine learning. It's large repositories of data and huge volumes of transactions happening on the mainframe, and just trying to squint through where all the data lives and the new waves of technologies coming in. We heard how this can tie into some of the mobile and streaming activities that aren't on the mainframe, so that it can pull them into those decisions, but that's a broader picture that I'm sure IBM will be able to give in the future. >> Well, normally, for a platform that is however many decades old the mainframe is, after the whole mainframe downsizing trend, you would expect a managed decline in that business. I mean, you're seeing it in a lot of places now. We've talked about this with things like Symmetrix, right? You minimize and focus the R&D investments, you try to manage cost, you manage the decline of the business. IBM has almost flipped that. They say, okay, we've got DB2, we're going to continue to invest in that platform. We've got our major subsystems, and we're going to enhance the platform with open source technologies. We've got a big enough base that we can continue to mine perpetually. The more interesting thing to me about this announcement is that it underscores how IBM is leveraging its analytics platform. We saw the announcement of the Watson Data Platform last September, which was sort of this end-to-end data pipeline and collaboration engine between different personas, which is quite unique in the marketplace, a lot of differentiation there. Still some services. Last week at Spark Summit, I talked to some of the users and some of the partners of the Watson Data Platform. They said it's great, we love it, it's probably the most robust in the marketplace, but it's still a heavy lift. It still requires a fair amount of services, and IBM's still pushing those services.
So a large portion of IBM is still a services company. Not surprising there, but as I've said many times, the challenge IBM has is to really drive that software business and simplify the deployment and management of that software for its customers, which is something I think it's working hard on doing. The other thing is you're seeing IBM leverage those analytics platforms across different hardware segments, or hardware/cloud segments, whether it's Bluemix, z, or Power, pushing it out through the organization. IBM still has a stack, like Oracle has a stack, so wherever it can push its own stack, it's going to do that, because the margins are better. At the same time, I think it understands very well that it's got to offer open source choice. >> Yeah, absolutely, and that's something we heard loud and clear here, Dave, which is what we expect from IBM: choice of language, choice of framework. When I hear the public cloud guys, it's like, "Oh, well here's kind of the main focus we have, and maybe we'll have a little bit of choice there." Absolutely the likes of Google and Amazon are working with open source, but at first blush, when I look at things, it looks like once IBM fleshes this out -- and as we've said, it's Spark to start and others that they're adding on -- IBM could have a broader offering than I expect to see from some of the public cloud guys. We'll see. As you know, Dave, Google's got their cloud event in a couple of weeks in San Francisco. We'll be covering that, and of course Amazon, you expect their regular cadence of announcements. So, definitely a new front in the cloud wars, as it were, for machine learning. >> Excellent! Alright, Stu, we've got to wrap, because we're broadcasting the livestream and we've got to go set up for that. Thanks, I really appreciate you coming down here and co-hosting with me. Good event. >> Always happy to come down to the Big Apple, Dave. >> Alright, good. Alright, thanks for watching, everybody! Check out SiliconANGLE.com, you'll get all the news from this event and around the world. Check out SiliconANGLE.tv for this and other CUBE activities and where we're going to be next. We've got a big spring coming up, end of winter, big spring coming in this season. And check out Wikibon.com for all the research. Thanks guys, good job today, that's a wrap! We'll see you next time. This is theCUBE, we're out. (jazzy music)
Barry Baker, IBM - IBM Machine Learning Launch - #IBMML - #theCUBE
>> [Narrator] Live from New York, it's theCUBE! Covering the IBM Machine Learning Launch Event, brought to you by IBM. Now, here are your hosts: Dave Vellante and Stu Miniman. >> Hi everybody, we're back, this is theCUBE. We're live at the IBM Machine Learning Launch Event. Barry Baker is here, he's the Vice President of Offering Management for z Systems. Welcome to theCUBE, thanks for coming on! >> Well, it's my first time, thanks for having me! >> A CUBE newbie, alright! Let's get right into it! >> [Barry Baker] Go easy! >> So, two years ago, January of 2015, we covered the z13 launch. The big theme there was bringing analytics and transactions together, z13 being the platform for that. Today, we're hearing about machine learning on mainframe. Why machine learning on mainframe, Barry? >> Well, for one, it is all about the data on the platform, and the applications that our clients have on the platform. And it becomes a very natural fit for predictive analytics and what you can get from machine learning. So whether you're trying to do churn analysis or fraud detection at the moment of the transaction, it becomes a very natural place for us to inject what is pretty advanced capability from a machine learning perspective into the mainframe environment. We're not trying to solve all analytics problems on the mainframe, we're not trying to become a data lake, but for the applications and the data that reside on the platform, we believe it's a prime use case that our clients are waiting to adopt. >> Okay, so help me think through the use case of I have all this transaction data on the mainframe. Not trying to be a data lake, but I've got this data lake elsewhere, that might be useful for some of the activity I want to do. How do I do that? I'm presuming I'm not extracting my sensitive transaction data and shipping it into the data lake. So, how am I getting access to some of that social data or other data? >> Yeah, and we just saw an example in the demo pad before, whereby the bulk of the data you want to perform scoring on, and also the machine learning on to build your models, is resident on the mainframe, but there does exist data out there. In the example we just saw, it was social data. So the demo that was done was how you can take and use IBM Bluemix and get at key pieces of social data. Not a whole mass of the volume of unstructured data that lives out there. It's not about bringing that to the platform and doing machine learning on it. It's about actually taking a subset of that data, a filtered subset that makes sense to be married with the bigger data set that sits on the platform. And so that's how we envision it. We provide a number of ways to do that through the IBM Machine Learning offering, where you can marry data sources from different places. But really, the bulk of the data needs to be on z and on the platform for it to make sense to have this workload running there. >> Okay. One of the big themes, of course, that IBM puts forth is platform modernization, application modernization. I think it kind of started with Linux on z? Maybe there were other examples, but that was a big one. I don't know what the percentage is, but a meaningful percentage of workloads running on z are Linux-based, correct? >> Yeah, so, the way I would view it is it's still today that the majority of workload on the platform is z/OS based, but Linux is one of our fastest growing workloads on the platform. 
And it is about how do you marry and bring other capabilities and other applications closer to the systems of record that is sitting there on z/OS. >> So, last week, at AnacondaCON, you announced Anaconda on z, certainly Spark, a lot of talk on Spark. Give us the update on the sort of tooling. >> We recognized a few years back that Spark was going to be key to our platform longer-term. So, contrary to what people have seen from z in the past, we jumped on it fast. We view it as an enabling technology, an enabling piece of infrastructure that allows for analytics solutions to be built and brought to market really rapidly. And the machine learning announcement today is proof of that. In a matter of months, we've been able to take the cloud-based IBM Watson Machine Learning offering and have the big chunk of it run on the mainframe, because of the investment we made in spark a year and a half ago, two years ago. We continue to invest in Spark, we're at 2.0.2 level. The announcement last week around Anaconda is, again, how do we continue to bring the right infrastructure, from an analytics perspective, onto the platform. And you'll see later, maybe in the session, where the roadmap for ML isn't just based on Spark. The roadmap for ML also requires us to go after and provide new runtimes and new languages on the platform, like Python and Anaconda in particular. So, it's a coordinated strategy where we're laying the foundation on the infrastructure side to enable the solutions from the analytics unit. >> Barry, when I hear about streaming, it reminds me of the general discussion we've been having with customers about digital transformation. How does mainframe fit into that digital mandate that you hear from customers? >> That's a great, great question. From our perspective, we've come out of the woods of many of our discussions with clients being about, I need to move off the platform, and rather, I need to actually leverage this platform, because the time it's going to take me to move off this platform, by the time I do that, digital's going to overwash me and I'm going to be gone." So the very first step that our clients take, and some of our leading clients take, on the platform for digital transformation, is moving toward standard RESTful APIs, taking z/OS Connect Enterprise Edition, putting that in front of their core, mission-critical applications and data stores, and enabling those assets to be exposed externally. And what's happening is those clients then build out new engaging mobile web apps that are then coming directly back to the mainframe at those high value assets. But in addition, what that is driving is a whole other set of interaction patterns that we're actually able to see on the mainframe in how they're being used. So, opening up the API channel is the first step our clients are taking. Next is how do they take the 200 billion lines of COBOL code that is out there in the wild, running on these systems, and how do they over time modernize it? And we have some leading clients that are doing very tight integration whereby they have a COBOL application, and as they want to make changes to it, we give them the ability to make changes in it, but do it in Java, or do it in another language, a more modern language, tightly integrated with the COBOL runtime. So, we call that progressive modernization. It's not about come in and replace the whole app and rewrite that thing. 
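As an aside, the API-first step Baker describes is easiest to picture as a plain HTTPS call from any modern client. The sketch below is hypothetical: the host, path, credentials, and JSON fields are invented, and the actual URL shape depends on how a given z/OS Connect service is defined, but the point is that the mobile or web developer just sees JSON over REST.

```python
import requests

# Hypothetical endpoint exposed by z/OS Connect in front of a CICS/COBOL
# account-inquiry program; the path and JSON fields are made up.
BASE_URL = "https://zosconnect.example.com:9443"

def get_account_balance(account_id: str) -> dict:
    resp = requests.get(
        f"{BASE_URL}/accounts/{account_id}/balance",
        headers={"Accept": "application/json"},
        auth=("api_user", "api_password"),   # placeholder credentials
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()   # e.g. {"accountId": "1234", "balance": 5210.42}

if __name__ == "__main__":
    print(get_account_balance("1234"))
```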
That's one next step on the journey, and then as clients start to do that, they really need to lay down a continuous integration, continuous delivery tool chain, building a whole dev ops end-to-end flow. That's the path our clients are on to get much faster and get more productivity out of the development side of things. And in turn, the platform is now becoming a platform they can deliver results on, just like they could on any other platform. >> That's big, because a lot of customers used to complain, well, I can't get COBOL skills, and IBM's answer was often, well, we've got 'em, you can outsource it to us, and that's not always the preferred approach, so I'm glad to hear you're addressing that. On the dev ops discussion, a lot of times dev ops is about breaking stuff, whereas the mainframe workload is all about not breaking stuff, so waterfall and more traditional methodologies are still appropriate. Can you help us understand how customers are dealing with that sort of schism? >> Yeah, I think some people would come at dev ops and say it's just about moving fast, breaking some eggs, cleaning up the mess, and moving forward from there, but from our perspective that's not it, right? That can't be it for our customers, because the criticality of these systems will not allow it. So our dev ops model is not so much about move fast and break some eggs; it's about moving fast in smaller increments, and establishing clear chains and a clear pipeline with automated test suites getting executed at each phase of the pipeline before you move to production. Our approach is not to compromise on quality as you move towards dev ops, and we have, internally, our major subsystems, right? CICS, IMS, DB2. They're all on their own journey to move towards continuous integration and dev ops internally. So, we're eating our own cooking, we're dogfooding this here, right? We're building our own teams around this, and we're not seeing a decline in quality. In fact, as we move testing to the left, as they call it, shift-left testing, where you regression test earlier in the cycle, we are seeing better quality come out of that effort. >> You put forth this vision, as I said at the top of this segment, of bringing data, analytics, and transactions together. That was the z13 announcement. But the reality is, a lot of customers would have their mainframe and then they'd have some other data warehouse, with some InfiniBand pipe to that data warehouse as their approximation of real time. So, the vision that you put forth was to consolidate that. Has that happened? Are you starting to do that? What are they doing with the data warehouse? >> So, we're starting to see it. And frankly, we have clients that struggle with that model, right? That's precisely why we have a very strong point of view that says, if this is data that you're going to get value from, from an analytics perspective, and you can use it on the platform, moving it off the platform is going to create a number of challenges for you. And we've seen it first hand. We've seen companies that ETL the data off the platform. They end up with 9, 10, 12 copies of the data. As soon as you do that, the data is old, it's stale, and so any insights you derive are then going to be potentially old and stale as well.
The other side of it is, our customers in the industries that heavy users of the mainframe, finance, banking, healthcare. These are heavily regulated industries that are getting more regulated. And they're under more pressure to ensure governance and, in their meeting, the various regulation needs. As soon as you start to move that data off the platform, your problem just got that much harder. So, we are seeing a shift in approaches and it's going to take some time for clients to get past this, right? Because, enterprise data warehouse is a pretty big market and there's a lot of them out there but we're confident that for specific use cases, it makes a great deal of sense to leave the data where it is bring the analytics as close to that data as possible, and leverage the insight right there at the point of impact as opposed to pushing it off. >> How about the economics? So, I have talked, certainly talked to customers that understand it for a lot of the work that they're doing. Doing it on the Z platform is more cost effective than maybe, try to manage a bunch of, you know, bespoke X86 boxes, no question. But at the end of the day, there's still that CAPEX. What is IBM doing to help customers, sort of, absorb, you know, the costs and bring together, more aggressively, analytic and transaction data. >> Yeah, so, in agreement a 100%, I think we can create the best technology in the world but if we don't close on the financials, it's not going to go anywhere, it's not going to get, it's not going to move. So, from an analytics perspective, just starting at the ground level with spark, even underneath the spark layer, there are things we've done in the hardware to accelerate performance and so that's one layer. Then you move into spark. Well, spark is running on our java, our JDK and it takes advantage of using and being moved off to the ziip offload processors. So, those processors alone are lower cost than general purpose processors. We then have additionally thought this through, in terms of working with clients and seeing that, you know, a typical use case for running spark on the platform, they require three or four ziips and then a hundred, two hundred gig of additional memory. We've come at that as a, let's do a bundled offer and with you that comes in and says, for that workload, we're going to come in with a different price point for you. So, the other side of it is, we've been delivering over the last couple of years, ways to isolate workload from a software license cost perspective, right. 'Cause the other knock that people will say is, as I add new workload it impacts all the rest of my software Well, no. There are multiple paths forward for you to isolate that workload, add new workload to the platform and not have it impact your existing MLC charges so we continue to actually evolve that and make that easier to do but that's something we're very focused on. >> But that's more than just, sort of an LPAR or... >> Yeah, so there's other ways we could do that with... (mumbles) We're IBM so there's acronyms right. So there's ZCAP and there's all other pricing mechanisms that we can take advantage of to help you, you know, the way I simply say it is, we have to enable for new workload, we need to enable the pricing to be supportive of growth, right, not protecting and so we are very focused on, how do we do this in the right way that clients can adopt it, take advantage of the capabilities and also do it in a cost effective way. >> And what about security? 
That's another big theme that you guys have put forth. What's new there? >> Yeah so we have a lot underway from the security perspective. I'm going to say stay tuned, more to come there but there's a heavy investment, again, going back to what our clients are struggling with and that we hear in day in and day out, is around how do I ensure, you know, and how do I do encryption pervasively across the platform for all of the data being managed by the system, how do I do that with ease, and how do I do that without having to drive changes at the application layer, having to drive operational changes. How do I enable these systems to get that much more secure with these and low cost. >> Right, because if you... In an ideal world you'd encrypt everything but there's a cost of doing that. There are some downstream nuances with things like compression >> Yup. >> And so forth so... Okay, so more to come there. We'll stay tuned. >> More to come. >> Alright, we'll give you the final word. Big day for you, guys so congratulations on the announcement You got a bunch of customers who're comin' in very shortly. >> Yeah no... It's extremely, we're excited to be here. We think that the combination of IBM systems, working with the IBM analytics team to put forward an offering that pulls key aspects of Watson and delivers it on the mainframe is something that will get noticed and actually solve some real challenges so we're excited. >> Great. Barry, thanks very much for coming to theCUBE, appreciate it >> Thanks for having me. Thanks for going easy on me. >> You're welcome. Keep it right there. We'll be back with our next guest, right after this short break. (techno music)
Jean Francois Puget, IBM | IBM Machine Learning Launch 2017
>> Announcer: Live from New York, it's theCUBE, covering the IBM Machine Learning Launch Event, brought to you by IBM. Now, here are your hosts, Dave Vellante and Stu Miniman. >> Alright, we're back. Jean Francois Puget is here, he's the distinguished engineer for machine learning and optimization at IBM Analytics, and a CUBE alum. Good to see you again. >> Yes. >> Thanks very much for coming on, big day for you guys. >> Jean Francois: Indeed. >> It's like giving birth every time you guys launch one of these products. We saw you a little bit in the analyst meeting, which was pretty well attended. Give us the highlights from your standpoint. What are the key things that we should be focused on in this announcement? >> For most people, machine learning equals machine learning algorithms. When you look at newspapers or blogs or social media, it's all about algorithms. Our view is that, sure, you need algorithms for machine learning, but you need steps before you run algorithms, and after. Before, you need to get data and transform it to make it usable for machine learning. Then you run algorithms. These produce models, and then you need to move your models into a production environment. For instance, you use an algorithm to learn from past credit card transaction fraud. You can learn models, patterns, that correspond to fraud. Then you want to use those models, those patterns, in your payment system. And moving from where you run the algorithm to the operational system is a nightmare today, so our value is to automate what you do before you run algorithms, and then what you do after. That's our differentiator. >> Some folks on theCUBE in the past, years ago actually, have said, "You know what, algorithms are plentiful." I remember my friend Avi Mehta made the statement, "Algorithms are free. It's what you do with them that matters." >> Exactly. I believe open source has won for machine learning algorithms. The future is with open source, clearly. But it solves only a part of the problem you're facing if you want to put machine learning into action. So, exactly what you said: what you do with the results of the algorithm is key. And open source people don't care much about that, for good reasons. They are focusing on producing the best algorithms. We are focusing on creating value for our customers. It's different. >> You've mentioned open source a couple of times. In terms of customer choice, what's your philosophy with regard to the various tooling and platforms for open source? How do you go about selecting which to support? >> Machine learning is fascinating. It's overhyped, maybe, but it's also moving very quickly. Every year there is new cool stuff. Five years ago, nobody spoke about deep learning. Now it's everywhere. Who knows what will happen next year? Our take is to support open source, to support the top open source packages. We don't know which one will win in the future. We don't even know if one will be enough for all needs. We believe one size does not fit all, so our take is to support a curated list of major open source packages. We start with Spark ML for many reasons, but we won't stop at Spark ML. >> Okay, I wonder if we can talk use cases. Two of my favorites, well, let's just start with fraud. Fraud detection has become much, much better over the past 10 years, but it's still not perfect. I don't know if perfection is achievable, but there are a lot of false positives. How will machine learning affect that?
Can we expect, as consumers, even better fraud detection in more real time? >> If we think of the full life cycle going from data to value, we will provide a better answer. We still use machine learning algorithms to create models, but a model does not tell you what to do. It will tell you, okay, this incoming credit card transaction has a high probability of being fraud, or this one has a lower probability. But then it's up to the designer of the overall application to make decisions, so what we recommend is to use the machine learning prediction, but not only that, and to combine it with other inputs. For instance, if your machine learning model tells you this is fraud with a high probability, say 90%, and this is a customer you know very well, a 10-year customer, then you can be confident that it's a fraud. Then if the next alert says 70% probability, but it's a customer of only one week, in a week we don't know the customer, so the confidence we can get from machine learning should be low, and there you will not reject the transaction immediately. Maybe you don't approve it automatically; maybe you send a one-time passcode, or you put it through some other verification, but you don't reject it outright. Really, the idea is to use machine learning predictions as yet another input for making decisions. You're making decisions informed by what you could learn from your past, but it's not replacing human decision-making. With IBM's approach, you don't see IBM speak much about artificial intelligence in general, because we don't believe we're here to replace humans. We're here to assist humans, so we say augmented intelligence, or assistance. That's the role we see for machine learning. It will give you additional data so that you make better decisions. >> It's not the concept that you object to, it's the term artificial intelligence. It's really machine intelligence, it's not fake. >> I started my career with a PhD in artificial intelligence, I won't say when, but long enough ago. At that time, there were already promises that we would have Terminators in the next decade, and this and that. And the same happened in the '60s, or just after the '60s. And then there was an AI winter, and we have a risk of another AI winter here, because some people are raising red flags that are not substantiated, I believe. I don't think the technology is here to replace human decision-making altogether any time soon, but we can help. We can certainly make some professions more efficient, more productive, with machine learning. >> Having said that, there are a lot of cognitive functions that are getting replaced, maybe not by so-called artificial intelligence, but certainly by machines and automation. >> Yes, we're automating a number of things, and maybe we won't need people to do quality checks and can just have an automated vision system detect defects. Sure, we're automating more and more, but this is not new, it has been going on for centuries. >> Well, the list evolves. So, what can humans do that machines can't, and how would you expect that to change? >> We're moving away from IBM machine learning here, but it is interesting. You know, each time there is a capacity that a machine can automate, we basically redefine intelligence to exclude it. That's what I foresee. >> Yeah, well, robots a while ago, Stu, couldn't climb stairs, and now, look at that. >> Do we feel threatened because a robot can climb stairs faster than us? Not necessarily. >> No, it doesn't bother us, right. Okay, question? >> Yeah, so I guess, bringing it back down to the solution we're talking about today: if I'm now doing the analytics, the machine learning, on the mainframe, how do we make sure that we don't overrun and blow out all our MIPS? >> We recommend not using the mainframe's general-purpose compute for this. We recommend using zIIPs, the additional specialty processors, so as not to overload the system; it's a very important point. We claim, okay, if you do everything on the mainframe, you can learn from operational data, but you don't want to disturb it, and "you don't want to disturb" takes a lot of different meanings. One is what you just said: you don't want to slow down your operational processing, because you're going to hurt your business. But you also want to be careful in another sense. Say we have a payment system where a machine learning model predicting fraud probability is part of the system. You don't want a young, bright data scientist to decide that he has a great idea, a great model, and push his model into production without asking anyone. So you want to control that. That's why we insist on providing governance, which includes things like keeping track of how models were created and from which data sets, so lineage. We also want access control, and not allow just anyone to deploy a new model because we make it easy to deploy. So we want role-based access, and only someone with, well, it depends on the customer, but not everybody can update the production system, and we want to support that. That's something that differentiates us from open source. Open source developers don't care about governance. It's not their problem, but it is our customers' problem, so this solution will come with all the governance and integrity constraints you can expect from us. >> Can you speak to the roadmap? The first solution's going to be on z/OS; what does the roadmap look like, and what are some of the challenges of rolling this out to other private cloud solutions? >> We are going to ship IBM Machine Learning for z this quarter. It starts with Spark ML as the base open source. This is interesting, but it's not all there is for machine learning. So that's how we start, and we're going to add more in the future. Last week we announced we will ship Anaconda, which is a major distribution for the Python ecosystem, and it includes a number of machine learning open source packages. We announced it for next quarter. >> I believe in the press release it said down the road things like TensorFlow are coming, H2O. >> Anaconda was announced for next quarter, so we will leverage it when it's out. Then indeed, we have a roadmap to include major open source, and the major open source packages are mostly the ones from Anaconda. Key deep learning, so TensorFlow and probably one or two additional packages, we're still discussing. One that I'm very keen on is called XGBoost, in one word. People don't speak about it in newspapers, but this is what wins all Kaggle competitions. Kaggle is a machine learning competition site. When I say all, I mean all that are not image recognition competitions. >> Dave: And that was ex-- >> XGBoost, X-G-B-O-O-S-T. >> Dave: XGBoost, okay. >> XGBoost, and it's-- >> Dave: X-ray gamma, right? >> It's really a package. When I say we don't know which package will win, XGBoost was introduced a year ago, or maybe a bit more, not so long ago, and now, if you have structured data, it is the best choice today. It's all moving really fast, but we will support the major deep learning packages and the major classical machine learning packages, like the ones from Anaconda, or XGBoost. The other thing is that we start with Z. We announced in the analyst session that we will have a Power version and a private cloud version, meaning x86, as well. I can't tell you when, because it's not firm, but it will come. >> And in public cloud as well, I guess. You've got components in the public cloud today, like the Watson Data Platform, that you've extracted and put here. >> We have extracted part of the data science experience: we've extracted notebooks and a graphical tool called ModelBuilder from DSX as part of IBM Machine Learning now, and we're going to add more of DSX as we go. But the goal is to really share code and function across private cloud and public cloud. As Rob Thomas defined it, with private cloud we want to offer all the features and functionality of public cloud, except that it runs inside a firewall. We are really developing machine learning and Watson Machine Learning on a common code base. It's an internal open source project. We share code, and then we ship on different platforms. >> I mean, you haven't, just now, used the word hybrid. Every now and then IBM does, but do you see that so-called hybrid use case as viable, or do you see it more as some workloads should run on prem, some should run in the cloud, and maybe they'll never come together? >> With machine learning, you basically have two phases: one is training and the other is scoring. I see people moving training to the cloud quite easily, unless there is some regulation about data privacy. Training is a good fit for cloud because usually you need a large computing system but only for a limited time, so elasticity is great. But then deployment: if you want to score a transaction inside a CICS transaction, it has to run beside CICS, not in the cloud. If you want to score data on an IoT gateway, you want to score at the gateway, not in a data center. I would say that may not be what people think of first, but what will really drive the split between public cloud, private, and on prem is where you want to apply your machine learning models, where you want to score. For instance, smart watches are essentially becoming fitness measurement systems. You want to score your health data on the watch, not somewhere on the internet. >> Right, and in that CICS example that you gave, you'd essentially be bringing the model to the CICS data, is that right? >> Yes, that's what we do. The value of machine learning for Z is that if you want to score transactions happening on Z, you need to be running on Z. And it's clear, mainframe people don't want to hear about public cloud, so they will be the last ones moving. They have their reasons: they like the mainframe because it stays really, really secure and private. >> Dave: Public cloud's a dirty word. >> Yes, yes, for Z users. At least that's what I was told, and I could check with many people. But we know that in general the move is toward public cloud, so we want to help people wherever they are on their journey to the cloud. >> You've got one of those, too. Jean Francois, thanks very much for coming on theCUBE, it was really a pleasure having you back. >> Thank you. >> You're welcome. Alright, keep it right there, everybody. We'll be back with our next guest. This is theCUBE, we're live from the Waldorf Astoria. IBM's machine learning announcement, be right back. (electronic keyboard music)
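Looping back to the fraud example Puget gave earlier in this segment: the pattern of combining a model's score with simple business context before choosing an action can be sketched in a few lines. The thresholds, tenure cutoff, and action names below are invented for illustration and are not IBM's actual rules.

```python
def decide(fraud_probability: float, customer_tenure_days: int) -> str:
    """Combine a model score with business context to pick an action."""
    # Long-tenure customers: we trust the model's high-confidence calls more.
    well_known = customer_tenure_days >= 365

    if fraud_probability >= 0.9 and well_known:
        return "reject"            # confident enough to decline outright
    if fraud_probability >= 0.7:
        return "step_up_auth"      # e.g. send a one-time passcode first
    return "approve"

# Examples mirroring the ones in the conversation:
print(decide(0.90, customer_tenure_days=3650))   # 10-year customer -> reject
print(decide(0.70, customer_tenure_days=7))      # 1-week customer  -> step_up_auth
```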
Bryan Smith, Rocket Software - IBM Machine Learning Launch - #IBMML - #theCUBE
>> Announcer: Live from New York, it's theCUBE, covering the IBM Machine Learning Launch Event, brought to you by IBM. Now, here are your hosts, Dave Vellante and Stu Miniman. >> Welcome back to New York City, everybody. We're here at the Waldorf Astoria covering the IBM Machine Learning Launch Event, bringing machine learning to the IBM Z. Bryan Smith is here, he's the vice president of R&D and the CTO of Rocket Software, powering the path to digital transformation. Bryan, welcome to theCUBE, thanks for coming on. >> Thanks for having me. >> So, Rocket Software, Waltham, Mass. based, close to where we are, but a lot of people don't know about Rocket, so pretty large company, give us the background. >> It's been around for, this'll be our 27th year. Private company, we've been a partner of IBM's for the last 23 years. Almost all of that is in the mainframe space, or we focused on the mainframe space, I'll say. We have 1,300 employees, we call ourselves Rocketeers. It's spread around the world. We're really an R&D focused company. More than half the company is engineering, and it's spread across the world on every continent and most major countries. >> You're essentially OEM-ing your tools as it were. Is that right, no direct sales force? >> About half, there are different lenses to look at this, but about half of our go-to-market is through IBM with IBM-labeled, IBM-branded products. We've always been, for that side of the products, we've always been the R&D behind the products. The partnership, though, has really grown. It's more than just an R&D partnership now, now we're doing co-marketing, we're even doing some joint selling to serve IBM mainframe customers. The partnership has really grown over these last 23 years from just being the guys who write the code to doing much more. >> Okay, so how do you fit in this announcement? Machine learning on Z, where does Rocket fit? >> Part of the announcement today is a very important piece of technology that we developed. We call it data virtualization. Data virtualization is really enabling customers to open their mainframe to allow the data to be used in ways that it was never designed to be used. You might have these data structures that were designed 10, 20, even 30 years ago that were designed for a very specific application, but today they want to use it in a very different way, and so, the traditional path is to take that data and copy it, to ETL it someplace else so they can get some new use or build some new application. What data virtualization allows you to do is to leave that data in place but access it using APIs that developers want to use today. They want to use JSON access, for example, or they want to use SQL access. But they want to be able to do things like join across IMS, DB2, and VSAM all with a single query using an SQL statement. We can do that across relational databases and non-relational databases. It gets us out of this mode of having to copy data into some other data store through this ETL process, access the data in place, we call it moving the applications or the analytics to the data versus moving the data to the analytics or to the applications. >> Okay, so in this specific case, and I have said several times today, as Stu has heard me, two years ago IBM had a big theme around the z13 bringing analytics and transactions together, this sort of extends that. Great, I've got this transaction data that lives behind a firewall somewhere. Why the mainframe, why now?
>> Well, I would pull back to where I said where we see more companies and organizations wanting to move applications and analytics closer to the data. The data in many of these large companies, that core business-critical data is on the mainframe, and so, being able to do more real time analytics without having to look at old data is really important. There's this term data gravity. I love the visual that presents in my mind that you have these different masses, these different planets if you will, and the biggest, massivest planet in that solar system really is the data, and so, it's pulling the smaller satellites if you will into this planet or this star by way of gravity because data is, data's a new currency, data is what the companies are running on. We're helping in this announcement with being able to unlock and open up all mainframe data sources, even some non-mainframe data sources, and using things like Spark that's running on the platform, that's running on z/OS to access that data directly without having to write any special programming or any special code to get to all their data. >> And the preferred place to run all that data is on the mainframe obviously if you're a mainframe customer. One of the questions I guess people have is, okay, I get that, it's the transaction data that I'm getting access to, but if I'm bringing transaction and analytic data together a lot of times that analytic data might be in social media, it might be somewhere else not on the mainframe. How do envision customers dealing with that? Do you have tooling them to do that? >> We do, so this data virtualization solution that I'm talking about is one that is mainframe resident, but it can also access other data sources. It can access DB2 on Linux Windows, it can access Informix, it can access Cloudant, it can access Hadoop through IBM's BigInsights. Other feeds like Twitter, like other social media, it can pull that in. The case where you'd want to do that is where you're trying to take that data and integrate it with a massive amount of mainframe data. It's going to be much more highly performant by pulling this other small amount of data into, next to that core business data. >> I get the performance and I get the security of the mainframe, I like those two things, but what about the economics? >> Couple of things. One, IBM when they ported Spark to z/OS, they did it the right way. They leveraged the architecture, it wasn't just a simple port of recompiling a bunch of open source code from Apache, it was rewriting it to be highly performant on the Z architecture, taking advantage of specialty engines. We've done the same with the data virtualization component that goes along with that Spark on z/OS offering that also leverages the architecture. We actually have different binaries that we load depending on which architecture of the machine that we're running on, whether it be a z9, an EC12, or the big granddaddy of a z13. >> Bryan, can you speak the developers? I think about, you're talking about all this mobile and Spark and everything like that. There's got to be certain developers that are like, "Oh my gosh, there's mainframe stuff. "I don't know anything about that." How do you help bridge that gap between where it lives in the tools that they're using? >> The best example is talking about embracing this API economy. And so, developers really don't care where the stuff is at, they just want it to be easy to get to. 
They don't have to code up some specific interface or language to get to different types of data, right? IBM's done a great job with the z/OS Connect in opening up the mainframe to the API economy with ReSTful interfaces, and so with z/OS Connect combined with Rocket data virtualization, you can come through that z/OS Connect same path using all those same ReSTful interfaces pushing those APIs out to tools like Swagger, which the developers want to use, and not only can you get to the applications through z/OS Connect, but we're a service provider to z/OS Connect allowing them to also get to every piece of data using those same ReSTful APIs. >> If I heard you correctly, the developer doesn't need to even worry about that it's on mainframe or speak mainframe or anything like that, right? >> The goal is that they never do. That they simply see in their tool-set, again like Swagger, that they have data as well as different services that they can invoke using these very straightforward, simple ReSTful APIs. >> Can you speak to the customers you've talked to? You know, there's certain people out in the industry, I've had this conversation for a few years at IBM shows is there's some part of the market that are like, oh, well, the mainframe is this dusty old box sitting in a corner with nothing new, and my experience has been the containers and cool streaming and everything like that, oh well, you know, mainframe did virtualization and Linux and all these things really early, decades ago and is keeping up with a lot of these trends with these new type of technologies. What do you find in the customers that, how much are they driving forward on new technologies, looking for that new technology and being able to leverage the assets that they have? >> You asked a lot of questions there. The types of customers certainly financial and insurance are the big two, but that doesn't mean that we're limited and not going after retail and helping governments and manufacturing customers as well. What I find is talking with them that there's the folks who get it and the folks who don't, and the folks who get it are the ones who are saying, "Well, I want to be able "to embrace these new technologies," and they're taking things like open source, they're looking at Spark, for example, they're looking at Anaconda. Last week, we just announced at the Anaconda Conference, we stepped on stage with Continuum, IBM, and we, Rocket, stood up there talking about this partnership that we formed to create this ecosystem because the development world changes very, very rapidly. For a while, all the rage was JDBC, or all the rage was component broker, and so today it's Spark and Anaconda are really in the forefront of developers' minds. We're constantly moving to keep up with developers because that's where the action's happening. Again, they don't care where the data is housed as long as you can open that up. We've been playing with this concept that came up from some research firm called two-speed IT where you have maybe your core business that has been running for years, and it's designed to really be slow-moving, very high quality, it keeps everything running today, but they want to embrace some of their new technologies, they want to be able to roll out a brand-new app, and they want to be able to update that multiple times a week. And so, this two-speed IT says, you're kind of breaking 'em off into two separate teams. 
You don't have to take your existing infrastructure team and say, "You must embrace every Agile "and every DevOps type of methodology." What we're seeing customers be successful with is this two-speed IT where you can fracture these two, and now you need to create some nice integration between those two teams, so things like data virtualization really help with that. It opens up and allows the development teams to very quickly access those assets on the mainframe in this case while allowing those developers to very quickly crank out an application where quality is not that important, where being very quick to respond and doing lots of AB testing with customers is really critical. >> Waterfall still has its place. As a company that predominately, or maybe even exclusively is involved in mainframe, I'm struck by, it must've been 2008, 2009, Paul Maritz comes in and he says VMWare our vision is to build the software mainframe. And of course the world said, "Ah, that's, mainframe's dead," we've been hearing that forever. In many respects, I accredit the VMWare, they built sort of a form of software mainframe, but now you hear a lot of talk, Stu, about going back to bare metal. You don't hear that talk on the mainframe. Everything's virtualized, right, so it's kind of interesting to see, and IBM uses the language of private cloud. The mainframe's, we're joking, the original private cloud. My question is you're strategy as a company has been always focused on the mainframe and going forward I presume it's going to continue to do that. What's your outlook for that platform? >> We're not exclusively by the mainframe, by the way. We're not, we have a good mix. >> Okay, it's overstating that, then. It's half and half or whatever. You don't talk about it, 'cause you're a private company. >> Maybe a little more than half is mainframe-focused. >> Dave: Significant. >> It is significant. >> You've got a large of proportion of the company on mainframe, z/OS. >> So we're bullish on the mainframe. We continue to invest more every year. We invest, we increase our investment every year, and so in a software company, your investment is primarily people. We increase that by double digits every year. We have license revenue increases in the double digits every year. I don't know many other mainframe-based software companies that have that. But I think that comes back to the partnership that we have with IBM because we are more than just a technology partner. We work on strategic projects with IBM. IBM will oftentimes stand up and say Rocket is a strategic partner that works with us on hard problem-solving customers issues every day. We're bullish, we're investing more all the time. We're not backing away, we're not decreasing our interest or our bets on the mainframe. If anything, we're increasing them at a faster rate than we have in the past 10 years. >> And this trend of bringing analytics and transactions together is a huge mega-trend, I mean, why not do it on the mainframe? If the economics are there, which you're arguing that in many use cases they are, because of the value component as well, then the future looks pretty reasonable, wouldn't you say? >> I'd say it's very, very bright. At the Anaconda Conference last week, I was coming up with an analogy for these folks. It's just a bunch of data scientists, right, and during most of the breaks and the receptions, they were just asking questions, "Well, what is a mainframe? "I didn't know that we still had 'em, "and what do they do?" 
So it was fun to educate them on that. But I was trying to show them an analogy with data warehousing where, say that in the mid-'90s it was perfectly acceptable to have a separate data warehouse separate from your transaction system. You would copy all this data over into the data warehouse. That was the model, right, and then slowly it became more important that the analytics or the BI against that data warehouse was looking at more real time data. So then it became more efficiencies and how do we replicate this faster, and how do we get closer to, not looking at week-old data but day-old data? And so, I explained that to them and said the days of being able to do analytics against old data that's copied are going away. ETL, we're also bullish to say that ETL is dead. ETL's future is very bleak. There's no place for it. It had its time, but now it's done because with data virtualization you can access that data in place. I was telling these folks as they're talking about, these data scientists, as they're talking about how they look at their models, their first step is always ETL. And so I told them this story, I said ETL is dead, and they just look at me kind of strange. >> Dave: Now the first step is load. >> Yes, there you go, right, load it in there. But having access from these platforms directly to that data, you don't have to worry about any type of a delay. >> What you described, though, is still common architecture where you've got, let's say, a Z mainframe, it's got an InfiniBand pipe to some exit data warehouse or something like that, and so, IBM's vision was, okay, we can collapse that, we can simplify that, consolidate it. SAP with HANA has a similar vision, we can do that. I'm sure Oracle's got their vision. What gives you confidence in IBM's approach and legs going forward? >> Probably due to the advances that we see in z/OS itself where handling mixed workloads, which it's just been doing for many of the 50 years that it's been around, being able to prioritize different workloads, not only just at the CPU dispatching, but also at the memory usage, also at the IO, all the way down through the channel to the actual device. You don't see other operating systems that have that level of granularity for managing mixed workloads. >> In the security component, that's what to me is unique about this so-called private cloud, and I say, I was using that software mainframe example from VMWare in the past, and it got a good portion of the way there, but it couldn't get that last mile, which is, any workload, any application with the performance and security that you would expect. It's just never quite got there. I don't know if the pendulum is swinging, I don't know if that's the accurate way to say it, but it's certainly stabilized, wouldn't you say? >> There's certainly new eyes being opened every day to saying, wait a minute, I could do something different here. Muscle memory doesn't have to guide me in doing business the way I have been doing it before, and that's this muscle memory I'm talking about of this ETL piece. >> Right, well, and a large number of workloads in mainframe are running Linux, right, you got Anaconda, Spark, all these modern tools. The question you asked about developers was right on. If it's independent or transparent to developers, then who cares, that's the key. That's the key lever this day and age is the developer community. You know it well. >> That's right. Give 'em what they want. They're the customers, they're the infrastructure that's being built. 
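As a purely hypothetical sketch of the single-query idea Bryan describes — joining data that physically lives in IMS, DB2, and VSAM through a virtualization layer, so the developer writes ordinary SQL (or hits a ReSTful endpoint) and never touches mainframe-specific access methods — a developer-side call might look like the following. The DSN, table names, and columns are invented for illustration.

```python
# Hypothetical sketch: one SQL statement across several mainframe data sources,
# exposed through a data virtualization layer as ordinary ODBC tables.
import pyodbc

conn = pyodbc.connect("DSN=MAINFRAME_DV")   # assumed ODBC data source name

sql = """
    SELECT c.customer_id, c.segment, a.balance, t.last_payment_date
    FROM   vsam_customers   AS c
    JOIN   db2_accounts     AS a ON a.customer_id = c.customer_id
    JOIN   ims_transactions AS t ON t.account_id  = a.account_id
    WHERE  a.balance > 10000
"""

# Iterate over the joined result set like any other database query.
for row in conn.cursor().execute(sql):
    print(row.customer_id, row.balance)
```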
>> Bryan, we'll give you the last word, bumper sticker on the event, Rocket Software, your partnership, whatever you choose. >> We're excited to be here, it's an exciting day to talk about machine learning on z/OS. I say we're bullish on the mainframe, we are, we're especially bullish on z/OS, and that's what this even today is all about. That's where the data is, that's where we need the analytics running, that's where we need the machine learning running, that's where we need to get the developers to access the data live. >> Excellent, Bryan, thanks very much for coming to theCUBE. >> Bryan: Thank you. >> And keep right there, everybody. We'll be back with our next guest. This is theCUBE, we're live from New York City. Be right back. (electronic keyboard music)
Steven Astorino, IBM - IBM Machine Learning Launch - #IBMML - #theCUBE
>> Announcer: Live from New York, it's the CUBE. Covering the IBM Machine Learning Launch Event. Brought to you by IBM. Now here are your hosts Dave Vellante and Stu Miniman. >> Welcome back to New York City everybody the is The CUBE the leader in live tech coverage. We're here at the IBM Machine Learning Launch Event, bringing machine learning to the Z platform. Steve Astorino is here, he's the VP for Development for the IBM Private Cloud Analytics Platform. Steve, good to see you, thanks for coming on. >> Hi how are you? >> Good thanks, how you doing? >> Good, good. >> Down from Toronto. So this is your baby. >> It is >> This product right? >> It is. So you developed this thing in the labs and now you point it at platforms. So talk about, sort of, what's new here today specifically. >> So today we're launching and announcing our machine learning, our IBM machine learning product. It's really a new solution that allows, obviously, machine learning to be automated and for data scientists and line of business, business analysts to work together and create models to be able to apply machine learning, do predictions and build new business models in the end. To provide better services for their customers. >> So how is it different than what we knew as Watson machine learning? Is it the same product pointed at Z or is it different? >> It's a great question. So Watson is our cloud solution, it's our cloud brand, so we're building something on private cloud for the private cloud customers and enterprises. Same product built for private cloud as opposed to public cloud. Think of it more as a branding and Watson is sort of a bigger solution set in the cloud. >> So it's your product, your baby, what's so great about it? How does it compare with what else is in the marketplace? Why should we get excited about this product? >> Actually, a bunch of things. It's great for many angles, what we're trying to do, obviously it's based on open source, it's an open platform just like what we've been talking about with the other products that we've been launching over the last six months to a year. It's based on Spark, you know we're bringing in all the open source technology, to your fingertips. As well as we're integrating with IBM's top-notch research and capabilities that we're driving in-house, integrating them together and being able to provide one experience to be able to do machine learning. That's at a very high level, also if you think about it there's three things that we're calling out, there's freedom, basically being able to choose what tools you want to use, what environments you want to use, what language you want to use, whether it's Python, Scala, R, right there's productivity. So we really enable and make it simple to be productive and build these machine learning models and then an application developer can leverage and use within their application. The other one is trust. IBM is very well known for its enterprise level capabilities, whether it's governance, whether its trust of the data, how to manage the data, but also more importantly, we're creating something called The Feedback Loop which allows the models to stay current and the data scientists, the administrators, know when these models, for example, is degrading. To make sure it's giving you the right outcome. >> OK, so you mention it's built on Spark. 
When I think about the efforts to build a data pipeline I think I've got to ingest the data, I've got to explore, I've got to process it and clean it up and then I've got to ultimately serve whomever, the business. >> Right, Right. >> What pieces of that does Spark unify and simplify? >> So we leveraged Spark to able to, obviously for the analytics. When you're building a model you one, have your choice of tooling that you want to use, whether it's programmatic or not. That's one of the value propositions we're bringing forward. But then we create these models, we train them, we evaluate them, we leverage Spark for that. Then obviously, we're trying to bring the models where the data is. So one of the key value proposition is we operationalize these models very simply and quickly. Just at a click of a button you can say hey deploy this model now and we deploy it right on where the data is in this case we're launching it on mainframe first. So Spark on the mainframe, we're deploying the model there and you can score the model directly in Spark on the mainframe. That's a huge value add, get better performance. >> Right, okay, just in terms of differentiates from the competition, you're the only company I think, providing machine learning on Z, so. >> Definitely, definitely. >> That's pretty easy, but in terms of the capabilities that you have, how are you different from the competition? When you talk to clients and they say well what about this vendor or that vendor, how do you respond? >> So let me talk about one of the research technologies that we're launching as part of this called CADS, Cognitive Assistant for Data Scientists. This is a feature where essentially, it takes the complexity out of building a model where you tell it, or you give it the algorithms you want to work with and the CADS assistant basically returns which one is the best which one performs the best. Now, all of a sudden you have the best model to use without having to go and spend, potentially weeks, on figuring out which one that is. So that's a huge value proposition. >> So automating the choice of the algorithm, an algorithm to choose the algorithm. what have you found in terms of it's level of accuracy in terms of the best fit? >> Actually it works really well. And in fact we have a live demo that we'll be doing today, where it shows CADS coming back with a 90% accurate model in terms of the data that we're feeding it and outcome it will give you in terms of what model to use. It works really well. >> Choosing an algorithm is not like choosing a programming language right, this bias if I like Scala or R or whatever, Java, Python okay fine, I've got skill sets associated with that. Algorithm choice is one that's more scientific, I guess? >> It is more scientific, it's based on the algorithm, the statistical algorithm and the selection of the algorithm or the model itself is a huge deal because that's where you're going to drive your business. If you're offering a new service that's where you're providing that solution from, so it has to be the right algorithm the right model so that you can build that more efficiently. >> What are you seeing as the big barriers to customer adopting machine learning? >> I think everybody, I mean it's the hottest thing around right now, everybody wants machine learning it's great, it's a huge buzz. The hardest thing is they know they want it, but don't really know how to apply it into their own environment, or they think they don't have the right skills. 
So, that actually one of the things that we're going after, to be able to enable them to do that. We're for example working on building different industry-based examples to showcase here's how you would use it in your environment. So last year when we did the Watson data platform we did a retail example, now today we're doing a finance example, a churn example with customers potentially churning and leaving a bank. So we're looking at all those different scenarios, and then also we're creating hubs, locations we're launching today also, announcing today, actually Dinesh will be doing that. There is a hub in Silicon Valley where it would allow customers to come in and work with us and we help them figure out how they can leverage machine learning. It is a great way to interact with our customers and be able to do that. >> So Steve nirvana is, and you gave that example, the retail example in September, when you launched Watson Data Platform, the nirvana in this world is you can use data, and maybe put in an offer, or save a patients life or effect an outcome in real time. So the retail example was just that. If I recall, you were making an offer real-time it was very fast, live demo it wasn't just a fakey. The example on churn, is the outcome is to effect that customer's decisions so that they don't leave? Is that? >> Yes, pretty much, Essentially what we are looking at is , we're using live data, we're using social media data bringing in Twitter sentiment about a particular individual for example, and try to predict if this customer, if this user is happy with the service that they are getting or not. So for example, people will go and socialize, oh I went to this bank and I hated this experience, or they really got me upset or whatever. Bringing that data from Twitter, so open data and merging it with the bank's data, banks have a lot of data they can leverage and monetize. And then making an assessment using machine learning to predict is this customer going to leave me or not? What probability do they have that they are going to leave me or not based on the machine learning model. The example or scenario we are using now, if we think they are going to leave us, we're going to make special offers to them. It's a way to enhance your service for those customers. So that they don't leave you. >> So operationalizing that would be a call center has some kind on dashboard that says red, green, yellow, boom heres an offer that you should make, and that's done in near real time. In fact, real time is before you lose the customer. That's as good a definition as anything else. >> But it's actually real-time, and when we call it the scoring of the data, so as the data transaction is coming in, you can actually make that assessment in real time, it's called in-transaction scoring where you can make that right on the fly and be able to determine is this customer at risk or not. And then be able to make smarter decisions to that service you are providing on whether you want to offer something better. >> So is the primary use case for this those streams those areas I'm getting you know, whether it be, you mentioned Twitter data, maybe IoT, you're getting can we point machine learning at just archives of data and things written historically or is it mostly the streams? >> It's both of course and machine learning is based on historical data right and that's hot the models are built. 
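The following is a generic sketch of automated model selection in the spirit of what Steve describes, applied to a churn-style problem. It is not the CADS implementation; the synthetic data, candidate algorithms, and scoring metric are assumptions chosen for illustration.

```python
# Hypothetical sketch: try several algorithms on a churn-style dataset and keep
# the one with the best cross-validated score.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Pretend features: account tenure, balance trend, a Twitter sentiment score, etc.
X, y = make_classification(n_samples=3000, n_features=12, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

scores = {name: cross_val_score(est, X, y, cv=5, scoring="roc_auc").mean()
          for name, est in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> best:", best)
```

In practice a final held-out test set would confirm the winner, but the core idea is the same: the comparison across algorithms is automated rather than done by hand over weeks.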
The more accurate or more data you have on historical data, the more accurate that you picked the right model and you'll get the better predictition of what's going to happen next time. So it's exactly, it's both. >> How are you helping customers with that initial fit? My understanding is how big of a data set do you need, Do I have enough to really model where I have, how do you help customers work through that? >> So my opinion is obvious to a certain extent, the more data you have as your sample set, the more accurate your model is going to be. So if we have one that's too small, your prediction is going to be inaccurate. It really depends on the scenario, it depends on how many features or the fields you have you're looking at within your dataset. It depends on many things, and it's variable depending on the scenario, but in general you want to have a good chunk of historical data that you can build expertise on right. >> So you've worked on both the Watson Services in the public cloud and now this private cloud, is there any differentiation or do you see significant use case different between those two or is it just kind of where the data lives and we're going to do similar activities there. >> So it is similar. At the end of the day, we're trying to provide similar products on both public cloud and private cloud. But for this specific case, we're launching it on mainframe that's a different angle at this. But we know that's where the biggest banks, the insurance companies, the biggest retailers in the world are, and that's where the biggest transactions are running and we really want to help them leverage machine learning and get their services to the next level. I think it's going to be a huge differentiator for them. >> Steve, you gave an example before of Twitter sentiment data. How would that fit in to this announcement. So I've got this ML on Z and I what API into the twitter data? How does that sort of all get adjusted and consolidated? >> So we allow hooks to be able to access data from different sources, bring in data. That is part of the ingest process. Then once you have that data there into data frames into the machine learning product, now you're feeding into a statistical algorithm to figure out what the best prediction is going to be, and the best model's going to be. >> I have a slide that you guys are sharing on the data scientist workflow. It starts with ingestion, selection, preparation, generation, transform, model. It's a complex set of tasks, and typically historically, at least in the last fIve or six years, different tools to de each of those. And not just different tools, multiples of different tools. That you had to cobble together. If I understand it correctly the Watson Data Platform was designed to really consolidate that and simplify that, provide collaboration tools for different personas, so my question is this. Because you were involved in that product as well. And I was excited about it when I saw it, I talked to people about it, sometimes I hear the criticism of well IBM just took a bunch of legacy products threw them together, threw and abstraction layer on top and is now going to wrap a bunch of services around it. Is that true? >> Absolutely not. Actually, you may have heard a while back IBM had made a big shift into design first design methodology. So we started with the Watson Data Platform, the Data Science Experience, they started with design first approach. 
We looked at this, we said what do we want the experience to be, for which persona do we want to target. Then we understood what we wanted the experience to be and then we leverage IBM analytics portfolio to be able to feed in and provide and integrate those services together to fit into that experience. So, its not a dumping ground for, I'll take this product, it's part of Watson Data Platform, not at all the case. It was the design first, and then integrate for that experience. >> OK, but there are some so-called legacy products in there, but you're saying you picked the ones that were relevant and then was there additional design done? >> There was a lot of work involved to take them from a traditional product, to be able to componentize, create a micro service architecture, I mean the whole works to be able to redesign it and fit into this new experience. >> So microservices architecture, runs on cloud, I think it only runs on cloud today right? >> Correct, correct. >> OK, maybe roadmap without getting too specific. What should we be paying attention to in the future? >> Right now we're doing our first release. Definitely we want to target any platform behind the firewall. So we don't have specific dates, but now we started with machine learning on a mainframe and we want to be able to target the other platforms behind the firewall and the private cloud environment. Definitely we should be looking at that. Our goal is to make, I talked about the feedback loop a little bit, so that is essentially once you deploy the model we actually look at that model you could schedule in a valuation, automatically, within the machine learning product. To be able to say, this model is still good enough. And if it's not we automatically flag it, and we look at the retraining process and redeployment process to make sure you always have the most up to date model. So this is truly machine learning where it requires very little to no intervention from a human. We're going to continue down that path and continue that automation in providing those capabilities so there's a bigger roadmap, there's a lot of things we're looking at. >> We've sort of looked at our big data analyst George Gilbert has talked about you had batch and you had interactive, not the sort of emergent workload is this continuous, streaming data. How do you see the adoption. First of all, is it a valid assertion? That there is a new class of workload, and then how do you see that adoption occurring? Is it going to be a dominant force over the next 10 years? >> Yeah, I think so. Like I said there is a huge buzz around machine learning in general and artificial intelligence, deep learning, all of these terms you hear about. I think as users and customers get more comfortable with understanding how they're going to leverage this in their enterprise. This real-time streaming of data and being able to do analytics on the fly and machine learning on the fly. It's a big deal and it will really helps them be more competitive in their own space with the services we're providing. >> OK Steve, thanks very much for coming on The CUBE. We'll give you the last word. The event, very intimate event a lot of customers coming in very shortly here in just a couple of hours. Give us the bumper sticker. >> All of that's very exciting, we're very excited, this is a big deal for us, that's why whenever IBM does a signature moment it's a big deal for us and we got something cool to talk about, we're very excited about that. 
Lot's of clients coming so there's an entire session this afternoon, which will be live streamed as well. So it's great, I think we have a differentiating product and we're already getting that feedback from our customers. >> Well congratulations, I love the cadence that you're on. We saw some announcements in September, we're here in February, I expect we're going to see more innovation coming out of your labs in Toronto, and cross IBM so thank you very much for coming on The CUBE. >> Thank you. >> You're welcome OK keep it right there everybody, we'll be back with our next guest right after this short break. This is The CUBE we're live from New York City. (energetic music)
James Kobielus, IBM - IBM Machine Learning Launch - #IBMML - #theCUBE
>> [Announcer] Live from New York, it's the Cube. Covering the IBM Machine Learning Launch Event. Brought to you by IBM. Now here are your hosts Dave Vellante and Stu Miniman. >> Welcome back to New York City everybody, this is the CUBE. We're here live at the IBM Machine Learning Launch Event. Bringing analytics and transactions together on Z, extending an announcement that IBM made a couple years ago, sort of laid out that vision, and now bringing machine learning to the mainframe platform. We're here with Jim Kobielus. Jim is the Director of IBM's Community Engagement for Data Science and a long time CUBE alum and friend. Great to see you again James. >> Great to always be back here with you. Wonderful folks from the CUBE. You ask really great questions and >> Well thank you. >> I'm prepared to answer. >> So we saw you last week at Spark Summit so back to back, you know, continuous streaming, machine learning, give us the lay of the land from your perspective of machine learning. >> Yeah well machine learning very much is at the heart of what modern application developers build and that's really the core secret sauce in many of the most disruptive applications. So machine learning has become the core of, of course, what data scientists do day in and day out or what they're asked to do which is to build, essentially artificial neural networks that can process big data and find patterns that couldn't normally be found using other approaches. And then as Dinesh and Rob indicated a lot of it's for regression analysis and classification and the other core things that data scientists have been doing for a long time, but machine learning has come into its own because of the potential for great automation of this function of finding patterns and correlations within data sets. So today at the IBM Machine Learning Launch Event, and we've already announced it, IBM Machine Learning for ZOS takes that automation promised to the next step. And so we're real excited and there'll be more details today in the main event. >> One of the most funs I had, most fun I had last year, most fun interviews I had last year was with you, when we interviewed, I think it was 10 data scientists, rock star data scientists, and Dinesh had a quote, he said, "Machine learning is 20% fun, 80% elbow grease." And data scientists sort of echoed that last year. We spent 80% of our time wrangling data. >> [Jim] Yeah. >> It gets kind of tedious. You guys have made announcements to address that, is the needle moving? >> To some degree the needle's moving. Greater automation of data sourcing and preparation and cleansing is ongoing. Machine learning is being used for that function as well. But nonetheless there is still a lot of need in the data science, sort of, pipeline for a lot of manual effort. So if you look at the core of what machine learning is all about, it's supervised learning involves humans, meaning data scientists, to train their algorithms with data and so that involves finding the right data and then of course doing the feature engineering which is a very human and creative process. And then to be training the data and iterating through models to improve the fit of the machine learning algorithms to the data. In many ways there's still a lot of manual functions that need expertise of data scientists to do it right. There's a lot of ways to do machine learning wrong you know there's a lot of, as it were, tricks of the trade you have to learn just through trial and error. 
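As an aside on the "80% elbow grease" point just made: in a typical supervised-learning script most of the code is data preparation and feature engineering, and the model fit itself is only a line or two. The file name and columns below are invented purely for illustration.

```python
# Hypothetical sketch: the bulk of the work is wrangling, the fit is the short part.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("customers.csv")                      # hypothetical input file

# --- the elbow grease: cleaning and feature engineering ---
df = df.drop_duplicates(subset="customer_id")
df["age"] = df["age"].fillna(df["age"].median())
df["tenure_years"] = df["tenure_days"] / 365.0
df["is_high_value"] = (df["annual_spend"] > df["annual_spend"].quantile(0.9)).astype(int)
features = df[["age", "tenure_years", "is_high_value"]]
labels = df["churned"]

# --- the comparatively small modeling step ---
model = LogisticRegression(max_iter=1000).fit(features, labels)
```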
A lot of things like the new generation of things like generative adversarial models ride on machine learning or deep learning in this case, a multilayered, and they're not easy to get going and get working effectively the first time around. I mean with the first run of your training data set, so that's just an example of how, the fact is there's a lot of functions that can't be fully automated yet in the whole machine learning process, but a great many can in fact, especially data preparation and transformation. It's being automated to a great degree, so that data scientists can focus on the more creative work that involves subject matter expertise and really also application development and working with larger teams of coders and subject matter experts and others, to be able to take the machine learning algorithms that have been proved out, have been trained, and to dry them to all manner of applications to deliver some disruptive business value. >> James, can you expand for us a little bit this democratization of before it was not just data but now the machine learning, the analytics, you know, when we put these massive capabilities in the broader hands of the business analysts the business people themselves, what are you seeing your customers, what can they do now that they couldn't do before? Why is this such an exciting period of time for the leveraging of data analytics? >> I don't know that it's really an issue of now versus before. Machine learning has been around for a number of years. It's artificial neural networks at the very heart, and that got going actually in many ways in the late 50s and it steadily improved in terms of sophistication and so forth. But what's going on now is that machine learning tools have become commercialized and refined to a greater degree and now they're in a form in the cloud, like with IBM machine learning for the private cloud on ZOS, or Watson machine learning for the blue mixed public cloud. They're at a level of consumability that they've never been at before. With software as a service offering you just, you pay for it, it's available to you. If you're a data scientist you being doing work right away to build applications, derive quick value. So in other words, the time to value on a machine learning project continues to shorten and shorten, due to the consumability, the packaging of these capabilities and to cloud offerings and into other tools that are prebuilt to deliver success. That's what's fundamentally different now and it's just an ongoing process. You sort of see the recent parallels with the business intelligence market. 10 years ago BI was reporting and OLEP and so forth, was only for the, what we now call data scientists or the technical experts and all that area. But in the last 10 years we've seen the business intelligence community and the industry including IBM's tools, move toward more self service, interactive visualization, visual design, BI and predictive analytics, you know, through our cognos and SPSS portfolios. A similar dynamic is coming in to the progress of machine learning, the democratization, to use your term, the more self service model wherein everybody potentially will be able to be, to do machine learning, to build machine learning and deep learning models without a whole of university training. That day is coming and it's coming fairly rapidly. It's just a matter of the maturation of this technology in the marketplace. 
>> So I want to ask you, you're right, 1950s it was artificial neural networks or AI, sort of was invented I guess, the concept, and then in the late 70s and early 80s it was heavily hyped. It kind of died in the late 80s or in the 90s, you never heard about it even the early 2000s. Why now, why is it here now? Is it because IBM's putting so much muscle behind it? Is it because we have Siri? What is it that has enabled that? >> Well I wish that IBM putting muscle behind a technology can launch anything to success. And we've done a lot of things in that regard. But the thing is, if you look back at the historical progress of AI, I mean, it's older than me and you in terms of when it got going in the middle 50s as a passion or a focus of computer scientists. What we had for the last, most of the last half century is AI or expert systems that were built on having to do essentially programming is right, declared a rule defining how AI systems could process data whatever under various scenarios. That didn't prove scalable. It didn't prove agile enough to learn on the fly from the statistical patterns within the data that you're trying to process. For face recognition and voice recognition, pattern recognition, you need statistical analysis, you need something along the lines of an artificial neural network that doesn't have to be pre-programmed. That's what's new now about in the last this is the turn of this century, is that AI has become predominantly now focused not so much on declarative rules, expert systems of old, but statistical analysis, artificial neural networks that learn from the data. See the, in the long historical sweep of computing, we have three eras of computing. The first era before the second world war was all electromechanical computing devices like IBM's start of course, like everybody's, was in that era. The business logic was burned into the hardware as it were. The second era from the second world war really to the present day, is all about software, programming, it's COBAL, 4trans, C, Java, where the business logic has to be developed, coded by a cadre of programmers. Since the turn of this millennium and really since the turn of this decade, it's all moved towards the third era, which is the cognitive era, where you're learning the business rules automatically from the data itself, and that involves machine learning at its very heart. So most of what has been commercialized and most of what is being deployed in the real world working, successful AI, is all built on artificial neural networks and cognitive computing in the way that I laid out. Where, you still need human beings in the equation, it can't be completely automated. There's things like unsupervised learning that take the automation of machine learning to a greater extent, but you still have the bulk of machine learning is supervised learning where you have training data sets and you need experts, data scientists, to manage that whole process, that over time supervised learning is evolving towards who's going to label the training data sets, especially when you have so much data flooding in from the internet of things and social media and so forth. A lot of that is being outsourced to crowd sourcing environments in terms of the ongoing labeling of data for machine learning projects of all sorts. That trend will continue a pace. So less and less of the actual labeling of the data for machine learning will need to be manually coded by data scientists or data engineers. >> So the more data the better. 
See I would argue in the enablement pie. You're going to disagree with that which is good. Let's have a discussion [Jim Laughs]. In the enablement pie, I would say the profundity of Hadup was two things. One is I can leave data where it is and bring code to data. >> [Jim] Yeah. >> 5 megabytes of code to petabyte of data, but the second was the dramatic reduction in the cost to store more data, hence my statement of the more data the better, but you're saying, meh maybe not. Certainly for compliance and other things you might not want to have data lying around. >> Well it's an open issue. How much data do you actually need to find the patterns of interest to you, the correlations of interest to you? Sampling of your data set, 10% sample or whatever, in most cases that might be sufficient to find the correlations you're looking for. But if you're looking for some highly deepened rare nuances in terms of anomalies or outliers or whatever within your data set, you may only find those if you have a petabyte of data of the population of interest. So but if you're just looking for broad historical trends and to do predictions against broad trends, you may not need anywhere near that amount. I mean, if it's a large data set, you may only need five to 10% sample. >> So I love this conversation because people have been on the CUBE, Abi Metter for example said, "Dave, sampling is dead." Now a statistician said that's BS, no way. Of course it's not dead. >> Storage isn't free first of all so you can't necessarily save and process all the data. Compute power isn't free yet, memory isn't free yet, so forth so there's lots... >> You're working on that though. >> Yeah sure, it's asymptotically all moving towards zero. But the bottom line is if the underlying resources, including the expertise of your data scientists that's not for free, these are human beings who need to make a living. So you've got to do a lot of things. A, automate functions on the data science side so that your, these experts can radically improve their productivity. Which is why the announcement today of IBM machine learning is so important, it enables greater automation in the creation and the training and deployment of machine learning models. It is a, as Rob Thomas indicated, it's very much a multiplier of productivity of your data science teams, the capability we offer. So that's the core value. Because our customers live and die increasingly by machine learning models. And the data science teams themselves are highly inelastic in the sense that you can't find highly skilled people that easily at an affordable price if you're a business. And you got to make the most of the team that you have and help them to develop their machine learning muscle. >> Okay, I want to ask you to weigh in on one of Stu's favorite topics which is man versus machine. >> Humans versus mechanisms. Actually humans versus bots, let's, okay go ahead. >> Okay so, you know a lot of discussions, about, machines have always replaced humans for jobs, but for the first time it's really beginning to replace cognitive functions. >> [Jim] Yeah. >> What does that mean for jobs, for skill sets? The greatest, I love the comment, the greatest chess player in the world is not a machine. It's humans and machines, but what do you see in terms of the skill set shift when you talk to your data science colleagues in these communities that you're building? 
Is that the right way to think about it, that it's the creativity of humans and machines that will drive innovation going forward. >> I think it's symbiotic. If you take Watson, of course, that's a star case of a cognitive AI driven machine in the cloud. We use a Watson all the time of course in IBM. I use it all the time in my job for example. Just to give an example of one knowledge worker and how he happens to use AI and machine learning. Watson is an awesome search engine. Through multi-structure data types and in real time enabling you to ask a sequence of very detailed questions and Watson is a relevance ranking engine, all that stuff. What I've found is it's helped me as a knowledge worker to be far more efficient in doing my upfront research for anything that I might be working on. You see I write blogs and I speak and I put together slide decks that I present and so forth. So if you look at knowledge workers in general, AI as driving far more powerful search capabilities in the cloud helps us to eliminate a lot of the grunt work that normally was attended upon doing deep research into like a knowledge corpus that may be preexisting. And that way we can then ask more questions and more intelligent questions and really work through our quest for answers far more rapidly and entertain and rule out more options when we're trying to develop a strategy. Because we have all the data at our fingertips and we've got this expert resource increasingly in a conversational back and forth that's working on our behalf predictively to find what we need. So if you look at that, everybody who's a knowledge worker which is really the bulk now of the economy, can be far more productive cause you have this high performance virtual assistant in the cloud. I don't know that it's really going, AI or deep learning or machine learning, is really going to eliminate a lot of those jobs. It'll just make us far smarter and more efficient doing what we do. That's, I don't want to belittle, I don't want to minimize the potential for some structural dislocation in some fields. >> Well it's interesting because as an example, you're like the, you're already productive, now you become this hyper-productive individual, but you're also very creative and can pick and choose different toolings and so I think people like you it's huge opportunities. If you're a person who used to put up billboards maybe it's time for retraining. >> Yeah well maybe you know a lot of the people like the research assistants and so forth who would support someone like me and most knowledge worker organizations, maybe those people might be displaced cause we would have less need for them. In the same way that one of my very first jobs out of college before I got into my career, I was a file clerk in a court in Detroit, it's like you know, a totally manual job, and there was no automation or anything. You know that most of those functions, I haven't revisited that court in recent years, I'm sure are automated because you have this thing called computers, especially PCs and LANs and so forth that came along since then. So a fair amount of those kinds of feather bedding jobs have gone away and in any number of bureaucracies due to automation and machine learning is all about automation. So who knows where we'll all end up. >> Alright well we got to go but I wanted to ask you about... >> [Jim] I love unions by the way. >> And you got to meet a lot of lawyers I'm sure. >> Okay cool. 
>> So I got to ask you about your community of data scientists that you're building. You've been early on in that. It's been a persona that you've really tried to cultivate and collaborate with. So give us an update there. What's your, what's the latest, what's your effort like these days? >> Yeah, well, what we're doing is, I'm on a team now that's managing and bringing together all of our program for community engagement programs for really for across portfolio not just data scientists. That involves meet ups and hack-a-thons and developer days and user groups and so forth. These are really important professional forums for our customers, our developers, our partners, to get together and share their expertise and provide guidance to each other. And these are very very important for these people to become very good at, to help them, get better at what they do, help them stay up to speed on the latest technologies. Like deep learning, machine learning and so forth. So we take it very seriously at IBM that communities are really where customers can realize value and grow their human capital ongoing so we're making significant investments in growing those efforts and bringing them together in a unified way and making it easier for like developers and IT administrators to find the right forums, the right events, the right content, within IBM channels and so forth, to help them do their jobs effectively and machine learning is at the heart, not just of data science, but other professions within the IT and business analytics universe, relying more heavily now on machine learning and understanding the tools of the trade to be effective in their jobs. So we're bringing, we're educating our communities on machine learning, why it's so critically important to the future of IT. >> Well your content machine is great content so congratulations on not only kicking that off but continuing it. Thanks Jim for coming on the CUBE. It's good to see you. >> Thanks for having me. >> You're welcome. Alright keep it right there everybody, we'll be back with our next guest. The CUBE, we're live from the Waldorf-Astoria in New York City at the IBM Machine Learning Launch Event right back. (techno music)
Dinesh Nirmal, IBM - IBM Machine Learning Launch - #IBMML - #theCUBE
>> [Announcer] Live from New York, it's theCube, covering the IBM Machine Learning Launch Event brought to you by IBM. Now, here are your hosts, Dave Vellante and Stu Miniman. >> Welcome back to the Waldorf Astoria, everybody. This is theCube, the worldwide leader in live tech coverage. We're covering the IBM Machine Learning announcement. IBM bringing machine learning to its Z mainframe, its private cloud. Dinesh Nirmal is here. He's the Vice President of Analytics at IBM and a Cube alum. Dinesh, good to see you again. >> Good to see you, Dave. >> So let's talk about ML. So we went through the big data, the data lake, the data swamp, all this stuff with Hadoop. And now we're talking about machine learning and deep learning and AI and cognitive. Is it same wine, new bottle? Or is it an evolution of data and analytics? >> Good. So, Dave, let's talk about machine learning. Right. When I look at machine learning, there's three pillars. The first one is the product. I mean, you got to have a product, right. And you got to have a differentiated set of functions and features available for customers to build models. For example, Canvas. I mean, those are table stakes. You got to have a set of algorithms available. So that's the product piece. >> [Dave] Uh huh. >> But then there's the process, the process of taking that model that you built in a notebook and being able to operationalize it. Meaning being able to deploy it. That is, you know, I was talking to one of the customers today, and he was saying, "Machine learning is 20% fun and 80% elbow grease." Because that operationalizing of that model is not easy. Although they make it sound very simple, it's not. So if you take a banking, enterprise banking example, right? You build a model in the notebook. Some data scientist builds it. Now you have to take that and put it into your infrastructure or production environment, which has been there for decades. So you could have third party software that you cannot change. You could have a set of rigid rules that is already there. You could have applications that were written in the 70's and 80's that nobody wants to touch. How do you all of a sudden take the model and infuse it in there? It's not easy. And so that is a tremendous amount of work. >> [Dave] Okay. >> The third pillar is the people, or the expertise or the experience, the skills that need to come through, right. So the product is one. The process of operationalizing and getting it into your production environment is another piece. And then the people is the third one. So when I look at machine learning, right. Those are three key pillars that you need to have to have a successful, you know, experience of machine learning. >> Okay, let's unpack that a little bit. Let's start with the differentiation. You mentioned Canvas, but talk about IBM specifically. >> [Dinesh] Right. What's so great about IBM? What's the differentiation? >> Right, exactly. Really good point. So we have been on the predictive side for a very long time, right. I mean, it's not like we are coming into ML or AI or cognitive yesterday. We have been in that space for a very long time. We have SPSS predictive analytics available. So even if you look from all three pillars, what we are doing is, from a product perspective, we are bringing in the product where we are giving a choice or a flexibility to use the language you want. So there are customers who only want to use R. They are religious R users. They don't want to hear about anything else.
There are customers who want to use Python, you know. They don't want to use anything else. So how do we give that choice of languages to our customers to say use any language you want. Or execution engines, right? Some folks want to use Spark as the execution engine. Some folks want to use R or Python, so we give that choice. Then you talked about Canvas. There are folks who want to use the GUI portion of the Canvas or a modeler to build models, or there are, you know, techie guys we'll approach who want to use a notebook. So how do you give that choice? So it becomes kind of like a freedom or a flexibility or a choice that we provide, so that's the product piece, right? We do that. Then the other piece is productivity. So one of the customers, the CTO of (mumbles) TV, is going to come on stage with me during the main session and talk about how collaboration helped from an IBM machine learning perspective, because their data scientists are sitting in New York City, and our data scientists who are working with them are sitting in San Jose, California. And they were collaborating in real time using notebooks in our ML projects, where they can see, in real time, what changes their data scientists are making. They can Slack messages to each other. And that collaborative piece is what really helped us. So collaboration is one, right, from a productivity piece. We introduced something called the Feedback Loop, whereby your model can get retrained. So today, you deploy a model. Its scoring could degrade over time. Then you have to take it off-line and re-train, right? What we have done is, like, we introduced the Feedback Loop, so when you deploy your model, we give you two endpoints. The first endpoint is, basically, a URI for you to plug into your application so that when you, you know, run your application, it is able to call the scoring API. The second endpoint is this feedback endpoint, where you can choose to re-train the model. If you want it to be three hours, if you want it to be six hours, you can do that. So we bring that flexibility, we bring that productivity into it. Then, the management of the models, right? How do we make sure that once you develop the model, you deploy the model. There's a life cycle involved there. How do you make sure that we enable, give you the tools to manage the model? So when you talk about differentiation, right? We are bringing differentiation on all three pillars. From a product perspective, with all the things I mentioned. From a deployment perspective. How do we make sure we have different choices of deployment, whether it's streaming, whether it's realtime, whether it's batch. You can do deployment, right? The Feedback Loop is another one. Once you've deployed, how do we keep re-training it? And the last piece I talked about is the expertise or the people, right? So we are today announcing the IBM Machine Learning Hub, which will become one place where our customers can go, ask questions, get education sessions, get training, right? Work together to build models. I'll give you an example: although we are announcing the hub, the IBM Machine Learning Hub, today, we have been working with America First Credit Union for the last month or so. They approached us and said, you know, their underwriting takes a long time. All the knowledge is embedded in 15 to 20 human beings. And they want to make sure a machine should be able to absorb that knowledge and make that decision in minutes. Today it takes hours or days. >> [Dave] So, Stu, before you jump in, let me recap the portfolio.
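The two-endpoint pattern Dinesh describes, one URI for scoring calls from the application and one for feeding outcomes back so the model can be retrained on a schedule, looks roughly like the sketch below. The URLs, payload fields, and token are placeholders invented for illustration, not IBM Machine Learning's actual API:

```python
# Hypothetical sketch of a scoring endpoint plus a feedback endpoint.
# Every URL, field name, and credential here is a placeholder, not a real IBM API.
import requests

SCORING_URL  = "https://ml.example.internal/v1/deployments/underwriting/score"     # placeholder
FEEDBACK_URL = "https://ml.example.internal/v1/deployments/underwriting/feedback"  # placeholder
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder credential

# 1) The application calls the scoring endpoint for each new case.
score = requests.post(
    SCORING_URL,
    json={"fields": ["income", "debt", "tenure"], "values": [[72000, 18000, 6]]},
    headers=HEADERS,
).json()

# 2) When the true outcome is later known, it is posted to the feedback endpoint;
#    the platform uses that stream of labeled outcomes to retrain the model on
#    whatever schedule was chosen (three hours, six hours, and so on).
requests.post(
    FEEDBACK_URL,
    json={"fields": ["income", "debt", "tenure", "outcome"], "values": [[72000, 18000, 6, "repaid"]]},
    headers=HEADERS,
)
```

The division of labor is the point: the application only ever talks to two stable endpoints, while retraining happens behind them.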
You know, you mentioned SPSS, expertise, choice. The collaboration, which I think you really stressed at the announcement last fall. The management of the models, so you can continuously improve it. >> Right. >> And then this knowledge base, what you're calling the hub. And I could argue, I guess, that if I take any one of those individual pieces, there, some of your competitors have them. Your argument would be it's all there. >> It all comes together, right? And you have to make sure that all three pillars come together. And customers see great value when you have that. >> Dinesh, customers today are used to kind of the deployment model on the public cloud, which is, "I want to activate a new service," you know. I just activate it, and it's there. When I think about private cloud environments, private clouds are operationally faster, but it's usually not minutes or hours. It's usually more like months to deploy projects, which is still better than, you know, kind of, I think, before big data, it was, you know, oh, okay, 18 months to see if it works, and let's bring that down to, you know, a couple of months. Can you walk us through, what does, you know, a customer do today who says, "Great, I love this approach. How long does it take?" You know, what's kind of the project life cycle of this? And how long will it take them to play around and pull some of these levers before they're, you know, getting productivity out of it? >> Right. So, really good questions, Stu. So let me back up one step. So, in private cloud, we have a new initiative called Download and Go, where our goal is to have our desktop products be able to install on your personal desktop in less than five clicks, in less than fifteen minutes. That's the goal. So the other day, you know, the team told me it's ready. That the first product is ready where you can go less than five clicks, fifteen minutes. I said the real test is I'm going to bring my son, who's five years old. Can he install it, and if he can install it, you know, we are good. And he did it. And I have a video to prove it, you know. So after the show, I will show you. Because that's, when you talk about, you know, the private cloud side, or the on-premise side, it has been a long project cycle. What we want is like you should be able to take our product, install it, and get the experience in minutes. That's the goal. And when you talk about private cloud and public cloud, another differentiating factor is that now you get the strength of IBM public cloud combined with the private cloud, so you could, you know, train your model in public cloud, and score on private cloud. You have the same experience. Not many folks, not many competitors can offer that, right? So that's another... >> [Stu] So if I get that right. If I as a customer have played around with the machine learning in Bluemix, I'm going to have a similar look, feel, API. >> Exactly the same, so what you have in Bluemix, right? I mean, so you have Watson in Bluemix, which, you know, has deep learning, machine learning, all those capabilities. What we have done is, like, we have extracted the core capabilities of Watson on private cloud, and it's IBM Machine Learning. But the experience is the same. >> I want to talk about this notion of operationalizing analytics. And it ties, to me anyway, it ties into transformation. You mentioned going from Notebook to actually being able to embed analytics in the workflow of the business.
Can you double click on that a little bit, and maybe give some examples of how that has helped companies transform? >> Right. So when I talk about operationalizing, when you look at machine learning, right? You have all the way from data, which is the most critical piece, to building or deploying the model. A lot of times, data itself is not clean. I'll give you an example, right. So >> OSYX. >> Yeah. And when we are working with an insurance company, for example, the data that comes in. For example, if you just take gender, a lot of times the values are null. So we have to build another model to figure out if it's male or female, right? So in this case, for example, we have to say somebody has done a prostate exam. Obviously, he's a male. You know, we figured that. Or has a gynecology exam. It's a female. So we have to, you know, there's a lot of work just to get that data cleansed. So that's where I mentioned it's, you know, machine learning is 20% fun, 80% elbow grease, because there's a lot of grease there that you need to make sure that you cleanse the data. Get that right. That's the shaping piece of it. Then comes building the model, right. And then, once you build the model on that data, comes the operationalization of that model, which in itself is huge, because how do you make sure that you infuse that model into your current infrastructure? Which is where a lot of skill set, a lot of experience, and a lot of knowledge comes in, because, unless you are a start-up, right, you already have applications and programs and third-party vendor applications that have been running for years, or decades, for that matter. So, yeah, so that's it: operationalization is a huge piece. Cleansing of the data is a huge piece. Getting the model right is another piece. >> And simplifying the whole process. I think about, I got to ingest the data. I've now got to, you know, play with it, explore. I've got to process it. And I've got to serve it to some, you know, some business need or application. And typically, those are separate processes, separate tools, maybe different personas that are doing that. Am I correct that your announcement in the Fall addressed that workflow? How is it being, you know, deployed and adopted in the field? How is it, again back to transformation, are you seeing that people are actually transforming their analytics processes and ultimately creating outcomes that they expect? >> Huge. So good point. We announced the Data Science Experience in the Fall. And the customers who are going to speak with us today on stage are the customers who have been using that. So, for example, if you take AFCU, America First Credit Union, they worked with us. In two weeks, you know, talk about transformation, we were able to absorb the knowledge of their underwriters. You know, what (mumbles) is in. Build that, get the features. And we were able to build a model in two weeks. And the model is predicting with 90% accuracy. That's what early tests are showing. >> [Dave] And you say that was in a couple of weeks. You were, you developed that model. >> Yeah, yeah, right. So when we talk about transformation, right? We couldn't have done that a few years ago. We have transformed where the different personas can collaborate with each other, and that's the collaboration piece I talked about. Real time. Be able to build a model, and put it to the test to see what kind of benefits they're getting.
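The cleansing rule Dinesh sketches above, inferring a missing gender value from procedure codes that imply it, is the kind of thing that usually comes down to a few lines of data-wrangling code. A minimal illustration in pandas, with invented column names and codes (not Dinesh's actual data or tooling):

```python
# Minimal sketch of the cleansing rule described above; the column names,
# codes, and rows are invented for illustration.
import pandas as pd

claims = pd.DataFrame({
    "member_id": [1, 2, 3, 4],
    "gender":    [None, "F", None, "M"],
    "procedure": ["prostate_exam", "annual_physical", "gynecology_exam", "annual_physical"],
})

# Fill missing gender only where a procedure implies it; anything still null is
# left for a downstream model (or manual review) to resolve.
implied = claims["procedure"].map({"prostate_exam": "M", "gynecology_exam": "F"})
claims["gender"] = claims["gender"].fillna(implied)
print(claims)
```

Rules like this cover the easy cases cheaply; the remaining nulls are where the "80% elbow grease" actually goes.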
>> And you've obviously got edge cases where people get really sophisticated, but, you know, we were sort of talking off camera, and you know like the 80/20 rule, or maybe it's the 90/10. You say most use cases can be, you know, solved with regression and classification. Can you talk about that a little more? >> So, so when we talk about machine learning, right? To me, I would say 90% of it is regression or classification. I mean, there are edge cases of clustering and all those things. But linear regression or classification can solve most of our customers' problems, right? So whether it's fraud detection. Or whether it's underwriting the loan. Or whether you're trying to determine the sentiment analysis. I mean, you can kind of classify or do regression on it. So I would say that 90% of the cases can be covered, but like I said, most of the work is not about picking the right algorithm, but it's also about cleansing the data. Picking the algorithm, then comes building the model. Then comes deployment or operationalizing the model. So there's a step process that's involved, and each step involves some amount of work. So if I could make one more point on the technology and the transformation we have done. So even with picking the right algorithm, we automated, so you as a data scientist don't need to, you know, come in and figure out, if I have 50 classifiers and each classifier has four parameters, that's 200 different combinations. Even if you take one hour on each combination, that's 200 hours, or nine days, just to pick the right combination. What we have done is, like, in IBM Machine Learning we have something called cognitive assistance for data science, which will help you pick the right combination in minutes instead of days. >> So I can see how regression scales, and in the example you gave of classification, I can see how that scales. If you've got a, you know, fixed classification or maybe 200 parameters, or whatever it is, that scales. What happens, how are people dealing with, sort of, automating that classification as things change, as, say, some kind of new disease or pattern pops up? How do they address that at scale? >> Good point. So as the data changes, the model needs to change, right? Because everything that model knows is based on the training data. Now, if the data has changed, the symptoms of cancer or any disease have changed, obviously, you have to retrain that model. And that's where I talk about where the feedback loop comes in, where we will automatically retrain the model based on the new data that's coming in. So you, as an end user, for example, don't need to worry about it, because we will take care of that piece also. We will automate that, also. >> Okay, good. And you've got a session this afternoon with, you said, two clients, right? AFCU and Kaden dot TV, and you're on, let's see, at 2:55. >> Right. >> So you folks watching the live stream, check that out. I'll give you the last word, you know, what should we expect to hear there? Show a little leg on your discussion this afternoon. >> Right. So, obviously, I'm going to talk about the differentiating factors, what we are delivering in IBM Machine Learning, right? And I covered some of it. There's going to be much more. We are going to focus on how we are making freedom or flexibility available. How are we going to do productivity, right? Gains for our data scientists and developers. We are going to talk about trust, you know, the trust of data that we are bringing in.
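The combinatorial search Dinesh describes, many candidate classifiers times several parameter settings each, is what generic model-selection tooling automates. The sketch below uses open-source scikit-learn as a rough analogue of that idea; it is not IBM's cognitive assistant, and the dataset is synthetic:

```python
# Open-source analogue of searching over classifiers and parameter settings
# and keeping the best by cross-validated score (not IBM's cognitive assistant).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)  # synthetic data

candidates = [
    (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    (RandomForestClassifier(random_state=0), {"n_estimators": [100, 300], "max_depth": [None, 10]}),
    (SVC(), {"C": [0.5, 1.0], "kernel": ["rbf", "linear"]}),
]

best = None
for estimator, grid in candidates:
    search = GridSearchCV(estimator, grid, cv=5).fit(X, y)   # exhaustive over this grid
    if best is None or search.best_score_ > best.best_score_:
        best = search

print(best.best_estimator_)
print(best.best_score_)
```

A brute-force grid like this is exactly the "200 hours" problem when each fit is expensive; smarter assistants prune or prioritize the search rather than running every combination.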
Then I'm going to bring the customers in and talk about their experience, right? We are delivering a product, but we already have customers using it, so I want them to come on stage and share the experiences of, you know, it's one thing you hear about that from us, but it's another thing that customers come and talk about it. So, and the last but not least is we are going to announce our first release of IBM Machine Learning on Z because if you look at 90% of the transactional data, today, it runs through Z, so they don't have to off-load the data to do analytics on it. We will make machine learning available, so you can do training and scoring right there on Z for your real time analytics, so. >> Right. Extending that theme that we talked about earlier, Stu, bringing analytics and transactions together, which is a big theme of the Z 13 announcement two years ago. Now you're seeing, you know, machine learning coming on Z. The live stream starts at 2 o'clock. Silicon Angle dot com had an article up on the site this morning from Maria Doucher on the IBM announcement, so check that out. Dinesh, thanks very much for coming back on theCube. Really appreciate it, and good luck today. >> Thank you. >> All right. Keep it right there, buddy. We'll be back with our next guest. This is theCube. We're live from the Waldorf Astoria for the IBM Machine Learning Event announcement. Right back.
Rob Thomas, IBM | IBM Machine Learning Launch
>> Narrator: Live from New York, it's theCUBE. Covering the IBM Machine Learning Launch Event. Brought to you by IBM. Now, here are your hosts, Dave Vellante and Stu Miniman. >> Welcome back to New York City, everybody this is theCUBE, we're here at the IBM Machine Learning Launch Event, Rob Thomas is here, he's the general manager of the IBM analytics group. Rob, good to see you again. >> Dave, great to see you, thanks for being here. >> Yeah it's our pleasure. So two years ago, IBM announced the Z platform, and the big theme was bringing analytics and transactions together. You guys are sort of extending that today, bringing machine learning. So the news just hit three minutes ago. >> Rob: Yep. >> Take us through what you announced. >> This is a big day for us. The announcement is we are going to bring machine learning to private Clouds, and my observation is this, you look at the world today, over 90% of the data in the world cannot be googled. Why is that? It's because it's behind corporate firewalls. And as we've worked with clients over the last few years, sometimes they don't want to move their most sensitive data to the public Cloud yet, and so what we've done is we've taken the machine learning from IBM Watson, we've extracted that, and we're enabling that on private Clouds, and we're telling clients you can get the power of machine learning across any type of data, whether it's data in a warehouse, a database, unstructured content, email, you name it we're bringing machine learning everywhere. To your point, we were thinking about, so where do we start? And we said, well, what is the world's most valuable data? It's the data on the mainframe. It's the transactional data that runs the retailers of the world, the banks of the world, insurance companies, airlines of the world, and so we said we're going to start there because we can show clients how they can use machine learning to unlock value in their most valuable data. >> And which, you say private Cloud, of course, we're talking about the original private Cloud, >> Rob: Yeah. >> Which is the mainframe, right? >> Rob: Exactly. >> And I presume that you'll extend that to other platforms over time is that right? >> Yeah, I mean, we're going to think about every place that data is managed behind a firewall, we want to enable machine learning as an ingredient. And so this is the first step, and we're going to be delivering every quarter starting next quarter, bringing it to other platforms, other repositories, because once clients get a taste of the idea of automating analytics with machine learning, what we call continuous intelligence, it changes the way they do analytics. And, so, demand will be off the charts here. >> So it's essentially Watson ML extracted and placed on Z, is that right? And describe how people are going to be using this and who's going to be using it. >> Sure, so Watson on the Cloud today is IBM's Cloud platform for artificial intelligence, cognitive computing, augmented intelligence. A component of that is machine learning. So we're bringing that as IBM machine learning which will run today on the mainframe, and then in the future, other platforms. Now let's talk about what it does. What it is, it's a single-place unified model management, so you can manage all your models from one place. And we've got really interesting technology that we pulled out of IBM research, called CADS, which stands for the Cognitive Assistance for Data Scientist. 
And the idea behind CADS is, you don't have to know which algorithm to choose, we're going to choose the algorithm for you. You build your model, and we'll decide, based on all the algorithms available in open source, what you've built for yourself, and what IBM's provided, what's the best way to run it, and our focus here is, it's about productivity of data science and data scientists. No company has as many data scientists as they want, and so we've got to make the ones they do have vastly more productive, and so with technology like CADS, we're helping them do their job more efficiently and better. >> Yeah, CADS, we've talked about this in theCUBE before, it's like an algorithm to choose an algorithm, and it makes the best fit. >> Rob: Yeah. >> Okay. And you guys addressed some of the collaboration issues at your Watson data platform announcement last October, so talk about the personas who are asking you to give me access to mainframe data, and give me tooling that actually resides on this private Cloud. >> It's definitely a data science persona, but we see, I'd say, an emerging market where it's more the business analyst type that is saying I'd really like to get at that data, but I haven't been able to do that easily in the past. So giving them a single pane of glass, if you will, with some light data science experience, where they can manage their models, using CADS to actually make it more productive. And then we have something called a feedback loop that's built into it, which is, you build a model running on Z, and as you get new data in, these are the largest transactional systems in the world so there's data coming in every second. As you get new data in, that model is constantly updating. The model is learning from the data that's coming in, and it's becoming smarter. That's the whole idea behind machine learning in the first place. And that's what we've been able to enable here. Now, you and I have talked through the years, Dave, about IBM's investment in Spark. This is one of the first, I would say, world-class applications of Spark. We announced Spark on the mainframe last year, and what we're bringing with IBM machine learning is leveraging Spark as an execution engine on the mainframe, and so I see this as Spark finally coming into the mainstream, when you talk about Spark accessing the world's greatest transactional data. >> Rob, I wonder if you can help our audience kind of squint through a compare and contrast, public Cloud versus what you're offering today, 'cause one thing, public Cloud adding new services, machine learning seemed like one of those areas that we would add, like IBM had done with a machine learning platform. Streaming, absolutely, you hear mobile streaming applications absolutely happened in the public Cloud. Is cost similar in private Cloud? Can I get all the services? How will IBM and your customer base keep up with that pace of innovation that we've seen from IBM and others in the public Cloud, on prem? >> Yeah, so, look, my view is it's not an either or. Because when you look at this valuable data, clients want to do some of it in public Cloud, and they want to keep a lot of it in the systems that they built on prem. So our job is, how do we actually bridge that gap? So I see machine learning, like we've talked about, becoming much more of a hybrid capability over time, because the data they want to move to the Cloud, they should do that. The economics are great. The data, doing it on private Cloud, actually the economics are tremendous as well.
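For readers who want to see what "Spark as the execution engine" plus automated selection looks like in the open-source world, the sketch below uses plain Apache Spark ML's CrossValidator to pick among candidate parameter settings. It is only a rough analogue of what Rob describes CADS doing, not IBM's implementation, and the data and column names are invented:

```python
# Rough open-source analogue of automated model selection on Spark ML
# (plain Apache Spark, not IBM's CADS); data and columns are invented.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.appName("model-selection-sketch").getOrCreate()

# Stand-in for transactional features; in practice this would come from the warehouse.
df = spark.createDataFrame(
    [(120.0, 3.0, 0.0), (950.0, 1.0, 1.0), (80.0, 7.0, 0.0), (1200.0, 2.0, 1.0)] * 50,
    ["amount", "txn_per_day", "label"],
)

assembler = VectorAssembler(inputCols=["amount", "txn_per_day"], outputCol="features")
lr = LogisticRegression(labelCol="label", featuresCol="features")
pipeline = Pipeline(stages=[assembler, lr])

grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1])
        .addGrid(lr.elasticNetParam, [0.0, 0.5])
        .build())

cv = CrossValidator(estimator=pipeline,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(labelCol="label"),
                    numFolds=3)
best_model = cv.fit(df).bestModel   # the winning pipeline, ready to save and deploy
```

The same pattern extends to trying several estimator types, which is closer to the "choose the algorithm for you" idea.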
And so we're delivering an elastic infrastructure on private Cloud as well that can scale like the public Cloud. So to me it's not either or, it's about what everybody wants as Cloud features. They want the elasticity, they want a creatable interface, they want the economics of Cloud, and our job is to deliver that in both places. Whether it's on the public Cloud, which we're doing, or on the private Cloud. >> Yeah, one of the thought exercises I've gone through is, if you follow the data, and follow the applications, it's going to show you where customers are going to do things. If you look at IOT, if you look at healthcare, there's lots of uses where it's going to be on prem, it's going to be on the edge. I got to interview Walmart a couple of years ago at the IBM Edge show, and they leveraged Z globally to use their sales, their enablement, and obviously they're not going to use AWS as their platform. What are the trends, what do you hear from the customers, how much of the data, are there reasons why it needs to stay at the edge? It's not just compliance and governance, but it's just because that's where the data is, and I think you were saying there's just so much data on the Z series itself compared to other environments. >> Yeah, and it's not just the mainframe, right? Let's be honest, there's just massive amounts of data that still sits behind corporate firewalls. And while I believe the end destination is a lot of that will be on public Cloud, what do you do now? Because you can't wait until that future arrives. And so the biggest change I've seen in the market in the last year is clients are building private Clouds. It's not traditional on-premise deployments; they're building an elastic infrastructure behind their firewall. You see it a lot in heavily-regulated industries, so financial services where they're dealing with things like GDPR, any type of retailer who's dealing with things like PCI compliance. Heavily-regulated industries are saying, we want to move there, but we've got challenges to solve right now. And so, our mission is, we want to make data simple and accessible, wherever it is, on private Cloud or public Cloud, and help clients on that journey. >> Okay, so carrying through on that, so you're now unlocking access to mainframe data, great. If I have, say, a retail example, and I've got some data science, I'm building some models, I'm accessing the mainframe data, if I have data that's elsewhere in the Cloud, how specifically with regard to this announcement will a practitioner execute on that? >> Yeah, so, one is you could decide one place that you want to land your data and have it be resident, so you could do that. We have scenarios where clients are using Data Science Experience on the Cloud, but they're actually leaving the data behind the firewalls. So we don't require them to move the data, so our model is one of flexibility in terms of how they want to manage their data assets. Which I think is unique in terms of IBM's approach to that. Others in the market say, if you want to use our tools, you have to move your data to our Cloud, and some of them even say, as you click through the terms, now we own your data, now we own your insights. That's not our approach. Our view is it's your data; if you want to run the applications in the Cloud, leave the data where it is, that's fine. If you want to move both to the Cloud, that's fine. If you wanted to leave both on private Cloud, that's fine.
We have capabilities like Big SQL where we can actually federate data across public and private Clouds, so we're trying to provide choice and flexibility when it comes to this. >> And, Rob, in the context of this announcement, that would be, that example you gave, would be done through APIs that allow me access to that Cloud data is that right? >> Yeah, exactly, yes. >> Dave: Okay. >> So last year we announced something called Data Connect, which is basically, think of it as a bus between private and public Cloud. You can leverage Data Connect to seamlessly and easily move data. It's very high-speed, it uses our Aspera technology under the covers, so you can do that. >> Dave: A recent acquisition. >> Rob, IBM's been very active in open source engagement, in trying to help the industry sort out some of the challenges out there. Where do you see the state of the machine learning frameworks Google of course has TensorFlow, we've seen Amazon pushing at MXNet, is IBM supporting all of them, there certain horses that you have strong feelings for? What are your customers telling you? >> I believe in openness and choice. So with IBM machine learning you can choose your language, you can use Scala, you can use Java, you can use Python, more to come. You can choose your framework. We're starting with Spark ML because that's where we have our competency and that's where we see a lot of client desire. But I'm open to clients using other frameworks over time as well, so we'll start to bring that in. I think the IT industry always wants to kind of put people into a box. This is the model you should use. That's not our approach. Our approach is, you can use the language, you can use the framework that you want, and through things like IBM machine learning, we give you the ability to tap this data that is your most valuable data. >> Yeah, the box today has just become this mosaic and you have to provide access to all the pieces of that mosaic. One of the things that practitioners tell us is they struggle sometimes, and I wonder if you could weigh in on this, to invest either in improving the model or capturing more data and they have limited budget, and they said, okay. And I've had people tell me, no, you're way better off getting more data in, I've had people say, no no, now with machine learning we can advance the models. What are you seeing there, what are you advising customers in that regard? >> So, computes become relatively cheap, which is good. Data acquisitions become relatively cheap. So my view is, go full speed ahead on both of those. The value comes from the right algorithms and the right models. That's where the value is. And so I encourage clients, even think about maybe you separate your teams. And you have one that's focused on data acquisition and how you do that, and another team that's focused on model development, algorithm development. Because otherwise, if you give somebody both jobs, they both get done halfway, typically. And the value is from the right models, the right algorithms, so that's where we stress the focus. >> And models to date have been okay, but there's a lot of room for improvement. Like the two examples I like to use are retargeting, ad retargeting, which, as we all know as consumers is not great. You buy something and then you get targeted for another week. And then fraud detection, which is actually, for the last ten years, quite good, but there's still a lot of false positives. 
Where do you see IBM machine learning taking that practical use case in terms of improving those models? >> Yeah, so why are there false positives? The issue typically comes down to the quality of data and the amount of data that you have. That's why. Let me give an example. So one of the clients that's going to be talking at our event this afternoon is Argus, who's focused on the healthcare space. >> Dave: Yeah, we're going to have him on here as well. >> Excellent, so Argus is basically, they collect data across payers, they're focused on healthcare, payers, providers, pharmacy benefit managers, and their whole mission is how do we cost-effectively serve different scenarios or different diseases, in this case diabetes, and how do we make sure we're getting the right care at the right time? So they've got all that data on the mainframe, they're constantly getting new data in, it could be about blood sugar levels, it could be about glucose, it could be about changes in blood pressure. Their models will get smarter over time because they built them with IBM machine learning, so that what's cost-effective today may not be the most effective or cost-effective solution tomorrow. But we're giving them that continuous intelligence as data comes in to do that. That is the value of machine learning. I think sometimes people miss that point, they think it's just about making the data scientists' job easier, and that productivity is part of it, but it's really about the veracity of the data and that you're constantly updating your models. >> And the patient outcome there, I read through some of the notes earlier, is if I can essentially opt in to allow the system to adjudicate the medication or the claim, and if I do so, I can get that instantaneously or in near real-time, as opposed to having to wait weeks and phone calls and haggling. Is that right, did I get that right? >> That's right, and look, there's two dimensions. It's the cost of treatment, so you want to optimize that, and then it's the effectiveness. And which one's more important? Well, they're both actually critically important. And so what we're doing with Argus is helping them build models where they deploy this so that they're optimizing both of those. >> Right, and in the case, again, back to the personas, that would be, and you guys stressed this at your announcement last October, it's the data scientist, it's the data engineer, it's the, I guess even the application developer, right? Involved in that type of collaboration. >> My hope would be over time, when I talked about how we view machine learning as an ingredient across everywhere that data is, is you want to embed machine learning into any applications that are built. And at that point you no longer need a data scientist per se for that case, you can just have the app developer that's incorporating that. Whereas another tough challenge like the one we discussed, that's where you need data scientists. So think about, you need to divide and conquer the machine learning problem, where the data scientist can play, the business analyst can play, the app developers can play, the data engineers can play, and that's what we're enabling. >> And how does streaming fit in? We talked earlier about this sort of batch, interactive, and now you have this continuous sort of workload. How does streaming fit? >> So we use streaming in a few ways. One is very high-speed data ingest, it's a good way to get data into the Cloud. We also can do analytics on the fly.
So a lot of our use cases around streaming are where we actually build analytical models into the streaming engine so that you're doing analytics on the fly. So I view that as, it's a different side of the same coin. It's kind of based on your use case, how fast you're ingesting data; if you need, you know, sub-millisecond response times and you constantly have data coming in, you need something like a streaming engine to do that. >> And it's actually consolidating that data pipeline, is what you described, which is big in terms of simplifying the complexity, this mosaic of Hadoop, for example, and that's a big value proposition of Spark. Alright, we'll give you the last word, you've got an audience outside waiting, big announcement today; final thoughts. >> You know, we talked about machine learning for a long time. I'll give you an analogy. So 1896, Charles Brady King is the first person to drive an automobile down the street in Detroit. It was 20 years later before Henry Ford actually turned it from a novelty into mass appeal. So it was like a 20-year incubation period where you could actually automate it, you could make it more cost-effective, you could make it simpler and easy. I feel like we're kind of in the same thing here, where the data era in my mind began around the turn of the century. Companies came onto the internet, started to collect a lot more data. It's taken us a while to get to the point where we could actually make this really easy and do it at scale. And people have been wanting to do machine learning for years. It starts today. So we're excited about that. >> Yeah, and we saw the same thing with the steam engine, it was decades before it actually was perfected, and now the timeframe in our industry is compressed to years, sometimes months. >> Rob: Exactly. >> Alright, Rob, thanks very much for coming on theCUBE. Good luck with the announcement today. >> Thank you. >> Good to see you again. >> Thank you guys. >> Alright, keep it right there, everybody. We'll be right back with our next guest, we're live from the Waldorf Astoria, the IBM Machine Learning Launch Event. Be right back. [electronic music]
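Rob's description of building analytical models into the streaming engine, so events are scored as they arrive rather than after they land in a warehouse, corresponds to a common open-source pattern. The sketch below uses plain Apache Spark Structured Streaming (not IBM's streaming products), and the model path, schema, and directories are invented:

```python
# Generic sketch of "analytics on the fly": score events as they stream in
# using a model trained offline (plain Apache Spark; all paths are invented).
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("streaming-scoring-sketch").getOrCreate()

# A pipeline trained and saved elsewhere, for example by a batch job like the earlier sketch.
model = PipelineModel.load("/models/fraud_pipeline")         # hypothetical path

events = (spark.readStream
          .format("json")
          .schema("amount DOUBLE, txn_per_day DOUBLE")        # invented schema
          .load("/incoming/transactions"))                    # hypothetical landing zone

scored = model.transform(events).select("amount", "txn_per_day", "prediction")

query = (scored.writeStream
         .format("console")        # in practice: a queue, database, or alerting sink
         .outputMode("append")
         .start())
query.awaitTermination()
```

The trade-off Rob alludes to is latency versus plumbing: a streaming scorer like this answers in near real time, but the model itself is still trained and refreshed by a separate batch or feedback process.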