Search Results for Ramsay:

Towards Understanding the Fundamental Limits of Analog, Continuous Time Computing


 

>> Hello everyone. My name is Zoltan Toroczkai. I am from the University of Notre Dame, Physics Department, and I'd like to thank the organizers for their kind invitation to participate in this very interesting and promising workshop. I'd also like to say that I look forward to collaborations with the Redefine Lab and collaborators on the topics of this work. Today I'll briefly talk about our attempt to understand the fundamental limits of analog, continuous-time computing, at least from the point of view of Boolean satisfiability (SAT) problem solving using ordinary differential equations, but I think the issues we raise on this occasion apply to other analog approaches as well, and to other problems as well. I think everyone here knows what Boolean satisfiability problems are. You have N Boolean variables and M clauses, each a disjunction of K literals, where a literal is a variable or its negation, and the goal is to find an assignment to the variables such that all the clauses are true. This is a decision-type problem from the class NP, which means you can check the satisfiability of any assignment in polynomial time, and K-SAT is NP-complete for K of 3 or larger, which means an efficient 3-SAT solver implies an efficient solver for all problems in the class NP, because every problem in NP can be reduced in polynomial time to 3-SAT. As a matter of fact, you can reduce the NP-complete problems into one another: you can go from 3-SAT to Set Packing, or to Maximum Independent Set (which is Set Packing in graph-theoretic terms), or to the Ising spin-glass problem in its decision version. This is useful when you are comparing different approaches or working on different kinds of problems. When not all the clauses can be satisfied, you are looking at the optimization version of SAT, called Max-SAT, where the goal is to find the assignment that satisfies the maximum number of clauses; this is from the NP-hard class. In terms of applications: if we had an efficient SAT solver, or NP-complete problem solver, it would positively influence thousands of applications in industry and science. I'm not going to read the list, but it of course gives us some motivation to work on these kinds of problems. Now, our approach to SAT solving involves embedding the problem in a continuous space, and we use ODEs to do that. Instead of working with zeros and ones, we work with minus ones and plus ones, and we allow the corresponding variables to change continuously between these two bounds. We formulate the problem with the help of a clause matrix: if a clause does not contain a variable or its negation, the corresponding matrix element is zero; if it contains the variable in positive form, it is one; if it contains the variable in negated form, it is minus one. We then use this to formulate products called clause violation functions, one for every clause, which vary continuously between zero and one and are zero if and only if the clause itself is true. We define the search dynamics on the N-dimensional hypercube where the search happens, and if solutions exist they sit in some of the corners of this hypercube. Finally, we define the energy, potential, or landscape function as shown here, in such a way that it is zero if and only if all the clause violation functions, the K_m's, are zero.
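To make this construction concrete, here is a minimal numerical sketch of the embedding just described: a clause matrix with entries in {-1, 0, +1}, clause violation functions K_m that vanish exactly when a clause is satisfied, and an energy V that is zero only when all clauses are. The specific functional forms (the 2^-k normalization, the gradient-descent equation for the variables, and the exponential equation for the auxiliary variables) are assumptions patterned on the published continuous-time SAT-solver literature, not a verbatim copy of the talk's slides.

```python
import numpy as np

# Clause matrix c[m, i]: 0 if clause m does not contain variable i,
# +1 if the variable appears positively, -1 if it appears negated.
# A tiny 3-SAT instance, made up for illustration.
C = np.array([[ 1,  1, -1],
              [-1,  1,  1],
              [ 1, -1,  1]], dtype=float)
M, N = C.shape
k = np.count_nonzero(C, axis=1)              # literals per clause

def K(s):
    """Clause violation functions K_m(s) in [0, 1]; K_m = 0 exactly when
    clause m is satisfied (entries with c_mi = 0 contribute a factor 1)."""
    return 2.0**(-k) * np.prod(1.0 - C * s, axis=1)

def grad_V(s, a):
    """Gradient of the landscape V(s, a) = sum_m a_m K_m(s)^2 w.r.t. s."""
    Km = K(s)
    F = 1.0 - C * s                          # per-literal factors F[m, i]
    g = np.zeros(N)
    for i in range(N):
        loo = np.prod(np.delete(F, i, axis=1), axis=1)  # leave-one-out products
        dK = -C[:, i] * 2.0**(-k) * loo                 # dK_m / ds_i
        g[i] = np.sum(2.0 * a * Km * dK)
    return g

def step(s, a, dt=1e-2):
    """One explicit Euler step: gradient descent on V for the variables s,
    exponential growth (da_m/dt = a_m K_m) for the auxiliary variables."""
    s_new = np.clip(s - dt * grad_V(s, a), -1.0, 1.0)
    a_new = a * np.exp(dt * K(s))
    return s_new, a_new

rng = np.random.default_rng(0)
s, a = rng.uniform(-1.0, 1.0, N), np.ones(M)
for _ in range(100_000):
    s, a = step(s, a)
    if not K(np.sign(s)).any():              # a corner satisfying every clause
        break
print("assignment:", np.sign(s), "violations:", K(np.sign(s)))
```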
So the energy is zero exactly when all the clauses are satisfied, and we keep these auxiliary variables, the a_m's, always positive. What we have here is therefore a dynamics that is essentially a gradient descent on this potential energy landscape. If you were to keep all the a_m's constant, it would get stuck in some local minimum. However, what we do here is couple their dynamics to the clause violation functions, as shown here. If you did not have the a_m here and had just the K_m's, you would still have a positive feedback, and it would find solutions better than the constant version, but it would still get stuck. Only when we put in this a_m, which makes the dynamics in this variable exponential-like, does it keep searching until it finds a solution. There is a reason for that which I am not going to talk about here, but it essentially boils down to performing gradient descent on a globally time-varying landscape, and this is what works. Now, I'm going to talk about the good, the bad, and maybe the ugly. What is good is that this is a hyperbolic dynamical system, which means that if you take any domain of the search space that does not contain a solution, the number of trajectories in it decays exponentially quickly, and the decay rate is an invariant characteristic of the dynamics itself, called in dynamical systems the escape rate. The inverse of the escape rate is the timescale on which this dynamical system finds solutions. You can see some trajectories here; they are curved because the dynamics is nonlinear, in fact transiently chaotic, but if there are solutions, eventually it does lead to the solutions. Now, in terms of performance: here we show, for a bunch of constraint densities, defined by M over N, the ratio of clauses to variables, for random 3-SAT problems, the wall-clock time as a function of N. It behaves quite well, polynomially in fact, until you actually reach the SAT/UNSAT transition, where the hardest problems are found. What is more interesting is to monitor the performance in terms of the analog, continuous time t, because that seems to be polynomial. The way we show that is the following: we consider random 3-SAT at a fixed constraint density, and here we show it right at the threshold, where it is really hard. We select thousands of problems at that clause-to-variable ratio, solve them with our algorithm, and monitor the fraction of problems that have not yet been solved by continuous time t. As you can see, this fraction decays exponentially, with different decay rates for different system sizes, and the inset shows that the decay rate behaves polynomially, actually as a power law, in N. If you combine these two facts, you find that the time needed to solve all problems, except perhaps a small fraction of them, scales polynomially with problem size. So you have polynomial continuous-time complexity.
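This analysis, an exponential decay of the unsolved fraction in continuous time with a decay rate that falls off as a power law in N, can be reproduced with a few lines of fitting code. Below is a sketch; the solution-time arrays are synthetic placeholders, since the actual measurements live in the talk's figures.

```python
import numpy as np

def decay_rate(t_solve):
    """Fit p(t) = exp(-r t), the fraction of instances still unsolved at
    continuous time t, and return the decay rate r for one system size."""
    t = np.sort(np.asarray(t_solve))
    p = 1.0 - np.arange(1, t.size + 1) / t.size      # empirical survival
    keep = p > 0                                     # drop the final point (log 0)
    slope, _ = np.polyfit(t[keep], np.log(p[keep]), 1)
    return -slope

# Placeholder measurements: synthetic solution times for several sizes N.
rng = np.random.default_rng(1)
sizes = np.array([25, 50, 100, 200])
rates = [decay_rate(rng.exponential(scale=n**1.5, size=5000)) for n in sizes]

# Power-law check: log r versus log N should be close to a straight line.
exponent, _ = np.polyfit(np.log(sizes), np.log(rates), 1)
print("decay rates:", np.round(rates, 5))
print("fitted power-law exponent:", exponent)        # about -1.5 here
```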
And this is also true for other types of very hard constraint satisfaction problems, such as Exact Cover, because you can always transform them into 3-SAT as we discussed before, and Ramsey coloring; on some of these problems even algorithms like survey propagation will fail. But this does not mean that P equals NP, and here is why. First of all, if you were to implement these equations in a device whose behavior is described by these ODEs, then t, the continuous-time variable, becomes physical wall-clock time. That would scale polynomially, but you have other variables, the auxiliary variables, which fluctuate in an exponential manner. So if they represent currents or voltages in your realization, it becomes an exponential-cost algorithm. But this is some kind of trade between time and energy: I do not know how to generate time, but I do know how to generate energy, so it could still be useful. There are other issues as well, especially if you try to do this on a digital machine, though problems appear in physical devices too, as we discuss later. If you implement this on a GPU you can get a couple of orders of magnitude of speedup, and you can also modify the approach to solve Max-SAT problems quite efficiently; we are competitive with the best heuristic solvers on the problems of the 2016 Max-SAT competition. So this is definitely a good approach, but there are of course interesting limitations. I say interesting because they make you think about what is needed and how you can exploit these observations to better understand analog continuous-time complexity. If you monitor the number of discrete steps taken by a Runge-Kutta integrator when solving this on a digital machine, using the same approach but now measuring the fraction of problems not yet solved after a given number of discrete integration steps, you find that you have exponential discrete-time complexity. And of course this is a problem. If you look closely at what happens: even though the analog mathematical trajectory, the red curve here, fluctuates very little, at the level of the third or fourth digit of precision, the integrator in discrete time fluctuates like crazy. The integration essentially freezes out, and this is because of the phenomenon of stiffness, which I will say more about a little later. So it may look like an integration issue on your digital machine that you could improve, and you definitely can improve it, but the issue is bigger than that. It is deeper than that, because on a digital machine there is no time-energy conversion: the auxiliary variables are efficiently represented on a digital machine, so there is no exponentially fluctuating current or voltage in your computer when you do this. So if P is not equal to NP, then the exponential time complexity, or exponential cost complexity, has to hit you somewhere, and this is how. One would be tempted to think that maybe this would not be an issue in an analog device, and to some extent that is true: analog devices can be orders of magnitude faster. But they also suffer from their own problems, because P not being equal to NP affects that class of solvers as well.
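The freeze-out described above is easy to observe with any adaptive integrator: as the problem stiffens, the accepted step size collapses and the discrete step count explodes even though the continuous trajectory barely moves. Here is a hedged sketch using SciPy's adaptive RK45 on a classic stiff test problem, a stand-in rather than the talk's actual equations.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, lam=-1e4):
    """A classic stiff test system (a stand-in, not the talk's equations):
    one very fast mode relaxing onto a slowly varying solution."""
    return [lam * (y[0] - np.cos(t)) - np.sin(t), -0.5 * y[1]]

# Explicit adaptive Runge-Kutta: the step size is capped by stability,
# not accuracy, so the integrator grinds through tiny steps.
expl = solve_ivp(rhs, (0.0, 10.0), [2.0, 1.0], method="RK45",
                 rtol=1e-6, atol=1e-9)
dt = np.diff(expl.t)
print("explicit RK45 steps:", expl.t.size - 1,
      " smallest accepted step:", dt.min())

# An implicit method, whose stability domain lies on the outside,
# takes far fewer steps on the same problem.
impl = solve_ivp(rhs, (0.0, 10.0), [2.0, 1.0], method="Radau",
                 rtol=1e-6, atol=1e-9)
print("implicit Radau steps:", impl.t.size - 1)
```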
So indeed, if you look at other systems, like the coherent Ising machine with measurement feedback, or polariton condensate graphs, or oscillator networks, they all hinge on our ability to control real variables with arbitrarily high precision. In oscillator networks you want to read out arbitrarily close frequencies; in the case of CIMs you require identical analog amplitudes, which are hard to maintain, and they fluctuate and shift away from one another. If you could control that, of course, then you could control the performance. So one can ask whether or not this is a universal bottleneck, and it seems so, as I will argue next. We can recall a fundamental result by Arnold Schönhage from 1978, a purely computer-science proof, which says that if you are able to compute the addition, multiplication, and division of real variables with infinite precision, then you can solve NP-complete problems in polynomial time. He does not actually propose a solver; he just shows mathematically that this would be the case. Now, of course, in the real world you have loss of precision, so the next question is: how does that affect the computation of our problems? This is what we are after. Loss of precision means information loss, or entropy production, so what we are really looking at is the relationship between the hardness of a problem and the cost of computing it. According to Sean Harget, there is a left branch here, which in principle could be polynomial time, but the question is whether that is achievable; something more achievable is on the right-hand side. There is always going to be some information loss, some entropy generation, that could keep you away from polynomial time. This is what we would like to understand, and the source of this information loss is not just noise, as I will argue, in any physical system; it is also of an algorithmic nature. Schönhage's result is purely theoretical, and no actual solver is proposed. So we can ask, just theoretically, out of curiosity: would there in principle be such solvers, solvers which, if you looked mathematically and precisely at what they do, would have the right properties? I argue yes. I do not have a mathematical proof, but I have some arguments that this would be the case, and that it is the case for our solver: if you could compute its trajectories exactly, it would solve NP-complete problems in polynomial continuous time. Now, as a matter of fact, the question is a bit more subtle, because time in ODEs can be rescaled however you want. So, as Bournez points out, you actually have to measure the length of the trajectory, which is an invariant of the dynamical system, a property of the system and not of its parametrization. And we did that. My student Shubha Kharel did it, first improving on the stiffness of the integration using implicit solvers and some smart tricks, so that you actually stay closer to the true trajectory, and then using the same approach as before: monitoring what fraction of problems can be solved within a given trajectory length. You find that this length scales polynomially with the problem size. So we have polynomial length complexity.
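Measuring trajectory length rather than elapsed time, as suggested here, is straightforward once the integrator provides a dense solution: accumulate the Euclidean arc length of the state vector. Below is a sketch; the quadrature scheme and the toy right-hand side are assumptions for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

def trajectory_length(rhs, y0, t_end, n_samples=10_000):
    """Arc length L = integral of ||dy/dt|| dt: how far the trajectory
    travels in state space, independent of how time is parametrized."""
    sol = solve_ivp(rhs, (0.0, t_end), y0, dense_output=True,
                    rtol=1e-8, atol=1e-10)
    t = np.linspace(0.0, t_end, n_samples)
    y = sol.sol(t)                                   # shape (dim, n_samples)
    speed = [np.linalg.norm(rhs(ti, yi)) for ti, yi in zip(t, y.T)]
    return np.trapz(speed, t)

# Toy check (assumed): linear contraction toward the origin travels in a
# straight line, so the arc length should equal |y0| = sqrt(5).
L = trajectory_length(lambda t, y: -y, np.array([1.0, 2.0]), 10.0)
print("arc length:", L)
```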
That means that our solver is both poly-length and, by that definition, a poly-time analog solver. But if you look at it as a discrete algorithm, measuring the discrete steps on a digital machine, it is an exponential solver, and the reason is the stiffness. Every integrator has to digitize and truncate the equations, and it has to keep the integration within the so-called stability domain of the scheme: the product of the eigenvalues of the Jacobian and the step size has to stay within this region. If you use explicit methods, you want to stay inside this region, but for stiff problems some of the eigenvalues grow fast, and then you are forced to reduce delta t so the product stays in the bounded domain. That means you are forced to take smaller and smaller time steps, so you are freezing out the integration, and as I showed you, that is what happens. Now, you can move to implicit solvers, which is a standard trick, and in that case the stability domain is actually on the outside. But what happens then is that some of the eigenvalues of the Jacobian, for these instances, start to move toward zero, and as they move toward zero they approach the instability region. Your solver tries to keep them out, so it increases delta t; but if you increase delta t, you increase the truncation errors, so you get randomized in the large search space. So it is really not going to work out either way. Now, one can introduce a theory, or at least a language, for discussing analog computational complexity, using the language of dynamical systems theory. I do not have time to go into this, but basically, for hard problems you have a chaotic object, a chaotic saddle, somewhere in the middle of the search space, and that dictates how the dynamics happens; the invariant properties of that saddle are what determine the performance and many other things. An important measure that we find helpful in describing this analog complexity is the so-called Kolmogorov, or metric, entropy. Intuitively, it describes the rate at which the uncertainty contained in the insignificant digits of a trajectory flows toward the significant ones: you lose information as small errors develop into larger errors at an exponential rate, because there are positive Lyapunov exponents. But this is an invariant property: it is a property of the invariant set, not of how you compute the trajectories, and it is really the intrinsic rate of accuracy loss of a dynamical system. As I said, in such a high-dimensional dynamical system you have positive and negative Lyapunov exponents, in total as many as the dimension of the space; the number of positive ones is the dimension of the unstable manifold, and the number of negative ones is the dimension of the stable manifold. And there is an interesting and, I think, important equality called the Pesin equality, which connects the information-theoretic quantity, the rate of information loss, with the geometric rate at which trajectories separate, minus the escape rate that I already talked about. Now, one can actually prove a simple theorem, a sort of back-of-the-envelope calculation.
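In symbols, the relation just described is usually written as follows. This is the standard form of the Pesin-type identity for open (leaky) chaotic systems, supplied here for concreteness rather than read off the speaker's slides:

```latex
% Metric (Kolmogorov-Sinai) entropy of an open chaotic system:
% the sum of the positive Lyapunov exponents minus the escape rate kappa.
h_{\mathrm{KS}} \;=\; \sum_{\lambda_i > 0} \lambda_i \;-\; \kappa
% For a closed system (kappa = 0) this reduces to the classical Pesin
% equality, h_KS = sum of the positive Lyapunov exponents.
```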
The idea here is that you know the largest rate at which closely started trajectories separate from one another. So you can say: that is fine, as long as my trajectory finds the solution before nearby trajectories separate too much. In that case I can hope that if I start several closely spaced trajectories from some region of phase space, they all go to the same solution over time, and that is what gives this upper bound, this limit. What it really shows is that the required initial separation is an exponentially small number, and it depends on the N-dependence of the exponent right here, which combines the information-loss rate and the solution-time performance. If this exponent has a strong N-dependence, even just a linear N-dependence, then you really have to start trajectories exponentially closer to one another in order to end up in the same solution. So this is the direction one should pursue, and this formulation is applicable to all deterministic dynamical systems. And I think we can expand this further, because there is a way of getting an expression for the escape rate in terms of N, the number of variables, from cycle expansions, which I do not have time to talk about, but it is the kind of program one can try to pursue. And this is it. The conclusions, I think, are self-explanatory. I think there is a lot of future in analog continuous-time computing. These systems can be more efficient, by orders of magnitude, than digital ones in solving NP-hard problems, because, first of all, many of them lack the von Neumann bottleneck, there is parallelism involved, and you can also have a larger spectrum of continuous-time dynamical algorithms than discrete ones. But we also have to be mindful of the possibilities and of the limits. An important open question is: what are these limits? Is there some kind of no-go theorem that tells you that you can never perform better than this or that limit? I think that is the exciting part: to derive these limits and to get to an understanding of what is possible in this area. Thank you.
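In symbols, the separation argument in this closing passage can be stated compactly. The notation is assumed for illustration: lambda_1 is the largest Lyapunov exponent, T(N) the solution time, and epsilon the tolerance within which trajectories must agree to reach the same solution:

```latex
% Nearby trajectories separate as \delta(t) \approx \delta_0 e^{\lambda_1 t}.
% To end up at the same solution within tolerance \varepsilon after the
% solution time T(N), the initial separation must satisfy
\delta_0 \;\lesssim\; \varepsilon \, e^{-\lambda_1 T(N)}
% so if \lambda_1 T(N) grows even linearly with N, the required initial
% precision shrinks exponentially with problem size.
```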

Published Date : Sep 21 2020

Show Wrap | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back. We're here to wrap up the MIT Chief Data Officer and Information Quality conference, hashtag MITCDOIQ. You're watching the Cube. I'm Dave Vellante, and Paul Gillin is my co-host. This is two days of coverage, and we're wrapping up with our analysis of what's going on here. Paul, let me kick it off. When we first started here, we talked at our open about how we saw the chief data officer role emerge from the back office, from the information quality role. In 2013, when we asked the CDOs we talked to what their scope was, we heard things like: oh, it's very wide, it involves analytics and data science. Some CDOs even said, yes, security is actually part of our purview, because of all the cyber data. So a very, very wide scope. In some cases even the digital initiatives were being claimed; the CDOs were staking their claim. The reality was that the CDO also emerged out of highly regulated industries: financial services, healthcare, government. And it really was this kind of wonky back-office, compliance role, and that's what it's become again. We're seeing that CDOs largely are not involved in a lot of the emerging AI initiatives; that's what we heard, anecdotally, talking to various folks. At the same time, I feel as though the CDO role is more fossilized than it was before. We used to ask: is this role going to be around anymore? We had CIOs tell us that the CDO role was going to disappear, so you had both ends of the spectrum. But I feel as though, whatever it's called, chief data officer, chief analytics officer, head of data and analytics and governance, that role is here to stay, at least for a fair amount of time, and increasingly the issues of privacy and governance, and at least the periphery of security, are going to be supported by that CDO role. So that's takeaway number one. Let me get your thoughts. >> I think there's a maturity process going on here. What we saw in 2016 through 2018 was sort of a celebration of the arrival of the CDO: we're here, we've got power now, we've got an agenda. That was a natural outcome of all this growth, with 90% of organizations putting CDOs in place. I think what you're seeing now is a realization that, oh my God, this is a mess. What I heard this year was a lot less of the crowing about the ascendance of CDOs and more about: we've got a big integration problem and a big data-cleansing problem, and we've got to get our hands down to the nitty-gritty. Where in past years we heard so much about strategic initiatives, about artificial intelligence, about getting involved in digital business or customer experience transformation, what we heard this year was about cleaning up data: finding the data that you've got, organizing it, applying metadata to it, getting it in shape to do something with it. There's nothing wrong with that; I just think it's part of the natural maturation process. Organizations now have to go through the dirty process of cleaning up this data before they can get to the next stage, which is a couple, three years out for most of them.
>> The second big theme, of course, we heard from the former head of analytics at GSK in the opening keynote: the traditional methods have failed, starting with the enterprise data warehouse. And we've actually studied this a lot. My analogy is often a snake swallowing a basketball: having to build cubes. EDW practitioners used to call it chasing the chips: oh, we need that new chip because we've got to run faster, because it's taking us hours, days, weeks to run these analytics. So it really was not agile; it was a rear-view-mirror kind of thing. And Sarbanes-Oxley saved the EDW business, because reporting became part of compliance. Then there's the master data management piece, which we've heard about consistently. Mike Stonebraker, who is obviously a technology visionary, was right on: it doesn't scale. This notion of deduping everything just doesn't work, and manually creating rules is just not the right approach. We also heard that the top-down enterprise data model doesn't work: it's too complicated, you can't operationalize it. So what do they do? They kick the can to governance. Hadoop was kind of a sidecar, and big data failed to live up to its promises. So it's a big question whether AI will bring that level of automation. We heard from KPMG, and certainly from Mike Stonebraker, and we heard this as well from Andy Palmer: they're using technology to automate and scale away that number-one data science problem, which is that data scientists spend all their time wrangling data. We'll see if that actually lives up to the promise. >> It is something we did hear today from several of our guests: the promise of machine learning to automate this data cleanup. As Mark Ramsay said in kicking off the conference, all of these efforts to standardize data have failed in the past. He then showed how GSK had used some of the tools that were represented here, using machine learning, to actually clean up the data at GSK. And I heard today a lot of optimism from the people we talked to, Chris for example, about the capability of machine learning to bring some order, to solve this scale problem. Because really, organizing data and creating enterprise data models is a scale problem, and the only way you can solve it is with automation; Mike Stonebraker is right on top of that. So there was optimism at this event. There was a kind of dismay at seeing all the data problems they have to clean up, but also promise that tools are on the way that can do it. >> Yeah. The reason I'm an optimist about this role is that data is such a hard problem, and while there is a feeling of, wow, this is really a challenge, there are a lot of smart people here who are up for the challenge and have the DNA for it. So: the role, that whole 360 discussion; the traditional methods kind of failing; and the third piece we touched on, which is really bringing machine intelligence to the table. We hadn't heard that as much at this event before, and now it's front and center. It's just another example of AI injecting itself into virtually every corner of the industry. And again, I often joke: same wine, new bottle. Our industry has a habit of doing that. But it's cyclical, and we do seem to be making consistent progress. >> And the machine learning, I thought, was interesting.
Several of our guests spoke to machine learning being applied right now to the plumbing projects, to cleaning up data. Those are really self-contained projects: you can manage them, you can determine test outcomes, you can vet the quality of the algorithms. It's not like you're putting machine learning out there in front of the customer, where it could potentially do some real damage. They're vetting and burning in machine learning in an environment that they control. >> Right. So, anyway, two solid days here. I think this conference has really grown: when we first started here it was about 130 people, I think, and now it was 500 registrants this year. I think 600 is the goal for next year, and it's moving venues. The Cube has been covering this all but one year since 2013, and we hope to continue to do that. Paul, it was great working with you; always great work, and I hope we can do more together. We heard Vertica is bringing back its conference; you put that together, so we had Colin Mahony and the Vertica rock stars on, which was fun. Colin Mahony, Mike Stonebraker, Andy Palmer, and Chris Lynch all weighed in, which was great, to get their perspectives on the days of MPP and how that's evolved, improving on the traditional relational database. And now you've got Stonebraker applying all this machine intelligence, and the same thing at scale with Chris Lynch. So it's fun to watch those guys, all Boston-based, East Coast folks. Some news: we just saw the news hit that President Trump is holding up the JEDI contract award, as we've talked about. We've been following that story very closely, and I've got some concerns over it. I think it's largely because he doesn't like Bezos and The Washington Post. >> Exactly. >> You know, here's this America First administration, and the Pentagon says they need this to be competitive with China in AI. >> There's maybe some, you know, where there's smoke there's fire there, so... >> It seems more like a stick in the eye; that's what it seems like. So we're watching that story very closely. I think it's a bad move for the executive branch to be involved in those types of decisions. But what do I know? Anyway, Paul, awesome working with you guys. And appreciate you flying out, Sal. Good job, Alex, Mike. All right, wrapping up. So thank you for watching. Go to siliconangle.com for all the news; youtube.com/siliconangle is where we house our playlists; and thecube.net is the main site where we have all the events, and it will show you what's coming up next. We've got a bunch of stuff going on straight through the summer, and then of course VMworld is the big kickoff for the fall season. Go to wikibon.com for all the research. We're out. Thanks for watching. Dave Vellante for Paul Gillin; we'll see you next time.

Published Date : Aug 1 2019

Michael Stonebraker, TAMR | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts, everybody. You're watching the Cube, the leader in live tech coverage, and we're covering the MIT CDOIQ conference. My name is Dave Vellante, and I'm here with my co-host, Paul Gillin. Mike Stonebraker is here: the legend, founder and CTO of Tamr, as well as many other companies. Inventor. Michael, thanks for coming back on the Cube. Good to see you again. >> Nice to be here. >> So this is kind of a repeat pattern for all of us. We gather here in August at the CDOIQ conference, and you're always the highlight of the show. You gave a talk this week on the top ten big data mistakes. You and I are among the few people who still use the term big data. I happen to like it; it's sad that it's out of vogue already. People associate it with Hadoop, and that's kind of waning. But regardless, welcome. How did the talk go? What were you talking about? >> So I talk to a lot of people who are doing analytics and operational data at scale, and most of them make a collection of bad mistakes. The talk was a litany of the blunders that I've seen people make, so the audience could relate to the blunders; most of the enterprises represented make a bunch of them. I think the number one blunder is not planning on moving most everything to the cloud. >> That's interesting, because a lot of people would love to debate that. I would imagine you could have given this talk 10 years ago and a lot of the blunders would be the same, but that's one that wouldn't have been there. I tend to agree, though. I was one of the two hands that went up this morning when the keynote speaker asked, is the cloud cheaper for us? But so what? Why should everybody move everything? Aren't there laws of physics, laws of economics, laws of the land that suggest maybe you shouldn't? >> Well, I guess two things, and then a comment. The first thing is James Hamilton, who is a techie's techie; he works for Amazon, and we know James. He claims that he can stand up a server for 25% of your cost. I have no reason to disbelieve him, and that number has been pretty constant for a few years. So his cost is a quarter of your cost, and sooner or later prices are going to reflect costs, as there's a race to the bottom in cloud servers. >> Can I just stop you there for a second? There's some other data on that: all you have to do is look at AWS's operating margin and you'll see how profitable they are. They have software-like economics, and they're deploying servers. Sorry to interrupt; carry on. >> So anyway, sooner or later they're going to be wildly cheaper than you are. The second data point is from Dave DeWitt, who is a database wizard. Here's the current technology Microsoft Azure is using, as of 18 months ago: shipping containers in parking lots, chilled water in, power in, Internet in, otherwise sealed, roof and walls optional. So if you're doing raised flooring in Cambridge versus shipping containers in the Columbia River Valley, who's going to be a lot cheaper? You know the economies of scale: the big cloud guys are building data centers as fast as they can, using the cheapest technology around.
You put up a data center every ten years, and you do it on raised flooring in Cambridge. So sooner or later the cloud guys are going to be a lot cheaper. The only thing that will change that equation is externalities. For example, my lab is up the street in the Frank Gehry building, and we have an IT department that runs servers in Cambridge, and they claim they're cheaper than the cloud. But they don't pay rent for square footage and they don't pay for electricity. If there are no externalities, the cloud is assuredly going to be cheaper. And then the other thing is that almost everybody I talk to, including me, has very skewed resource demands. In the cloud, if I need three servers, except on the last day of the month, when I need 20 servers, I just do it. If I'm doing it on-prem, I've got to provision for peak load, and so again I'm way more expensive. So I think sooner or later this combination of effects is going to send everybody to the cloud for most everything. >> And my point about the operating margins is the difference between price and cost. I think James Hamilton is right on it: if you look at the actual cost of deploying, it's even lower than the price the market allows them to charge. They're growing at 40-plus percent a year at a 35 to 40 billion dollar run rate, so sooner or later it's going to be a race to the bottom, and the only guys who are going to win are the guys with the best cost structure. >> A couple of other highlights from your talk? >> Sure. The second thing I'd like to stress is that machine learning is going to be a game changer for essentially everybody. Not only is it going to be autonomous vehicles; it's going to be automated checkout, it's going to be drone delivery of most everything. To say it categorically: any job that is easy to understand is going to get automated. I think it's going to be majorly impactful for most everybody. So if you're an enterprise, you have two choices: you can be a disruptor or you can be a disruptee. You can either be a taxi company or you can be Uber, and it's going to be AI and machine learning that determine which side of that equation you're on. So a big blunder I see is people not taking ML incredibly seriously. >> Do you see that? In fact, everyone I talk to seems to be bought in, that we've got to get on the bandwagon. >> I'm just pointing out the obvious. But one thing that's not quite so obvious: a lot of people I talk to say, I'm on top of data science, I've hired a group of 10 data scientists, and they're doing great. One vignette that's kind of fun: I talked to a data scientist from iRobot, which is the guys that have the vacuum cleaner that runs around your living room. She said, I spend 90% of my time locating the data I want to analyze, getting my hands on it, and cleaning it, leaving 10% to do the data science job for which I was hired. Of that 10%, I spend 90% fixing the cleaning errors in my data so that my models work. So she spends 99% of her time on what you would call data preparation and 1% of her time doing the job for which she was hired. So data science is not about data science; it's about data integration, data cleaning, and data discovery.
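The peak-provisioning arithmetic behind the three-versus-twenty-servers example above is easy to make concrete. The numbers below are invented for illustration; only the structure of the comparison, provisioning for peak on-prem versus paying per server-hour in the cloud, reflects the point being made.

```python
# On-prem you must own enough servers for the monthly peak, all month long.
# In the cloud you pay per server-hour, scaling up only when needed.
HOURS_PER_MONTH = 730
baseline_servers, peak_servers = 3, 20
peak_hours = 24                              # one heavy day per month

onprem_hours = peak_servers * HOURS_PER_MONTH
cloud_hours = (baseline_servers * (HOURS_PER_MONTH - peak_hours)
               + peak_servers * peak_hours)

print("on-prem server-hours:", onprem_hours)              # 14600
print("cloud server-hours:", cloud_hours)                 # 2598
print("over-provisioning ratio: %.1fx" % (onprem_hours / cloud_hours))
```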
>> But your new latest venture... >> Tamr does exactly that sort of stuff, and that's the real data science problem. A lot of people don't realize that yet, but they will. >> I want to ask you, because you've been involved, by my count, in starting at least a dozen companies. >> Nine. >> Okay, it's a lot; maybe I estimated high. How do you decide what challenge to move on to? Because you're not solving the same problems; you're moving on to new problems. How do you decide what's the next thing that interests you enough to actually start a company? >> That's really easy. I'm on the faculty of MIT, and my job is to think of new stuff and investigate it. I'm paid to come up with new ideas, some of which have commercial value and some of which don't, and the ones that have commercial value, I commercialize. So it's whatever I'm doing at the time, and that's why all the things I've commercialized are different. >> So going back to Tamr, the data integration platform: a lot of companies out there claim to do data integration. What did you see? What was the deficit in the market that you could address? >> Okay, great question. Traditional data integration is extract-transform-load (ETL) systems and so-called master data management (MDM) systems, brought to you by IBM, Informatica, Talend, that class of folks. The dirty little secret is that that technology does not scale, in the following sense. ETL doesn't scale, and MDM doesn't scale for a different reason. ETL doesn't scale because it is based on the premise that somebody really smart comes up with a global data model for all the data sources you want to put together; you then send a human out to interview each business unit to figure out exactly what data they've got, and then how to transform it into the global data model and how to load it into your data warehouse. That's very human-intensive, and it doesn't scale precisely because it's so human-intensive. I've never talked to a data warehouse operator who integrates more than a handful of sources: the average operator I talk to integrates less than 10 data sources, some people say 20, and if you twist my arm hard I'll give you 50. So here's a real-world problem: Toyota Motor Europe. They have a distributor in Spain, another distributor in France, country-by-country distribution, sometimes canton-by-canton. So if you buy a Toyota in Spain and move to France, Toyota develops amnesia; the French guys know nothing about you. They've got 250 separate customer databases with 40,000,000 total records in 50 languages, and they're in the process of integrating those into a single customer database so they can do the customer service we expect when you cross a country boundary. I've never seen an ETL system capable of dealing with that kind of scale; ETL doesn't scale to this level of problem. >> So how do you solve that problem? >> I'll tell you; they're a Tamr customer, and I'll tell you all about it. But let me first tell you why MDM doesn't scale. >> Okay, great. >> So ETL says: I now have all your data in one place in the same format. But now you've got the following problems. You've got to deduplicate, because if I bought a Toyota in Spain and another Toyota in France, I'm in both databases.
So if you want to avoid double-counting customers, you've got to dedupe, and you've got to dedupe 30,000,000 records. MDM says: okay, you write some rules; it's a rule-based technology. So you write a rule. My favorite example of a rule: I don't know if you like downhill skiing; I love downhill skiing. Ski areas are in all kinds of public databases. Assemble those together, and now you've got to figure out which entries are the same ski area, because they're recorded under different names, with different addresses and so forth. However, the vertical drop from the bottom to the top is the same; if it matches, chances are they're the same ski area. So that's a rule that says how to put data together into clusters. Now I have a cluster for a given mountain, and I have a problem: one record gives one address and another record gives something else. Which one is right, or are both? So now you have what's called the golden record problem: deciding which data elements, among a variety that may all be associated with the same entity, are in fact correct. Again, MDM is a rule-based system, and rule systems don't scale. The best example I can give you for why rule systems don't scale: Tamr has another customer, General Electric; you've probably heard of them. GE wanted to do spend analytics, and they had 20,000,000 spend transactions for the year before last. A spend transaction is: I paid $12 to take a cab from here to the airport, and I charged it to cost center XYZ. Twenty million of those. GE has a pre-built classification hierarchy for spend: there are parts, and underneath parts are computers, and underneath computers is memory, and so forth. They wanted to simply classify the 20,000,000 spend transactions into this pre-existing hierarchy. The traditional technology says: well, let's write some rules. So GE wrote 500 rules, which is about the most any single human can get their arms around, and that classified 2,000,000 of the 20,000,000 transactions. You've now got 18,000,000 to go, and another 500 rules is not going to give you 2,000,000 more; it's going to give you diminishing returns. You would have to write a huge number of rules, and no one can possibly understand that many. So the technology simply doesn't scale. In the case of GE, they had Tamr solve this classification problem: Tamr used the 2,000,000 rule-tagged records as training data, trained an ML model, and the model classified the remaining 18,000,000. So the answer is machine learning; if you don't use machine learning, you're absolutely toast. The answer to MDM not scaling is ML. The answer to ETL not scaling, when you're putting together disparate records, is ML. You've got to replace humans with machine learning. And at least at this conference, that seems to be resonating: people are understanding that, at scale, traditional data integration technologies just don't work.
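The GE pattern described above, using the output of a small hand-written rule set as labels to train a model that covers the rest, is straightforward to sketch. The snippet below is a hedged illustration of that workflow in scikit-learn, not Tamr's actual pipeline; the features, model, and category names are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Spend transactions as free-text descriptions. A few are labeled by the
# hand-written rules (the "2,000,000"); the rest (the "18,000,000") are not.
rule_labeled = [("taxi from office to airport", "travel"),
                ("DRAM module 16GB", "computers/memory"),
                ("hotel two nights", "travel"),
                ("SSD drive 1TB", "computers/storage")]
unlabeled = ["cab fare downtown", "32GB memory stick"]

texts, labels = zip(*rule_labeled)
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vec.fit_transform(texts)

# Train on the rule-tagged records, then classify everything else.
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(dict(zip(unlabeled, clf.predict(vec.transform(unlabeled)))))
```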
I mean, kick it, kick the can top down data modelling, didn't work, kicked the candid governance That's not going to solve the problem. And But Tamer did, along with some other tooling. Obviously, of course, >> the Well, the other thing is No. One technology. There's no silver bullet here. It's going to be a bunch of technologies working together, right? Mark Ramsay is a great example. He used his stream sets and a bunch of other a bunch of other startup technology operating together and that traditional guys >> Okay, we're good >> question. I want to show we have time. >> So with traditional vendors by and large or 10 years behind the times, And if you want cutting edge stuff, you've got to go to start ups. >> I want to jump. It's a different topic, but I know that you in the past were critic of know of the no sequel movement, and no sequel isn't going away. It seems to be a uh uh, it seems to be actually gaining steam right now. What what are the flaws in no sequel? It has your opinion changed >> all? No. So so no sequel originally meant no sequel. Don't use it then. Then the marketing message changed to not only sequel, So sequel is fine, but no sequel does others. >> Now it's all sequel, right? >> And my point of view is now. No sequel means not yet sequel because high level language, high level data languages, air good. Mongo is inventing one Cassandra's inventing one. Those unless you squint, look like sequel. And so I think the answer is no sequel. Guys are drifting towards sequel. Meanwhile, Jason is That's a great idea. If you've got your regular data sequel, guys were saying, Sure, let's have Jason is the data type, and I think the only place where this a fair amount of argument is schema later versus schema first, and I pretty much think schema later is a bad idea because schema later really means you're creating a data swamp exactly on. So if you >> have to fix it and then you get a feel of >> salary, so you're storing employees and salaries. So, Paul salaries recorded as dollars per month. Uh, Dave, salary is in euros per week with a lunch allowance minds. So if you if you don't, If you don't deal with irregularities up front on data that you care about, you're gonna create a mess. >> No scheme on right. Was convenient of larger store, a lot of data cheaply. But then what? Hard to get value out of it created. >> So So I think the I'm not opposed to scheme later. As long as you realize that you were kicking the can down the road and you're just you're just going to give your successor a big mess. >> Yeah, right. Michael, we gotta jump. But thank you so much. Sure appreciate it. All right. Keep it right there, everybody. We'll be back with our next guest right into the short break. You watching the cue from M i t cdo Ike, you right back

Published Date : Aug 1 2019

Veda Bawo, Raymond James & Althea Davis, ING Bank | MIT CDOIQ 2019


 

>> From Cambridge, Massachusetts, it's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium 2019. Brought to you by SiliconANGLE Media. >> Welcome back to Cambridge, Massachusetts, everybody. You're watching the Cube, the leader in live tech coverage: the Cube's two-day coverage of MIT's CDOIQ, the Chief Data Officer and Information Quality event. This is the thirteenth year; we started covering it here in 2013. I'm Dave Vellante with my co-host, Paul Gillin. Veda Bawo is here. Bowo. Bawo. Sorry. Veda Bawo is here; did I get that right? >> That's close enough. >> The director of data governance at Raymond James. And Althea Davis, the former chief data officer of ING Bank, Challengers and Growth Markets. Ladies, welcome to the Cube; thanks so much for coming on. >> Thank you. >> Thank you. >> Hi, Veda. Talk about your role at Raymond James. A relatively new role for you? >> It is a relatively new role. I recently left Fifth Third Bank, where I was their managing director of data governance, and I've moved on to Raymond James in sunny Florida. I am now the director of data governance for Raymond James, a global financial services company: asset and wealth management, investment banking, retail banking. So I'm excited, very excited about it. >> So we've been talking all day, and actually for several years, about how the chief data officer role kind of emerged from the back office, from data governance. >> Mmm. >> And information quality. And now it's come front and center. And actually we've seen it come full circle, because now it's all about data quality again. So, Althea, as a former CDO, is that a fair assessment, that the role sort of came out of the ashes of the back room? >> Yeah, it's definitely a fair assessment. That's where we got started; that's how we got our budgets; that's how we got our teams. However, now we have to serve many masters. We have to deal with all of the privacy, we have to deal with the multiple compliance regimes, we have to deal with data operations, and we have to deal with all of the new, sexy emerging technologies. To do AI and data science you need a lot of data. You need to be data-rich. You need knowledge management, you need information management, and it needs to be intelligent. So we need to raise the bar on what we do, and at the same time earn the credibility of our C-suite peers. >> Well, I think we no longer have the luxury of being just a cost center anymore. >> No. >> Right; we have to generate revenue. It's about data monetization. It's about partnering with our businesses to make sure that we're helping to drive strategy and deliver results for the broader organization. >> So you've got to hit the bottom line. >> Yeah. >> Either raise revenue or cut costs. >> Directly, in a way that can be tangibly monetized. >> Exactly. Keep them out of jail. Right. Save money. >> That too. >> Save money, make money. (laughter) Keep them out of jail. >> Neither of you could have studied for this career path, because it didn't exist a few years ago. So talk about your backgrounds and how you came into this role. Veda? >> Yeah, absolutely. So you talked about data kind of starting in the bowels of the back office; well, I am that person, right? I am an accountant by training. I've done legal entity controllership, I've booked journal entries, I've closed the books.
I've done regulatory reporting, so I know what it feels like to have to deal with dirty data every single month end, every single quarter end, right. And I know the pain of having to cleanse it and having to deal with our business partners, and having experienced that gave me the passion to want to do better. Right, so I want to influence my partners upstream to do better, as well as to take away some of the pain points that my teams were experiencing over and over again; it really was Groundhog Day. So that really made me feel passionate about going into the data discipline. Right, and so, you know, the benefit is great. It's not an easy journey, but yeah, out of accounting, finance, and that kind of back-office operational support was born, right, a data evangelist, and I'm so passionate about it. >> Which makes sense, because you have to have quality. >> Absolutely. >> Consistency. You have to have the so-called single version of the truth. >> Absolutely, because, you know, regulators like for the financial reports to be accurate. All the time. (laughter) >> Exactly >> How about you? >> I came at it from a totally different angle. I was a marketeer, so I was a business manager, a marketeer, I was working with the big retail brands, you know, the Nikes and the Levi Strausses of the world. So I came to it from a value chain perspective, from marketing, you know, from rolling out retail chains across Europe. And I went from there, a line management position, and all the pains of the different types of data we needed, and then did quite a bit of consulting with some of the big consultancies, Accenture. And then rolled more into the data migration, so dealing with those huge change projects and having teams from all over the world. And knowing the pains, the things that all of the guys didn't want to work on, I got it all on my plate. But it put me in a position to be a really solid chief data officer. >> Somebody... it was called like Data Chicks or something like that (laughter) and I snuck in, I was like the lone >> Data Chicks. >> I was like the lone data dude. >> You can be a data chick. It's okay, no judgement here. >> And so one of the things that one of the CDOs said there, she was a woman obviously, and she said, you know, I think the stat was that there was a higher proportion of women as CDOs than there is across tech, which is like, I don't know, fifteen, seventeen percent. And she was positive that the reason was because it's like a thankless job that nobody wants, and so I just wonder, as women CDOs, your thoughts on that. Is that true? >> Well, first of all, we're the newest to the table, right, so you're the new kid on the block; it doesn't matter if you're a man or a woman, you're the new kid on the block. So, you know, the CFO's got a four-thousand-year history behind him or her. The CIO or CTO, they've got fifty, sixty years up on us. So we're new. So you have to carve out your space, and I do think that a lot of women by nature like to take on big things, to do things that other people don't want to do. So I can see how women kind of fell into that. But at the same time, you know, data, it's an asset, and it is the newest asset. And it's definitely misunderstood. So I do think that, you know, women, you know, we kind of fell into it, but it was actually something that happened good for women, because there's a big future in data. >> Well, let's just be realistic, right. Women have a unique skillset. I may be a little biased, but we have a unique skillset. We're able to solve problems creatively. Right, there's no one-size-fits-all solution for data.
There's no accounting pronouncement that tells me how to handle and manage my data. Right, I have to kind of figure it out as I go along and pivot when something doesn't work. I think that's something that is very natural to women. >> Yeah. >> I think that contributes to us kind of taking on these roles. >> Can I just do a little survey here? (laughter) We hear that the chief data officer function is defined differently at different organizations. Now you both are in financial services. You both have a chief data function. Are you doing the same thing? (laughter) >> Absolutely not! (laughter) >> You know, this is data by design. I mean, I've been lucky, I've had teams that go the whole gamut, right. From the compliance side, through to the data operations, through to all of the, like I said, exotic, sexy, you know, emerging technologies stuff with the data scientists. So I've had the whole thing. In my last position at ING Bank I also had to, you know, lead a team of chief data officers across three different continents: Australia, Asia, and also Eastern and Western Europe. So it's totally different than, you know, maybe another company where they've only got a chief data officer working on data quality and data governance. >> So again, another challenge of being the new kid on the block, right. Defining roles and responsibilities. There's no one globally, universally accepted definition of what a chief data officer should do. >> Right >> Right, is data science in or out, are analytics in or out, right? >> Security sometimes. >> Security, right; sometimes privacy, is it in or out? Do you have operational responsibilities, or are you truly just a second-line governance function, right? There's a mixed bag out there in the industry. I don't know that we have one answer that we know for sure is true. But what I do know for sure is that data is not an IT function. >> Well, okay. That's really important. >> It's not an IT asset. >> Yeah. >> I want to say that it's not an IT asset. It is an information asset or a data asset, which is a different asset than an IT asset or a financial asset or a human asset. >> But, and that's the other big change, is that ten to fifteen years ago data was assumed to be a liability, right. >> Mmm. >> Federal Rules of Civil Procedure: we've got to get rid of the data or, you know, we're going to get sued. That's number one, and number two is that data, because it's digital, you know, people say data is the new oil. I always say it's not. It's more important than oil. >> It's like blood. >> Oil you can only use in one use case. Data you can reuse over and over again. >> Reuse, reuse, perpetual. It goes on and on and on. And every time you reuse it, the value increases. So I would agree with you, it is not the new oil. It is much bigger than that, and it needs to... I mean, I know from some of my colleagues in the profession, we talk about borrowing from other, more mature disciplines to make data management, information management, and knowledge management much more robust and much more professional. We also need to be more professional about it as the data leaders. >> So on your little panel today, one of the things that you guys addressed is what keeps the CDO up at night. >> Yes >> I presume it's data. (laughter) >> No, no, no. >> It's our peers that don't get it. (laughter) >> That's what keeps us up at night. >> It's the sponsors that keep us up at night. (laughter) So what was that discussion like? >> So yeah, I mean, it was a lively discussion.
Um, great attendance at the panel, so we appreciate everyone who came out and supported us. >> Full house. >> Definitely a full house. Great reviews so far. >> Yep. >> Okay, so the thing that definitely keeps folks up at night, and I'm going to start with my standard one, which is quality. Right, you can have all of the fancy tools, right, you can have a million data scientists, but if the quality is not good or sufficient, then you're nowhere. So quality is fundamentally the thing that the CDO has to always pay attention to. And there's no magic, you know, pill or magic, right, potion that's going to make the quality right. It's something that the entire organization has to rally around. And it's not a one-and-done thing, right, it has to be a sustainable approach to making sure the quality is good enough so that you can actually reap the benefits or derive the value, right, from your data. >> Absolutely, and I would say, you know, following on from the quality, and I consider that the trustworthiness of the data: I would say as a chief data officer you're coming to the table, you're coming to the executive table, you need to bring it all, so you need to be impactful. You need to be absolutely relevant to your peers. You also need to be able to put their teams in a position to act. So it needs to be actionable. And if you don't have all of that in combination with the trustworthiness, you're dead in the water. So it is a hard act, and that's why there is high attrition for chief data officers. You know, it's a hard job. But I think it's very much worthwhile, because this particular asset, this new asset, we haven't been able to even scratch the surface of what it could mean for us as a society and for commercial organizations or government organizations. >> To your point, it's not a technology problem. When Mark Ramsay was surveying the audience this morning, he asked, you know, why have we had so many failures, and the first hand that went up said it's because of relational databases. >> And I wanted to say it's not a technology problem. >> It's a hearts, minds, and hands problem. >> Absolutely. Absolutely. You can't make an impact on your data landscape just by changing your technology. >> You said at the outset how important it is for you to show a bottom-line impact. >> Right >> What's one project you've worked on, or that you've led in your tenure, that did that? >> If we're talking about, for example, I can't give specifics, but at one of the institutions I worked at, an insurance firm, we looked at the customer journey. So we worked with some of the different departments that traditionally did not get access to data, for them to be able to be effective at their jobs. What they wanted to do in marketing was actually create new products to, you know, increase the wallet share from the existing customers. Another thing they wanted to do was, for example, when there were problems with the customers, instead of the customer, you know, leaving, you know, the journey, they were able to bring them back in by getting access to the data. So we either gave them insight, like, you know, looking back to make sure that things didn't go wrong the next time, or we helped them by giving them information so they could develop new products, so this is all about going to market. So that's absolutely bottom line. It's not all just cost efficiency and process, to begin with. >> Yeah, pipeline. (laughter) >> And that's really valid, but you know. >> Absolutely, so I'll give you one example where the data organization partnered with our data scientists.
To try to figure out the best locations for various branches, for that particular institution. And it was taking, right, trillions of data points, right, about the current footprint, as well as other geographic information that was out there publicly available. Taking that and using the analytics to figure out, okay, where should we have our branches, our ATMs, etc... and then consolidating the footprint or expanding where appropriate. So that is bottom-line impact for sure. >> I remember, in the early part of the two thousands, reading a Harvard Business Review article about how gut feel trumps data every time. But that's an example where, no way. >> Nope. >> You could never do better with the gut than that example that you just gave. >> Absolutely. >> Veda, I want to ask you a question. I don't know if you heard Mark Ramsay's talk this morning, but he sort of... he sort of declared that data governance was over. >> Mmm. >> And as the director of data governance >> Never! >> I wondered if you would disagree with that. >> Never! >> Look. >> Were you surprised? >> It's just like saying that I should stop brushing my teeth. Right, I always will have to maintain a certain level of data hygiene. And I don't think that employees and executives and organizations have reached a level of maturity where I can trust them to maintain that level of hygiene independently. And therefore I need a governance function. I need to check to make sure you brush your teeth in the morning and in the evening. Right, and I need you to go for your annual exam to make sure you don't have any cavities that weren't detected. Right, so I think that there's still a role for governance to play. It will evolve over time, for sure, right, as, you know, the landscape changes, but I think there's still a role, right, for governance. >> And that wasn't my takeaway, Paul. I think he said that basically the enterprise data warehouse failed, master data management failed, the single data model failed, so we punted to governance, and that's not going to solve the enterprise data problem. >> I think it's one leg of the stool. It's one leg of the stool. >> Yeah, I think I would really sum it up as: a monolithic data storage approach failed. Like that. And then our attention went to data governance, but that's not going to solve it either. Look, data management is about twelve different data capabilities; it's a discipline. So we give it the title data governance, but it means multiple things. And I think that if we're more educated and we have more confidence in what we're doing in those different areas, plus information and knowledge management, then we're way ahead of the game. I mean, knowledge graphs and semantics. That puts companies, you know, at the top of that, you know, corporate inequality gap that we're looking at right now. Where, you know, companies are, you know, five, a thousand times more valuable than their competition, and the gap is just going to get bigger, considering some of those companies at the bottom of the gap, you know, just keep on doing the same thing. >> I agree. I was just trying to get you worked up. (laughter) >> Well, you did. >> It's going to be a different kind of show. >> But that point you're making: Microsoft, Apple, Amazon, Google, Facebook. The top five companies in terms of market cap. And they're all data companies. They surpass all the financial services, all the energy companies, all the manufacturers. >> And Alibaba, same thing. >> Oh yeah. >> They're doing the same thing. >> They're coming right up there.
With four or five hundred billion. >> They're all taking the knowledge approach. They're doing all of this stuff, and that's a much more comprehensive approach, looking at it as a full spectrum, and if we in the financial industry, or any industry, keep on just kind of looking at little bits and pieces, it's not going to work. It's a lot of talk, but there's no action. >> We are losing, right. I know that fintechs are, right, infringing upon our territory. Right, if Amazon can provide a credit card or lend you money or extend you credit, they're now functioning as a traditional bank would. If we're not paying attention to them as real competitors, we've lost the battle. >> That's a really important point you're making, because it's all digital now. >> Absolutely. >> It used to be you'd never see companies traverse industries, and now you see it: Apple Pay, and Amazon in healthcare. >> Yeah. >> And government organizations teaming up with corporations and individuals. Everything is free-flowing, so that means the knowledge and the data and the information also need to flow freely, but they need to be managed. >> Now you're into a whole realm of privacy and security. >> And regulations, right. Regulations for the non-traditional banks, right, that are doing banking transactions. >> Do you think traditional banks will lose control over the payment systems? >> If they don't move with the times, they will. If they don't. I mean, it's not something that's going to happen tomorrow, but, you know, there is a category of bank called challenger banks, so there's a reason. You know, even within their own niche there's a group of banks. >> I mean, not even just payments, right. Think about cash transactions: like, if I do a money transfer, am I going to my traditional bank to do it, or am I going to Cash App? >> I think it's interesting, particularly in the retail banking business, where, you know, one banking app looks pretty much like another, and people don't go to branches anymore, and so that brand affinity that used to exist is harder and harder to maintain, and I wonder what role data plays in reestablishing that connection. >> Well, for me, right, I get really excited, and sometimes annoyed, when I can open up the app for my bank and see the pie chart of my spending. They're using my data to inform me about my behaviors; sometimes it's a good story, sometimes a bad story. But they're using it to inform me. That's making me more loyal to that particular institution, right. I can also link all of my financial accounts in that one institution's app, and I can see a full list of all of my credit cards, all of my loans, all of my investments, in one-stop shopping. That's making me go to their app more often versus the other options that are out there. So I think we can use the data in order to endear the customers to us, but we have to be smart about it. >> That's the accountant in you. >> I just refuse to not look. (laughter) >> You can afford to not look. I can't. >> Thank you. >> Thanks for riling us up. >> Alright, thank you for watching, everybody. We'll be right back with our next guest right after this short break. You're watching theCUBE from MIT in Cambridge, Massachusetts. Right back. (atmospheric music)

Published Date: Jul 31, 2019


SENTIMENT ANALYSIS:

ENTITIES

Entity | Category | Confidence
Mark Ramsay | PERSON | 0.99+
Paul Gillin | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
Facebook | ORGANIZATION | 0.99+
Veda Bawo | PERSON | 0.99+
Apple | ORGANIZATION | 0.99+
Dave Vallante | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Alibaba | ORGANIZATION | 0.99+
2013 | DATE | 0.99+
Europe | LOCATION | 0.99+
ING Bank | ORGANIZATION | 0.99+
Vita | PERSON | 0.99+
five | QUANTITY | 0.99+
fifty | QUANTITY | 0.99+
Nikes | ORGANIZATION | 0.99+
four | QUANTITY | 0.99+
ING | ORGANIZATION | 0.99+
MIT | ORGANIZATION | 0.99+
Raymond James | ORGANIZATION | 0.99+
Althea Davis | PERSON | 0.99+
fifty seventeen percent | QUANTITY | 0.99+
both | QUANTITY | 0.99+
two day | QUANTITY | 0.99+
Levi's | ORGANIZATION | 0.99+
one leg | QUANTITY | 0.99+
Mark Ramsays | PERSON | 0.99+
tomorrow | DATE | 0.99+
four thousand year | QUANTITY | 0.99+
Australia | LOCATION | 0.99+
two thousands | QUANTITY | 0.99+
five hundred billion | QUANTITY | 0.98+
Cambridge Massachusetts | LOCATION | 0.98+
Asia | LOCATION | 0.98+
Bawo | PERSON | 0.98+
today | DATE | 0.98+
Veda | PERSON | 0.98+
single | QUANTITY | 0.97+
Western Europe | LOCATION | 0.97+
one example | QUANTITY | 0.97+
One | QUANTITY | 0.97+
Eastern | LOCATION | 0.97+
one | QUANTITY | 0.97+
Ten | DATE | 0.97+
Boston, Cambridge | LOCATION | 0.97+
Thirteenth year | QUANTITY | 0.97+
fifteen | DATE | 0.96+
second line | QUANTITY | 0.96+
a million data scientists | QUANTITY | 0.95+
Raymond James | PERSON | 0.95+
five companies | QUANTITY | 0.95+
fifteen years ago | DATE | 0.94+
fifth third bank | QUANTITY | 0.94+
Bowo | PERSON | 0.94+
sixty year | QUANTITY | 0.93+
trillions of data points | QUANTITY | 0.92+
ING bank | ORGANIZATION | 0.92+
one use case | QUANTITY | 0.91+
one project | QUANTITY | 0.91+
this morning | DATE | 0.91+
first | QUANTITY | 0.9+
thousand times | QUANTITY | 0.9+
2019 | DATE | 0.89+
one answer | QUANTITY | 0.87+
Althea | PERSON | 0.86+
Challenger | ORGANIZATION | 0.85+
one banking app | QUANTITY | 0.84+
MIT Chief Data Officer and | EVENT | 0.83+
three different continents | QUANTITY | 0.82+
few years ago | DATE | 0.81+
this morning | DATE | 0.8+
single version | QUANTITY | 0.78+
number two | QUANTITY | 0.76+
Information Quality Symposium 2019 | EVENT | 0.75+
Harvard | ORGANIZATION | 0.73+
pay | TITLE | 0.72+
sunny Florida | LOCATION | 0.7+