Lukas Heinrich & Ricardo Rocha, CERN | KubeCon + CloudNativeCon EU 2019
>> Live from Barcelona, Spain, it's theCUBE, covering KubeCon + CloudNativeCon Europe 2019. Brought to you by Red Hat, the Cloud Native Computing Foundation, and ecosystem partners.

>> Welcome back to theCUBE, here at KubeCon + CloudNativeCon 2019 in Barcelona, Spain. I'm Stu Miniman. My co-host is Corey Quinn, and we're thrilled to welcome to the program two gentlemen from CERN. Of course, CERN needs no introduction. We're going to talk some science, going to talk some tech. To my right here is Ricardo Rocha, who is a computing engineer, and Lukas Heinrich, who's a physicist. So Lukas, let's start with you. You know, if you were a traditional enterprise, we'd talk about your business, but instead, talk about your projects, your applications. What piece of fantastic science is your team working on?

>> All right, so I work on an experiment that is situated at the Large Hadron Collider, a particle accelerator experiment where we accelerate protons, which are hydrogen nuclei, to a very high energy, so that they travel at almost the speed of light. We have a large tunnel 100 meters underground in Geneva, straddling the border of France and Switzerland, and there we accelerate two beams. One is going clockwise, the other one is going counterclockwise, and then we collide them. And so, I work on an experiment that looks at these collisions and then analyzes this data.

>> Lukas, if I can: when you talk to most companies, you talk about scale, you talk about latency, you talk about performance. Those have real-world implications for your world. Do you have anything you could share there?

>> Yeah, so one of the main things we need to do: we collide these protons 40 million times a second, and we need to analyze the collisions in real time, because we cannot write all the collision data out to disk; we don't have enough disk space. So we essentially run a 10,000-core real-time application that analyzes this data as it comes in and decides which collisions are actually most interesting, and only those get written out to disk. This is a system I work on called the trigger, and yeah, it's pretty dependent on latency.

>> All right, Ricardo. Luckily, your job's easy. To most people we say, you know, you need to respond to what the business needs, and don't worry, you can't go against the laws of physics. Well, you're working on physics here, and boy, those are some hefty requirements. Talk a little bit about that dynamic and how your team has to deal with some pretty tough challenges.

>> Right. So, as Lukas was saying, we have this large amount of data. The machines can generate something on the order of a petabyte a second, and then, thanks to the hardware- and software-level triggers, that gets reduced to something like 10 gigabytes a second, which is what my side has to handle. It's still a lot of data. We are collecting something like 70 petabytes a year, and we keep adding, so right now the amount of storage available is on the order of 400 petabytes. We're starting to get to a pretty large scale. And then we have to analyze all of this. We have one big data center at CERN, which is around 300,000 cores, but that's not enough, so over the last 15 to 20 years we've created a large distributed computing environment around the world.
We link many different institutes and research labs together, and this doubles our capacity. So that's our challenge: making sure that, after all the effort the physicists put into building this large machine, in the end it's not the computing that breaks the whole system. We have to keep up, yup.

>> One thing that I always find fascinating is people who are dealing with real problems that push our conception of what scale looks like, and when you're talking about things like a petabyte a second, that's beyond the comprehension of most of us. One problem I've seen historically with a number of different infrastructure approaches is that they require a fair level of complexity to go from this problem to this problem to this problem, and you have to work through a bunch of layers of abstraction, and the end result is, "at the end of all of this, we can run our blog that gets eight visits a day," which just doesn't seem to make sense. Whereas in what you're talking about, that level of complexity is more than justified. So my question for you is: as you see these things evolve, and you look at best practices and guidance from folks doing far less data-intensive applications, are you finding that a lot of those best practices fall down as you push the theoretical boundaries of scale?

>> Right, that's actually a good point. The physicists are very good at getting things done, and they don't worry that much about the process, as long as it works in the end. There's always this kind of split between the physicists and the computing engineers: we want to establish practices, but at the end of the day, we have a large machine that has to work, so sometimes we skip a couple of steps. There's still quite a lot of control on things like data quality and software validation, though. But yeah, it's a non-traditional environment in terms of IT, I would say. It's much more fast-paced than most traditional companies.

>> You mentioned you had how many cores working on these problems on site?

>> So in-house, we have 300,000.

>> If you were to do a full migration to the public cloud, you'd almost have to repurpose that many cores just to calculate the bill at that point. Because of all the different dimensions, everything you wind up working on at that scale becomes almost completely non-trivial. I don't often say that I'm not sure the public cloud can scale to the level that someone needs; in your case, that becomes a very real concern.

>> Yeah, so that's one debate we are having now. There are a lot of advantages to having the computing in-house, also because we pretty much use it 24/7. It's a very different type of workload: we need a lot of resources 24/7, so even the pricing is calculated differently. But the issue we have now is that the accelerator will go through a major upgrade in about five years' time, where we will increase the amount of data by 100 times. Now we are talking about 70 petabytes a year, and very soon we'll be talking about exabytes. The amount of computing we'll need is just going to explode, so we need all the options. We're looking into GPUs and machine learning to change how we do computing, and we're looking at any kind of additional resources we might get, and there the public cloud will probably play a role.
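To make the data-reduction numbers above concrete, here is a minimal back-of-the-envelope sketch in Python. The rates are the rough figures quoted in the conversation; the scoring rule and keep fraction are entirely hypothetical, standing in for the layered hardware and software selections of the real trigger.

```python
import random

# Rough figures quoted in the conversation (orders of magnitude only).
COLLISIONS_PER_SEC = 40_000_000      # bunch crossings per second
DETECTOR_RATE_BPS = 1e15             # ~1 petabyte/second generated by the machine
POST_TRIGGER_RATE_BPS = 10e9         # ~10 gigabytes/second after both trigger levels

print(f"trigger reduction factor: {DETECTOR_RATE_BPS / POST_TRIGGER_RATE_BPS:,.0f}x")

# Purely illustrative stand-in for the real-time trigger: score each
# simulated collision and keep only the rare interesting ones.
def run_trigger(events, keep_fraction=1e-4):
    # Hypothetical selection rule; only ~keep_fraction of events survive.
    return [e for e in events if e["score"] > 1 - keep_fraction]

events = [{"id": i, "score": random.random()} for i in range(1_000_000)]
kept = run_trigger(events)
print(f"kept {len(kept)} of {len(events):,} simulated events")
print(f"at that fraction: {COLLISIONS_PER_SEC * 1e-4:,.0f} events kept per second")
```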
>> Could you speak to the dynamic of how something like an upgrade of that scale works? You know, how do you work together? I can't imagine that you just say, "Well, we built whatever we needed, and, you know, throw it over the wall and make sure it works."

>> Right. I work a lot on this boundary between computing and physics, and internally, I think we go through the same processes as a lot of companies: we're trying to educate people on the physics side about best practices, because that's also important. One thing I stressed in the keynote is that reproducibility and reusability of scientific software is pretty important, so we teach people to containerize their applications and make them reusable, and stuff like that, yup.

>> Anything about that relationship you can expound on?

>> Yeah, the keynote we had yesterday is a perfect example of how this is improving a lot at CERN. We were actually using data from CMS, which is one of the experiments. Lukas is a physicist on ATLAS, which is a competing experiment, kind of. I'm in IT, and all this containerized infrastructure is bringing us together, because computing is getting much easier in terms of how to share pieces of software and even infrastructure, and this helps us a lot internally too.

>> So what in particular about Kubernetes helps your environment? You've been on this distributed systems build-out for 15 years, so it sounds like you were the hipsters when it came to some of the solutions we're working on today.

>> That has been a major change. Lukas mentioned the container part for software reproducibility, but I have been working on the infrastructure side; I joined CERN as a student, and I've been working on the distributed infrastructure for many years. We basically had to write our own tools over the years, like storage systems and batch systems, and suddenly, with this public cloud explosion and open-source usage, we can just go and join communities that sometimes have requirements even higher than ours, and we can focus on the application development. If we start writing software using Kubernetes, then not only do we get the flexibility of choosing different public clouds or different infrastructures, but we also don't have to care so much about the core infrastructure: all the monitoring, log collection, restarting. Kubernetes is very important for us in this respect; we've been able to retire a lot of the software we had depended on for many years.

>> So these days, as you look at this build-out, not just what you're doing today but what you're looking to build in the upcoming years, are you viewing containers as the fundamental primitive that empowers this? Are you looking at virtual machines as that primitive? Are you looking at functions? Where exactly do you draw the abstraction layer as you start building this architecture?

>> So, traditionally we've been using virtual machines, for maybe the last ten years, or eight years at least, and we see containerization happening very quickly. Maybe Lukas can say a bit more about how this is important on the physics side?

>> Yeah, so currently I think we are looking at containers as the main abstraction, even as we explore things like functions as a service.
What's kind of special about scientific applications is that we don't usually have our entire code base on one software stack, right? It's not like we deploy a Node.js application or a Python stack and that's it. Sometimes you have a complete mix of C++, Python, Fortran, and all that stuff. So this idea that we can build the entire software stack the way we want it is pretty important. Even for functions as a service, where traditionally you had just a limited choice of runtimes, this becomes important.

>> From our side, the virtual machines still required a very complex setup to support all this diversity of software. With containerization, all people have to give us is a building block to run, and it's kind of a standard interface, so we only have to build the infrastructure to handle these pieces.

>> Well, I don't think anyone can dispute that you folks are experts in taking larger things and breaking them down into their constituent components. I mean, you are, quite obviously, the leading world experts on that. But was there any challenge as you went through that process? I don't necessarily even want to say modernizing, but in changing your viewpoint of those primitives as you've evolved, have you seen challenges in gaining buy-in throughout the organization? Was there pushback? Was it culturally painful to move away from the virtual machine approach into a containerized world?

>> Right, so yeah, a bit, of course. But physicists really focus on their end goal. We often say that we don't count how many cores or whatever; we care about events per second, how many events we can process per second. So it's a more open-minded community, maybe, than traditional IT, and we don't care so much about which technology we use at some point, as long as the job gets done. So yeah, there's a bit of friction sometimes, but there's also a push when you can demonstrate a clear benefit; then it's easier to get it adopted.

>> What's a little bit special for particle physics, maybe, is that it's not only CERN doing the research. We are an international collaboration of many, many institutes all around the world that work on the same project, which is just hosted at CERN. So it's a very flat hierarchy, and people have the freedom to try things out; it's not like we have a top-down mandate on what technology to use. Somebody tries something out, and if it works and people see value in it, then it gets adopted.

>> The collaboration with the data volumes you're talking about has got to be intense as well. I think you're a little bit beyond "okay, we ran the experiment, we put the data in Dropbox, go ahead and download it; you'll get it in only 18 short years." It seems like there's absolutely a challenge in that.

>> That was one of the key points in the keynote, actually. A lot of the experiments at CERN have an open data policy where we release our data, and that's great, because we think it's important for open science. But it was always a bit of an issue: who can actually, practically, analyze this data if they don't have a data center? And so one part of the keynote was to demonstrate that, using Kubernetes and public cloud infrastructure, it actually becomes possible for people who don't work at CERN to analyze these large-scale scientific data sets.
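The pattern Lukas describes, opening a large data set to anyone with a Kubernetes cluster, boils down to partitioning the data and fanning the chunks out as jobs. Here is a minimal sketch using the official Kubernetes Python client; the image name, chunk count, and argument convention are hypothetical placeholders, not the actual keynote setup.

```python
from kubernetes import client, config

config.load_kube_config()            # use local kubeconfig credentials
batch = client.BatchV1Api()

N_CHUNKS = 500                       # hypothetical partitioning of the data set

for i in range(N_CHUNKS):
    # One Kubernetes Job per chunk; the cluster schedules them in parallel.
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=f"opendata-analysis-{i}"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[
                        client.V1Container(
                            name="analysis",
                            # Placeholder image; in practice, a containerized
                            # physics analysis with its full software stack.
                            image="example.org/opendata-analysis:latest",
                            args=[f"--chunk={i}", f"--total-chunks={N_CHUNKS}"],
                        )
                    ],
                )
            )
        ),
    )
    batch.create_namespaced_job(namespace="default", body=job)

print(f"submitted {N_CHUNKS} analysis jobs")
```

Compressing a month of single-node processing into the roughly five-minute window described next implies on the order of eight-thousand-fold parallelism, which is exactly what a large cluster plus embarrassingly parallel event data makes possible.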
>> Yeah, maybe just for our audience: the punchline is rediscovering the Higgs boson in the public cloud. Maybe give our audience a little taste of that.

>> Right, yeah. So the Higgs boson was discovered in 2012 by both ATLAS and CMS. We used open data from CMS, part of which has now been released publicly. Basically this was a 70-terabyte data set which, thanks to our Google Cloud partners, we could put onto public cloud infrastructure, and then we analyzed it on a large-scale Kubernetes cluster, and--

>> The main challenge there was that, like, we publish the data and we say you probably need a month to process it, but we had 20 minutes on the keynote, so we needed a bit larger infrastructure than usual to run it in five minutes or less. In the end, it all worked out, but that was a bit of a challenge.

>> How are you approaching, I guess, making this more accessible to more people? By which I mean not just other research institutions scattered around the world, but individual students, sometimes in emerging economies, who don't have access to the kinds of resources that many of us take for granted, particularly those of us who work for prestigious research institutions. What are you doing to make this more accessible to high school kids, for example, folks who are just dipping their toes into a world they find fascinating?

>> We have entire outreach programs that go to high schools. I did this when I was a student in Germany: we would go to high schools and host workshops, and people would analyze a lot of this data themselves on their computers. We would come with USB sticks that had data on them, and they could analyze it. And part of the open data strategy from ATLAS is to use that open data for educational purposes. There are also programs in emerging countries.

>> Lukas and Ricardo, really appreciate you sharing the open data, open science mission that you have with our audience. Thank you so much for joining us.

>> Thank you.

>> Thank you.

>> All right, for Corey Quinn, I'm Stu Miniman. We're in day two of two days of live coverage here at KubeCon + CloudNativeCon 2019. Thank you for watching theCUBE. (upbeat music)
Seth Dobrin, IBM Analytics - IBM Fast Track Your Data 2017
>> Announcer: Live from Munich, Germany, it's theCUBE, covering IBM Fast Track Your Data. Brought to you by IBM. (upbeat techno music)

>> For you here at the show, generally, and specifically, what are you doing here today?

>> There are really three things going on at the show, three high-level things. One is how we're repositioning our hybrid data management portfolio, specifically some announcements around DB2 in a hybrid environment, and some highly transactional offerings around DB2. The second is our unified governance portfolio: actually delivering a platform for unified governance that allows our clients to interact with governance and data management products in a more streamlined way, and helps them actually solve a problem instead of just being offered products. The third is really around data science and machine learning. Specifically, we're talking about the machine learning hub that we're launching here in Germany. Prior to this, we had a machine learning hub in San Francisco, one in Toronto, and one in Asia, and now we're launching one here in Europe.

>> Seth, can you describe what this hub is all about? Is this a data center where you're hosting machine learning services, or is it something else?

>> Yeah, so this is where clients can come and learn how to do data science. They can bring their problems and their data to our facilities and learn how to solve a data science problem in a more team-oriented way, interacting with data scientists, machine learning engineers, data engineers, and developers to solve a problem for their business around data science. The previous hubs have been completely booked, so we wanted to launch hubs in other areas to expand capacity.

>> You're hosting a round table today, right, on the main tent?

>> Yep.

>> And you've got a customer on; you're going to be talking about applying these practices in financial services and other areas. Maybe describe that a little bit.

>> We have a customer on from ING: Heinrich, who's the chief architect for ING. ING, IBM, and Hortonworks have a consortium, if you would, a framework we're building around Apache Atlas and Ranger as the kind of open-source operating system for our unified governance platform. Much as IBM has positioned Spark as a unified, open-source operating system for analytics, for a governance platform to be truly unified, you need to be able to integrate metadata. The biggest challenge in connecting your data environments, if you're an enterprise that was not internet-born or cloud-born, is that you have proprietary metadata platforms that all want to be the master, and when everyone wants to be the master, you can't really get anything done. So what we're doing with Apache Atlas is setting it up as a virtual translator, if you would, a dictionary between all the different proprietary metadata platforms, so that you can get a single unified view of your data environment across hybrid clouds, on premises and in the cloud, and across different proprietary vendor platforms. Because it's open source, there are connectors that can go in and out of the proprietary platforms.
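As a concrete illustration of that "single unified view" idea, here is a minimal sketch of querying Apache Atlas over its REST API from Python. The host, credentials, and search terms are hypothetical, and endpoint details vary across Atlas versions; the point is simply that one open API can front metadata synced in from many proprietary stores.

```python
import requests

ATLAS = "https://atlas.example.org/api/atlas/v2"   # hypothetical endpoint
AUTH = ("user", "password")                        # placeholder credentials

# Basic search across everything Atlas knows about, regardless of which
# proprietary platform originally mastered the metadata.
resp = requests.get(
    f"{ATLAS}/search/basic",
    params={"typeName": "DataSet", "query": "customer"},
    auth=AUTH,
)
resp.raise_for_status()

for entity in resp.json().get("entities", []):
    print(entity["guid"], entity.get("typeName"), entity.get("displayText"))
```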
>> So Seth, you seem like you're pretty tuned in to the portfolio within the analytics group. How are you spending your time as the Chief Data Officer? How do you balance it between customer visits, talking about some of the products, and your sort of day job?

>> I actually have three day jobs; my job is split into three pieces. The first, my primary mission, is really around transforming IBM's internal business units to use data and analytics to run our business, so internal business unit transformation. Part of that transformation is also making sure that we're compliant with regulations like GDPR. Another third is really around rethinking our offerings from a CDO perspective. As a CDO, and as you know, Dave, I've only been with IBM for seven months. As a recent former client, and as a CDO, what is it that I want to see from IBM's offerings? I think IBM makes fantastic products. But as a client, if a salesperson shows up to me, I don't want them selling me a product, 'cause if I want an MDM solution, I'll call you up and say, "Hey, I need an MDM solution. Give me a quote." What I want is for them to show up and say, "I have a solution that's going to solve your governance problem across your portfolio," or "I'm going to solve your data science problem," or "I'm going to help you master your data and manage your data across all these different environments." So the last third is really working with the offering management and development teams to define the three or four business platforms that we want to settle on. We know three of them at least, right? We know that we have hybrid data management, we have unified governance, and we have data science and machine learning, and you could think of the Z franchise as a fourth platform.

>> Seth, can you net out how governance relates to data science? Because there is governance of the statistical models, machine learning, and so forth: version control. In an end-to-end machine learning pipeline, there are various versions of various artifacts that have to be managed in a structured way. Does your unified governance bundle, or portfolio, address those requirements, or just data governance?

>> Yeah, so the unified governance platform today really focuses on data governance, and on how good data governance can be an enabler of rapid data science. If your data is all pre-governed, it's much quicker to get access to it and understand what you can and can't do with it, especially being here in Europe, in the context of the EU GDPR. You need to make sure that your data scientists are only doing things the user has approved, because users have to give explicit consent for what can be done with their data. But the long-term vision is that, essentially, the output of models is data, right? And how you use and deploy those models also needs to be governed. So the long-term vision is that we will have a governance platform for all those things as well, though I think it makes more sense for those things to be governed in the data science platform, if you would. And we...

>> We often hear, separate from GDPR and all that, about something called algorithmic accountability, which is being discussed more in policy circles and government circles around the world, and is strongly related to everything you're describing: being able to trace the lineage of any algorithmic decision back to the data, the metadata, and so forth, and the machine learning models that might have driven it. Is that where IBM's going with this portfolio?

>> I think that's the natural extension of it.
We're thinking of them today as two different pieces, but if you solve them both and connect them together, then you've solved that problem. And I think you're absolutely right: as we leverage machine learning and artificial intelligence in general, we need to be able to understand how we got to a decision, and that includes the model, the data, how the data was gathered, and how the data was used and processed. It's that entire pipeline, because it is a pipeline. You're not doing machine learning or AI in a vacuum; you're doing it in the context of the data, and in the context of the individuals or organizations that you're trying to influence with the output of those models.

>> I call it DevOps for data science.

>> Seth, in the early Hadoop days, the real headwind was complexity. It still is, by the way; we know that. Companies like IBM are trying to reduce that complexity, and Spark helps a little bit, so the technology will evolve, we get that. It seems like one of the other big headwinds right now is that most companies don't have a great understanding of how they can take data and monetize it, turn it into value. Many companies make the mistake of saying, "Well, I don't really want to sell my data," or "I'm not really a data supplier," and they're maybe not thinking about it in the right way. But we seem to be entering a next wave here, where people are beginning to understand: I can cut costs, I can do predictive maintenance, I can maybe not sell the data but enhance what I'm doing and increase my revenue, maybe my customer retention. They seem to be tuning in more, largely, I think, because of the chief data officer roles helping them think that through. I wonder if you would give us your point of view on that narrative.

>> I think what you're describing is the digital transformation journey. As enterprises go through a digital transformation, the end game is: how do I sell services, outcomes, those types of things; how do I sell an outcome to my end user? That's really the end game of a digital transformation, in my mind. But before you can get to that, before you transform your business objectives, there are a couple of intermediary steps. The first is what you're describing: the data transformation. Enterprises need to really get a handle on their data and become data-driven, and start transforming their current business model: how do I accelerate my current business leveraging data and analytics? I frame that as the data science transformation aspect of the digital journey. Then the next aspect is: how do I transform my business and change my business objectives? Part of that first step is, in fact: how do I optimize my supply chain? How do I optimize my workforce? How do I optimize my goals? How do I accelerate the things that Wall Street cares about for a business, make those faster, make those better, and really put my company out in front? Because in the grand scheme of things, there are two types of companies today: companies that are going to be the disruptors, and companies that are going to get disrupted. Most companies want to be the disruptors, and it's a process to get there.
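The end-to-end accountability raised above implies keeping, for every automated decision, a record that points back to the model version, the training data and its consent basis, and the processing steps involved. Here is a minimal sketch of such a record; the field names and values are illustrative, not any particular product's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionLineage:
    """One traceable record per automated decision (illustrative schema)."""
    decision_id: str
    model_name: str
    model_version: str
    training_data_refs: list       # datasets used, with their consent basis
    feature_pipeline: str          # how the inputs were gathered and processed
    inputs: dict                   # the features actually scored
    output: float
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

record = DecisionLineage(
    decision_id="loan-000123",
    model_name="credit-risk",
    model_version="2.3.1",
    training_data_refs=["applications-2016Q4 (explicit consent)"],
    feature_pipeline="feature-etl v7",
    inputs={"income": 54000, "tenure_months": 18},
    output=0.82,
)
print(record)
```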
>> So the accounting industry doesn't have standards around valuing data as an asset, and many of us feel that waiting for those standards is a mistake. You can't wait for that; you've got to figure it out on your own. But again, it seems to be somewhat of a headwind, because it puts data and data value in this fuzzy category. Yet there are clearly the data haves and the data have-nots. What are you seeing in that regard?

>> When I was in my former role, my former company went through an exercise of valuing our data and our decisions, and I'm doing that same exercise at IBM right now. We're going through IBM, at least the analytics business unit, the part I'm responsible for, and going to all the leaders and saying, "What decisions are you making? Help me understand the decisions you're making, and help me understand the data you need to make those decisions." That does two things. Number one, it gets at how we can value the decisions, because each one of those decisions has a specific value to the company; you can assign a dollar amount to it. But it also helps change how people in the enterprise think, because the first time you go through and ask these questions, they talk about the dashboards they want, to help them make their preconceived decisions, validated by data. They have a preconceived notion of the decision they want to make, and they want the data to back it up, so they want a dashboard to help them do that. So when you come in and start having this conversation, you stop them and say, "Okay, what you're describing is a dashboard. That's not a decision. Let's talk about the decision that you want to make, and let's understand the real value of that decision." You're doing two things: you're building a portfolio of decisions, which then becomes, to your point, Jim, about DevOps for data science, the backlog for your data scientists in the long run; and you're connecting those decisions to the data required to make them, so you can apportion each decision's value across the pieces of data that contribute to it. That way you can group your data logically within the enterprise, into domains like customer, product, talent, and location, and assign a value to each domain based on the decisions it supports.

>> Jim: So...

>> Dave: Go ahead, please.

>> As a CDO, following on that, as part of that exercise are you also trying to assess the value of not just the data, but of data science as a capability? Or of particular data science assets, like machine learning models? In the overall scheme of things, that kind of valuation can then drive IBM's decision to ramp up its internal data science initiatives, or redeploy them, or, give me a...

>> That's exactly what happened. As you build this portfolio of decisions, each decision has a value, so I am now assigning a value to the data science models that my team will build. CDOs are a relatively new role in many organizations, and when money gets tight, people say, "What's this guy doing?" (Dave laughing) Having a portfolio of decisions lets you say, "Here's the value I can add in the future," and as you check off those boxes, you can go back and say, "Here's the value I've added. Here's where I've changed how the company operates. Here's where I've generated X billions of dollars of new revenue, or cost savings, or cost avoidance, for the enterprise."
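The decision-portfolio exercise described here lends itself to a simple calculation: each decision carries a dollar value and lists the data domains it depends on, and a domain's value is the sum of its shares across decisions. A minimal sketch; the decisions, domains, and figures are invented for illustration, and the equal split per domain is a naive assumption.

```python
# Hypothetical portfolio of decisions, each valued and mapped to the
# data domains it requires.
decisions = [
    {"name": "churn intervention", "value": 4_000_000, "data": ["customer", "product"]},
    {"name": "dynamic pricing",    "value": 6_000_000, "data": ["product", "location"]},
    {"name": "workforce planning", "value": 2_000_000, "data": ["talent"]},
]

domain_value = {}
for d in decisions:
    share = d["value"] / len(d["data"])     # naive equal split across domains
    for domain in d["data"]:
        domain_value[domain] = domain_value.get(domain, 0) + share

# Rank data domains by the decision value they support.
for domain, value in sorted(domain_value.items(), key=lambda kv: -kv[1]):
    print(f"{domain:10s} ${value:,.0f}")
```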
>> When you went through these exercises at your previous company, and now at IBM, did you use standardized valuation methodologies? Did you develop your own, or come up with a scoring system? How'd you do that?

>> I think there are some areas, like net promoter score, where there are pretty good standards for assigning value to increases or decreases in the score for certain aspects of your business. In other areas, you need to decide as an enterprise how you value your assets. Do we use a three-year, five-year, or ten-year NPV? Do we use some other metric? You need to frame it in the terms your CFO is used to, so that it's in the context the company normally talks about. For most companies, that's net present value.

>> Okay, and you're measuring that on an ongoing basis.

>> Seth: Yep.

>> And fine-tuning as you go along. Seth, we're out of time. Thanks so much for coming back on theCUBE. It was great to see you.

>> Seth: Yeah, thanks for having me.

>> You're welcome. Good luck this afternoon.

>> Seth: Alright.

>> Keep it right there, buddy; we'll be back. Actually, let me run down the day here for you, just take a second to do that. We're going to end our Cube interviews for the morning, and then we're going to cut over to the main tent. In about an hour, Rob Thomas is going to kick off the main tent with a keynote talking about where data goes next. Hilary Mason's going to be on. There's a session with Dez Blanchfield on data science as a team sport, then the big session on changing regulations, like GDPR. Seth, you've got some customers that you're going to bring on to talk about these issues. And then there's the balancing act of hybrid data. After that, we'll come back to theCUBE and finish up our interviews for the afternoon. There are also going to be two breakout sessions, one with Hilary Mason and one on GDPR. You've got to go to IBMgo.com and log in and register; it's all free to see those breakout sessions. Everything else is open; you don't even have to register or log in to see it. So keep it right here, everybody. Check out the main tent, check out siliconangle.com, and of course IBMgo.com for all the action here at Fast Track Your Data. We're live from Munich, Germany, and we'll see you a little later. (upbeat techno music)