Seth Dobrin, IBM Analytics - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> Narrator: Live from Boston, Massachusetts, this is theCUBE, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert.

>> Welcome back to Boston, everybody. Seth Dobrin is here, he's the vice president and chief data officer of the IBM Analytics Organization. Great to see you, Seth, thanks for coming on.

>> Great to be back, thanks for having me again.

>> You're welcome. So chief data officer is the hot title. It was predicted to be the hot title and now it really is. Many more of you around the world, and IBM's got an interesting sort of structure of chief data officers. Can you explain that?

>> Yeah, so there's a global chief data officer, that's Inderpal Bhandari, and he's been on this podcast or videocast a few times. Then he's set up structures within each of the business units in IBM, where each of the major business units has a chief data officer also. And so I'm the chief data officer for the analytics business unit.

>> So one of Inderpal's things when I've interviewed him is culture. The data culture, you've got to drive that in. And he talks about the five things that chief data officers really need to do to be successful. Maybe you could give us your perspective on how that flows down through the organization, what the key critical success factors are for you, and how you are implementing them?

>> I agree, there's five key things, and maybe I frame them a little differently than Inderpal does. There's this whole cloud migration, so every chief data officer needs to understand what their cloud migration strategy is. Every chief data officer needs to have a good understanding of what their data science strategy is. So how are they going to build deployable data science assets? So not data science assets that are delivered through spreadsheets. Every chief data officer needs to understand what their approach to unified governance is.
So how do I govern all of my platforms in a way that enables that last point about data science? And then there's a piece around people. How do I build a pipeline for me, today and in the future?

>> So the people piece is both the skills and, presumably, a relationship with the line of business as well. There's sort of two vectors there, right?

>> Yeah, the people piece, when I think of it, is really about skills. There's a whole cultural component that goes across all of those five pieces that I laid out. Finding the right people, with the right skillset, where you need them, is hard.

>> Can you talk about cloud migration, why that's so critical and so hard?

>> If you look at kind of where the industry's been, the IT industry, it's been this race to the public cloud. I think it's been a little misguided all along. If you look at how business is run, right? Today, enterprises that are not internet born make their money from what's running their businesses today. So these business critical assets. And just thinking that you can pick those up and move them to the cloud, and take advantage of cloud, is not realistic. So the race, really, is to a hybrid cloud. Our futures really lie in how do I connect these business critical assets to the cloud? And how do I migrate those things to the cloud?

>> So Seth, the CIO might say to you, "Okay, let's go there for a minute. I kind of agree with what you're saying, I can't just shift everything into the cloud. But what can I do in a hybrid cloud that I can't do in a public cloud?"

>> Well, there's some drivers for that. I think one driver for hybrid cloud is what I just said. You can't just pick everything up and move it overnight, it's a journey. And it's not a six month journey, it's probably not a year journey, it's probably a multi-year journey.

>> Dave: So you can actually keep running your business?

>> So you can actually keep running your business. And then the other piece is there's new regulations that are coming up.
And these regulations, EU GDPR is the biggest example of them right now. There are very stiff fines for violations of those policies. And the party that's responsible for paying those fines is the party that the consumer engaged with. It's you, it's whoever owns the business. And as a business leader, I don't know that I would very willingly trust a third party, just any third party, to manage that for me. And so there's certain types of data that some enterprises may never want to move to the cloud, because they're not going to trust a third party to manage that risk for them.

>> So it's more transparent from a governance standpoint. It's not opaque.

>> Seth: Yup.

>> You feel like you're in control?

>> Yeah, you feel like you're in control, and if something goes wrong, it's my fault. It's not something that I got penalized for because someone else did something wrong.

>> So at the data layer, help us sort of abstract one layer up, to the applications. How would you partition the applications? The ones that are managing that critical data that has to stay on premises, what would you build up potentially to complement it in the public cloud?

>> I don't think you need to partition applications. The way you build modern applications today, it's all API driven. You can reduce some of the costs of latency through design. So you don't really need to partition the applications, per se.

>> I'm thinking more along the lines that the systems of record are not going to be torn out, and those are probably the last ones, if ever, to go to the public cloud. But other applications leverage them. If that's not the right way of looking at it, where do you add value in the public cloud versus what stays on premises?

>> So some of the system of record data, there's no reason you can't replicate some of it to the cloud.
So if it's not personal information or highly regulated information, there's no reason that you can't replicate some of that to the cloud. And I think we get caught up in, "we can't replicate data, we can't replicate data." I don't think that's the right answer. I think the right answer is to replicate the data if you need to, or if the data in the system of record is not in the right structure for what I need to do, then let's put the data in the right structure. Let's not have the conversation about how I can't replicate data. Let's have the conversation about where's the right place for the data, where does it make most sense, and what's the right structure for it? And if that means you've got 10 copies of a certain type of data, then you've got 10 copies of a certain type of data.

>> On that data, would it typically be other parts of the systems of record that you might have in the public cloud, or would they be new apps, sort of greenfield apps?

>> Seth: Yes.

>> George: Okay.

>> Seth: I think both. And in my mind, that question you just asked right there is one of the things that guide how you build your cloud migration strategy. So we said you can't just pick everything up and move it. So how do you prioritize? You look at what you need to build to run your business differently. And you start there, and you start thinking about how do I migrate information to the cloud to support those? And maybe you start by building a local private cloud, so that everything's close together until you kind of master it. And then once you get enough critical mass of data and applications around it, then you start moving stuff to the cloud.

>> We talked earlier off camera about reframing governance steps. I used to head a CIO consultancy and we worked with a number of CIOs that were within legal IT, for example, and were worried about compliance and governance and things of that nature.
And their ROI was always "scare the board." But the holy grail was, can we turn governance into something of value for the organization? Can we?

>> I think in the world we live in today, with ever increasing regulations, and with a need to be agile, and with everyone needing to and wanting to apply data science at scale, you need to reframe governance, right? Governance needs to be reframed from something that is seen as a roadblock to something that is truly an enabler. And not just giving it lip service. And what do I mean by that? For governance to be an enabler, you've really got to think about, how do I classify my data up front, so that all data in my organization is bucketed into some version of public, proprietary and confidential? Different enterprises may have 30 scales and some may only have two. Or some may have one. And so you do that up front, and so you know what can be done with data, when it can be done and who it can be done with. You need to capture intent. So what are allowed intended uses of data? And as a data scientist, what am I intending to do with this data? So that you can then mesh those two things together. 'Cause that's important in these new regulations I talked about: people give you access to data, their personal data, for an intended purpose. And then you need to be able to apply these governance policies actively. So it's not passive, after the fact, or you've got to stop and you've got to wait. It's leveraging services, leveraging APIs, and building a composable system of policies that are delivered through APIs. So if I want to create a sandbox to run some analytics on, I'm going to call an API to get that data. That API is going to call a policy API that's going to say, "Okay, does Seth have permission to see this data? Can Seth use this data for this intended purpose?" If yes, the sandbox is created. If not, there's a conversation about really why does Seth need access to this data?
It's really moving governance to actively enable me to do things. And it changes the conversation from, "Hey, it's your data, can I have it?" to there's really solid reasons as to why I can and can't have data.

>> And then some potential automation around a sandbox that creates value.

>> Seth: Absolutely.

>> But the example you gave, public, proprietary or confidential, is still very governance-like. Where I was hoping you were going with the data classification, and I think you referenced this, is: can I extend that schema, that nomenclature, to include other attributes of value? And can I automate it at the point of creation or use, and scale it?

>> Absolutely, that is exactly what I mean. I just used those three 'cause they were the three that are easy to understand.

>> So I can give you, as a business owner, some areas where I would like to see a classification schema, and then you could automate that for me at scale? In theory?

>> In theory, that's where we're hoping to go. To be able to automate. And it's going to be different based on what industry vertical you're in, what risk profile your business is willing to take. So that classification scheme is going to look very different for a bank than it will for a pharmaceutical company. Or for a research organization.

>> Dave: Well, if I can then defensibly delete data, that's of real value to an organization.

>> With new regulations, you need to be able to delete data. And you need to be able to know where all of your data is, so that you can delete it. Today, most organizations don't know where all their data is.

>> And that problem is solved with math and data science, or?

>> I think that problem is solved with a combination of governance.

>> Dave: Sure.

>> And technology. Right?

>> Yeah, technology kind of got us into this problem. We'll say technology can get us out.
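
(Editor's sketch.) The sandbox request flow Seth describes above, where a data API calls a policy API that checks both permission and intended purpose, might look roughly like the following. This is a minimal illustration under stated assumptions, not IBM's implementation: every name here (Classification, Dataset, User, policy_check, create_sandbox) is hypothetical, as is the idea of ordering the three classification levels numerically.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    # Hypothetical three-level scale from the interview; real schemes vary by enterprise.
    PUBLIC = 1
    PROPRIETARY = 2
    CONFIDENTIAL = 3


@dataclass(frozen=True)
class Dataset:
    name: str
    classification: Classification
    allowed_purposes: frozenset  # intended uses the data was collected for


@dataclass(frozen=True)
class User:
    name: str
    clearance: Classification  # highest classification this user may see


def policy_check(user, dataset, purpose):
    """Stand-in for the policy API: returns (allowed, reason)."""
    # Question 1: does the user have permission to see this data?
    if user.clearance.value < dataset.classification.value:
        return False, f"{user.name} is not cleared for {dataset.classification.name} data"
    # Question 2: can the user use this data for this intended purpose?
    if purpose not in dataset.allowed_purposes:
        return False, f"'{purpose}' is not an allowed use of {dataset.name}"
    return True, "ok"


def create_sandbox(user, dataset, purpose):
    """Stand-in for the data API: creates a sandbox only if the policy check passes."""
    allowed, reason = policy_check(user, dataset, purpose)
    if not allowed:
        # In the interview's terms, this is where "there's a conversation".
        raise PermissionError(reason)
    return {"owner": user.name, "dataset": dataset.name, "purpose": purpose}
```

A request such as `create_sandbox(User("Seth", Classification.PROPRIETARY), some_dataset, "forecasting")` either returns a sandbox description or raises `PermissionError` with the reason. Keeping the reason explicit matches the point that the conversation shifts to solid reasons why data can or can't be used.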
>> On the technology subject, it seems like, with the explosion of data, not just in volume but also in many copies of the truth, you would need some sort of curation and catalog system that goes beyond what you had in a data warehouse. How do you address that challenge?

>> Seth: Yeah, and that gets into what I said when you guys asked me about CDOs, what do they care about? One of the things is unified governance. And the first piece of unified governance is having a catalog of your data. That is, all of your data. And it's a single catalog for your data, whether it's in one of your business critical systems that's running your business today, whether it's in a public cloud or a private cloud, or some combination of both. You need to know where all your data is. You also need to have a policy catalog that's single for both of those. Catalogs like this fall apart by entropy. And the more you have, the more likely they are to fall apart. And so if you have one, and you have a lot of automation around it to do a lot of these things, so you have automation that allows you to go through your data and discover what data is where, and keep track of lineage in an automated fashion, keep track of provenance in an automated fashion, then we start getting into a system of truly unified governance that's active, like I said before.

>> There's a lot of talk about digital transformations. Of course, digital equals data. If it ain't data, it ain't digital. So one of the things in the early days of the whole big data theme, you'd hear people say, "You have to figure out how to monetize the data." And that seems to have changed and morphed into, you have to understand how your organization gets value from data. If you're a for-profit company, it's monetizing something, and seeing how data contributes to that monetization. If you're a health care organization, maybe it's different.
I wonder if you could talk about that in terms of the importance, to the CDO specifically, of understanding how an organization makes money.

>> I think you bring up a good point. Monetization of data and analytics is often interpreted differently. If you're a CFO you're going to say, "You're going to create new value for me, I'm going to start getting new revenue streams." And that may or may not be what you mean.

>> Dave: Sell the data, it's not always so easy.

>> It's not always so easy, and it's hard to demonstrate value for data, to sell it. There's certain types, like IBM owns a weather company. Clearly, people want to buy weather data, it's important. But if you're talking about how do you transform a business unit, it's not necessarily about creating new revenue streams. It's how do I leverage data and analytics to run my business differently? And maybe even, what are new business models that I could never do before I had data and data science?

>> Would it be fair to say that, as Dave was saying, there's the data side, and people were talking about monetizing that. But when you talk about analytics increasingly, machine learning specifically, it's a fusion of the data and the model, and a feedback loop. Is that something that becomes a critical asset?

>> I would actually say that you really can't generate a tremendous amount of value from just data. You need to apply something like machine learning to it. And machine learning has no value without good data. You need to be able to apply machine learning at scale. You need to build the deployable data science assets that run your business differently. So for example, I could run a report that shows me how my business did last quarter, how my sales team did last quarter, or how my marketing team did last quarter. That's not really creating value. That's giving me a retrospective look on how I did. Where you can create value is how do I run my marketing team differently.
So what data do I have, and what types of learning can I get from that data, that will tell my marketing team what they should be doing?

>> George: And the ongoing process.

>> And the ongoing process. And as part of actually cataloging your data and understanding your data, you find data quality issues. And data quality issues are not necessarily an issue with the data itself or the people, they're usually process issues. And by discovering those data quality issues, you may discover processes that need to be changed, and in changing those processes you can create efficiencies.

>> So it sounds like you guys have got a pretty good framework. Having talked to Inderpal a couple of times, what you're saying makes sense. Do you have nightmares about IoT? (laughing)

>> Do I have nightmares about IoT? I don't think I have nightmares about IoT. IoT is really just a series of connected devices, is really what it is. In my talk tomorrow, I'm going to talk about hybrid cloud, and a connected car is actually one of the things I'm going to talk about. And really, with a connected car, you just have a bunch of connected devices attached to a private cloud that's on wheels. I'm less concerned about IoT than I am about people manually changing data. With IoT you get data, you can track it, and if something goes wrong, you know what happened. I would say no, I don't have nightmares about IoT. If you do security wrong, that's a whole 'nother conversation.

>> But it sounds like you're doing security right, sounds like you've got a good handle on governance. Obviously scale is a key part of that. It could break the whole thing if you can't scale. And you're comfortable with the state of technology being able to support that? At least with IBM.

>> At least with IBM, I think I am. Like I said, a connected car is basically a bunch of IoT devices, a private cloud. How do we connect that private cloud to other private clouds or to a public cloud? There's tons of technologies out there to do that.
Spark, Kafka. Those two things together allow you to do things that we could never do before.

>> Can you elaborate? Like in a connected car environment or some other scenario where, other people called it a data center on wheels. Think of it as a private cloud, that's a wonderful analogy. How do Spark and Kafka, on that very, very smart device, cooperate with something on the edge, like the cities and buildings, versus in the clouds?

>> If you're a connected car and you're this private cloud on wheels, you can't drive the car just on that information. You can't drive it just on the LIDAR, or knowing how well the wheels are in contact. You need weather information. You need information about other cars around you. You need information about pedestrians. You need information about traffic. All of this information you get from that connection. And the way you do that is leveraging Spark and Kafka. Kafka's a messaging system, so you could leverage Kafka to send the car messages. Or send pedestrians messages: "This car is coming, you shouldn't cross." Or vice versa, get a car to stop because there's a pedestrian in the way, before even the systems on the car can see it. So if you can get that kind of messaging system in near real time, if I'm the pedestrian and I'm 300 feet away, the half a second that it would take for that to go through isn't that big of a deal, because you'll be stopped before you get there.

>> What about, again, the intelligence, not just the data but the advanced analytics, where some of that would live in the car and some in the cloud? Is it just that you're making real-time decisions in the car and you're retraining the models in the cloud, or how does that work?

>> No, I think some of those decisions would be done through Spark, in transit. And so one of the nice things about something like Spark is, we can do machine learning transformations on data. Think ETL. But think ETL where you can apply machine learning as part of that ETL.
So I'm transferring all this weather data and positioning data, and I'm applying a machine learning algorithm for a given purpose in that car. So the purpose is navigation. Or making sure I'm not running into a building. So that's happening in real time as it's streaming to the car.

>> That's the prediction aspect that's happening in real time.

>> Seth: Yes.

>> But at the same time, you want to be learning from all the cars in your fleet.

>> That would happen up in the cloud. I don't think that needs to happen on the edge. Maybe it does, but I don't think it needs to happen on the edge. And today, while I said a car is a data center, a private cloud on wheels, there's a cost to the computation you can have on that car. And I don't think the cost is quite low enough yet where it makes sense to do all that computation on the edge. So some of it you would want to do in the cloud. Plus you would want to have all the information from as many cars in the area as possible.

>> Dave: We're out of time, but some closing thoughts. They say, "May you live in interesting times." Well, you can sum up some of the changes that are going on in the business. Dell buys EMC, IBM buys The Weather Company. And that gave you a huge injection of data scientists. Which, talk about data culture. Just last thoughts on that, in terms of the acquisition and how that's affected your role.

>> I've only been at IBM since November. So all that happened before my role.

>> Dave: So you inherited?

>> So from my perspective it's a great thing. Before I got there, the culture was starting to change. Like we talked about before we went on air, the hardest part about any kind of data science transformation is the cultural aspects.

>> Seth, thanks very much for coming back on theCUBE. Good to have you.

>> Yeah, thanks for having me again.

>> You're welcome. All right, keep it right there, everybody, we'll be back with our next guest.
This is theCUBE, we're live from Spark Summit in Boston. Right back. (soft rock music)

Published Date : Feb 8 2017

