Seth Dobrin, IBM Analytics - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> Narrator: Live from Boston, Massachusetts, this is theCUBE! Covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston, everybody, Seth Dobrin is here, he's the vice president and chief data officer of the IBM Analytics Organization. Great to see you, Seth, thanks for coming on. >> Great to be back, thanks for having me again. >> You're welcome, so chief data officer is the hot title. It was predicted to be the hot title and now it really is. Many more of you around the world, and IBM's got an interesting sort of structure of chief data officers, can you explain that? >> Yeah, so there's a global chief data officer, that's Inderpal Bhandari, and he's been on this podcast or videocast a few times. Then he's set up structures within each of the business units in IBM, where each of the major business units has a chief data officer, also. And so I'm the chief data officer for the analytics business unit. >> So one of Inderpal's things when I've interviewed him is culture. The data culture, you've got to drive that in. And he talks about the five things that chief data officers really need to do to be successful. Maybe you could give us your perspective on how that flows down through the organization, and what are the key critical success factors for you, and how are you implementing them? >> I agree, there's five key things, and maybe I frame it a little differently than Inderpal does. There's this whole cloud migration, so every chief data officer needs to understand what their cloud migration strategy is. Every chief data officer needs to have a good understanding of what their data science strategy is. So how are they going to build deployable data science assets? So not data science assets that are delivered through spreadsheets. Every chief data officer needs to understand what their approach to unified governance is. So how do I govern all of my platforms in a way that enables that last point about data science? And then there's a piece around people. How do I build a pipeline for today and the future? >> So the people piece is both the skills, and it's presumably a relationship with the line of business, as well. There's sort of two vectors there, right? >> Yeah, the people piece, when I think of it, is really about skills. There's a whole cultural component that goes across all of those five pieces that I laid out. Finding the right people, with the right skillset, where you need them, is hard. >> Can you talk about cloud migration, why that's so critical and so hard? >> If you look at kind of where the industry's been, the IT industry, it's been this race to the public cloud. I think it's a little misguided, all along. If you look at how business is run, right? Today, enterprises that are not internet-born make their money from what's running their businesses today. So these business-critical assets. And just thinking that you can pick those up and move them to the cloud and take advantage of cloud is not realistic. So the race, really, is to a hybrid cloud. Our futures really lie in, how do I connect these business-critical assets to the cloud? And how do I migrate those things to the cloud? >> So Seth, the CIO might say to you, "Okay, let's go there for a minute, I kind of agree with what you're saying, I can't just shift everything into the cloud. But what can I do in a hybrid cloud that I can't do in a public cloud?" >> Well, there's some drivers for that.
I think one driver for hybrid cloud is what I just said. You can't just pick everything up and move it overnight, it's a journey. And it's not a six month journey, it's probably not a year journey, it's probably a multi-year journey. >> Dave: So you can actually keep running your business? >> So you can actually keep running your business. And then the other piece is there's new regulations that are coming up. And these regulations, the EU GDPR is the biggest example of them right now. There are very stiff fines for violations of those policies. And the party that's responsible for paying those fines is the party the consumer engaged with. It's you, it's whoever owns the business. And as a business leader, I don't know that I would very willingly give up, trust a third party to manage that, just any third party to manage that for me. And so there's certain types of data that some enterprises may never want to move to the cloud, because they're not going to trust a third party to manage that risk for them. >> So it's more transparent from a governance standpoint. It's not opaque. >> Seth: Yup. >> You feel like you're in control? >> Yeah, you feel like you're in control, and if something goes wrong, it's my fault. It's not something that I get penalized for because someone else did something wrong. >> So at the data layer, help us sort of abstract one layer up to the applications. How would you partition the applications? The ones that are managing that critical data that has to stay on premises. What would you build, potentially, to complement it in the public cloud? >> I don't think you need to partition applications. The way you build modern applications today, it's all API-driven. You can reduce some of the costs of latency through design. So you don't really need to partition the applications, per se. >> I'm thinking more along the lines that the systems of record are not going to be torn out, and those are probably the last ones, if ever, to go to the public cloud. But other applications leverage them. If that's not the right way of looking at it, where do you add value in the public cloud versus what stays on premise? >> So some of the system of record data, there's no reason you can't replicate some of it to the cloud. So if it's not this personal information, or highly regulated information, there's no reason that you can't replicate some of that to the cloud. And I think we get caught up in, we can't replicate data, we can't replicate data. I don't think that's the right answer. I think the right answer is to replicate the data if you need to, or if the data in the system of record is not in the right structure for what I need to do, then let's put the data in the right structure. Let's not have the conversation about how I can't replicate data. Let's have the conversation about where's the right place for the data, where does it make most sense, and what's the right structure for it? And if that means you've got 10 copies of a certain type of data, then you've got 10 copies of a certain type of data. >> On that data, would it typically be other parts of the systems of record that you might have in the public cloud, or would they be new apps, sort of greenfield apps? >> Seth: Yes. >> George: Okay. >> Seth: I think both. And that's part of, I think, in my mind, how you build it. That question you just asked right there is one of the things that guides how you build your cloud migration strategy.
So we said you can't just pick everything up and move it. So how do you prioritize? You look at what you need to build to run your business differently. And you start there, and you start thinking about how do I migrate information to support those to the cloud? And maybe you start by building a local private cloud, so that everything's close together until you kind of master it. And then once you get enough critical mass of data and applications around it, then you start moving stuff to the cloud. >> We talked earlier off camera about reframing governance. I used to head a CIO consultancy, and we worked with a number of CIOs that were within legal IT, for example, and were worried about compliance and governance and things of that nature. And their ROI was always "scare the board." But the holy grail was, can we turn governance into something of value for the organization? Can we? >> I think in the world we live in today, with ever increasing regulations, and with a need to be agile, and with everyone needing to and wanting to apply data science at scale, you need to reframe governance, right? Governance needs to be reframed from something that is seen as a roadblock to something that is truly an enabler. And not just giving it lip service. And what do I mean by that? For governance to be an enabler, you really have to think about, how do I, up front, classify my data so that all data in my organization is bucketed into some version of public, proprietary, and confidential. Different enterprises may have 30 scales, and some may only have two. Or some may have one. And so you do that up front, so you know what can be done with data, when it can be done, and who it can be done with. You need to capture intent. So what are allowed intended uses of data? And as a data scientist, what am I intending to do with this data? So that you can then mesh those two things together. 'Cause that's important in these new regulations I talked about: people give you access to data, their personal data, for an intended purpose. And then you need to be able to apply these governance policies actively. So it's not passive, after the fact, or you've got to stop and you've got to wait. It's leveraging services. Leveraging APIs. And building a composable system of policies that are delivered through APIs. So if I want to create a sandbox to run some analytics on, I'm going to call an API to get that data. That API is going to call a policy API that's going to say, "Okay, does Seth have permission to see this data? Can Seth use this data for this intended purpose?" If yes, the sandbox is created. If not, there's a conversation about really why does Seth need access to this data. It's really moving governance to be active, to enable me to do things. And it changes the conversation from, hey, it's your data, can I have it? To, there's really solid reasons as to why I can and can't have data. >> And then some potential automation around a sandbox that creates value. >> Seth: Absolutely. >> But it's still, the example you gave, public, proprietary, or confidential, is still very governance-like. Where I was hoping you were going with the data classification, and I think you referenced this: can I extend that schema, that nomenclature, to include other attributes of value? And can I do it, automate it, at the point of creation or use, and scale it? >> Absolutely, that is exactly what I mean. I just used those three 'cause they're the three that are easy to understand.
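To make the active-governance flow Seth describes concrete, here is a minimal sketch of a policy-gated sandbox request. The classification buckets, policy table, and function names are hypothetical illustrations, not IBM's actual governance APIs:

```python
# Hypothetical sketch of policy-gated sandbox creation; all names here are
# illustrative assumptions, not IBM's actual governance APIs.
from enum import Enum

class Classification(Enum):
    PUBLIC = "public"
    PROPRIETARY = "proprietary"
    CONFIDENTIAL = "confidential"

# Assumed policy table: which intended uses each classification bucket allows.
ALLOWED_INTENTS = {
    Classification.PUBLIC: {"analytics", "marketing", "external-sharing"},
    Classification.PROPRIETARY: {"analytics", "marketing"},
    Classification.CONFIDENTIAL: {"analytics"},
}

def check_policy(dataset_class: Classification, intent: str) -> bool:
    """The 'policy API' call: does the intended use match what the data allows?"""
    return intent in ALLOWED_INTENTS[dataset_class]

def create_sandbox(user: str, dataset_class: Classification, intent: str) -> str:
    """Provision a sandbox only if the policy check approves the intended use."""
    if check_policy(dataset_class, intent):
        return f"sandbox created for {user} (intent: {intent})"
    # A denial starts a conversation about why access is needed, rather than
    # a blanket "it's your data, can I have it?" negotiation.
    raise PermissionError(f"{user} may not use {dataset_class.value} data for {intent!r}")

print(create_sandbox("seth", Classification.PROPRIETARY, "analytics"))
```

The point of the sketch is that the decision is made actively, at request time, from the up-front classification and the captured intent, rather than by after-the-fact review.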
>> So I can give you, as a business owner, some areas that I would like to see, a classification schema, and then you could automate that for me at scale? In theory? >> In theory, that's where we're hoping to go. To be able to automate. And it's going to be different based on what industry vertical you're in, what risk profile your business is willing to take. So that classification scheme is going to look very different for a bank than it will for a pharmaceutical company. Or for a research organization. >> Dave: Well, if I can then defensibly delete data, that's of real value to an organization. >> With new regulations, you need to be able to delete data. And you need to be able to know where all of your data is, so that you can delete it. Today, most organizations don't know where all their data is. >> And that problem is solved with math and data science, or? >> I think that problem is solved with a combination of governance. >> Dave: Sure. >> And technology. Right? >> Yeah, technology kind of got us into this problem. We'll say technology can get us out. >> On the technology subject, it seems like, with the explosion of data, not just volume, but also many copies of the truth, you would need some sort of curation and catalog system that goes beyond what you had in a data warehouse. How do you address that challenge? >> Seth: Yeah, and that gets into what I said when you guys asked me about CDOs, what do they care about? One of the things is unified governance. And so part of unified governance, the first piece of unified governance, is having a catalog of your data. That is all of your data. And it's a single catalog for your data, whether it's one of your business-critical systems that's running your business today, whether it's a public cloud, or it's a private cloud, or some combination of both. You need to know where all your data is. You also need to have a policy catalog that's single for both of those. Catalogs like this fall apart by entropy. And the more you have, the more likely they are to fall apart. And so if you have one, and you have a lot of automation around it to do a lot of these things, so you have automation that allows you to go through your data and discover what data is where, and keep track of lineage in an automated fashion, keep track of provenance in an automated fashion, then we start getting into a system of truly unified governance that's active, like I said before. >> There's a lot of talk about digital transformations. Of course, digital equals data. If it ain't data, it ain't digital. So one of the things that, in the early days of the whole big data theme, you'd hear people say is, "You have to figure out how to monetize the data." And that seems to have changed and morphed into, you have to understand how your organization gets value from data. If you're a for-profit company, it's monetizing something, and feeding how data contributes to that monetization. If you're a health care organization, maybe it's different. I wonder if you could talk about that in terms of the importance, to the CDO specifically, of understanding how an organization makes money. >> I think you bring up a good point. Monetization of data and analytics is often interpreted differently. If you're a CFO you're going to say, "You're going to create new value for me, I'm going to start getting new revenue streams." And that may or may not be what you mean. >> Dave: Sell the data, it's not always so easy. >> It's not always so easy, and it's hard to demonstrate value for data.
To sell it. There's certain types, like IBM owns The Weather Company. Clearly, people want to buy weather data, it's important. But if you're talking about how do you transform a business unit, it's not necessarily about creating new revenue streams, it's how do I leverage data and analytics to run my business differently. And maybe even, what are new business models that I could never do before I had data and data science. >> Would it be fair to say that, as Dave was saying, there's the data side, and people were talking about monetizing that. But when you talk about analytics increasingly, machine learning specifically, it's a fusion of the data and the model. And a feedback loop. Is that something where, that becomes a critical asset? >> I would actually say that you really can't generate a tremendous amount of value from just data. You need to apply something like machine learning to it. And machine learning has no value without good data. You need to be able to apply machine learning at scale. You need to build the deployable data science assets that run your business differently. So for example, I could run a report that shows me how my business did last quarter. How my sales team did last quarter. Or how my marketing team did last quarter. That's not really creating value. That's giving me a retrospective look on how I did. Where you can create value is how do I run my marketing team differently. So what data do I have, and what types of learning can I get from that data, that will tell my marketing team what they should be doing? >> George: And the ongoing process. >> And the ongoing process. And part of actually discovering, doing this cataloging of your data and understanding data, you find data quality issues. And data quality issues are not necessarily an issue with the data itself or the people, they're usually process issues. And by discovering those data quality issues you may discover processes that need to be changed, and in changing those processes you can create efficiencies. >> So it sounds like you guys got a pretty good framework. Having talked to Inderpal a couple times, and what you're saying makes sense. Do you have nightmares about IOT? (laughing) >> Do I have nightmares about IOT? I don't think I have nightmares about IOT. IOT is really just a series of connected devices. Is really what it is. In my talk tomorrow, I'm going to talk about hybrid cloud, and connected car is actually one of the things I'm going to talk about. And really, a connected car, you just have a bunch of devices connected to a private cloud that's on wheels. I'm less concerned about IOT than I am people manually changing data. IOT, you get data, you can track it, if something goes wrong, you know what happened. I would say no, I don't have nightmares about IOT. If you do security wrong, that's a whole nother conversation. >> But it sounds like you're doing security right, sounds like you got a good handle on governance. Obviously scale is a key part of that. Could break the whole thing if you can't scale. And you're comfortable with the state of technology being able to support that? At least with IBM. >> I think, at least with IBM, I think I am. Like I said, a connected car, which is basically a bunch of IOT devices on a private cloud. How do we connect that private cloud to other private clouds, or to a public cloud? There's tons of technologies out there to do that. Spark, Kafka. Those two things together allow you to do things that we could never do before. >> Can you elaborate?
Like in a connected car environment, or some other scenario where, other people have called it a data center on wheels. Think of it as a private cloud, that's a wonderful analogy. How do Spark and Kafka on that very, very smart device cooperate with something on the edge? Like the cities, buildings, versus in the cloud? >> If you're a connected car, and you're this private cloud on wheels, you can't drive the car just on that information. You can't drive it just on the LIDAR, knowing how well the wheels are in contact. You need weather information. You need information about other cars around you. You need information about pedestrians. You need information about traffic. All of this information you get from that connection. And the way you do that is leveraging Spark and Kafka. Kafka's a messaging system. You could leverage Kafka to send the car messages. Or send pedestrian messages. "This car is coming, you shouldn't cross." Or vice versa. Get a car to stop because there's a pedestrian in the way, before even the systems on the car can see it. So if you can get that kind of messaging system in near real time, then if I'm the pedestrian and I'm 300 feet away, the half a second it would take for that to go through isn't that big of a deal, because you'll be stopped before you get there. >> What about, again, the intelligence, not just the data, but the advanced analytics, where some of that would live in the car and some in the cloud? Is it just, you're making real-time decisions in the car and you're retraining the models in the cloud, or how does that work? >> No, I think some of those decisions would be done through Spark. In transit. And so one of the nice things about Spark is, we can do machine learning transformations on data. Think ETL. But think ETL where you can apply machine learning as part of that ETL. So I'm transferring all this weather data, positioning data, and I'm applying a machine learning algorithm for a given purpose in that car. So the purpose is navigation. Or making sure I'm not running into a building. So that's happening in real time as it's streaming to the car. >> That's the prediction aspect that's happening in real time. >> Seth: Yes. >> But at the same time, you want to be learning from all the cars in your fleet. >> That would happen up in the cloud. I don't think that needs to happen on the edge. Maybe it does, but I don't think it needs to happen on the edge. And today, while I said a car is a data center, a private cloud on wheels, there's a cost to the computation you can have on that car. And I don't think the cost is quite low enough yet where it makes sense to do all that computation on the edge. So some of it you would want to do in the cloud. Plus you would want to have all the information from as many cars in the area as possible. >> Dave: We're out of time, but some closing thoughts. They say, may you live in interesting times. Well, that kind of sums up the changes that are going on in the business. Dell buys EMC, IBM buys The Weather Company. And that gave you a huge injection of data scientists. Which, talk about data culture. Just last thoughts on that, in terms of the acquisition and how that's affected your role. >> I've only been at IBM since November. So all that happened before my role. >> Dave: So you inherited? >> So from my perspective it's a great thing. Before I got there, the culture was starting to change.
Like we talked about before we went on air, the hardest part about any kind of data science transformation is the cultural aspects. >> Seth, thanks very much for coming back on theCUBE. Good to have you. >> Yeah, thanks for having me again. >> You're welcome, all right, keep it right there everybody, we'll be back with our next guest. This is theCUBE, we're live from Spark Summit in Boston. Right back. (soft rock music)
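Picking up the connected-car exchange above: here is a minimal sketch of the Kafka pub/sub pattern Seth describes, where infrastructure publishes a pedestrian alert and the car subscribes and reacts. The broker address, topic name, message fields, and the use of the kafka-python client are all assumptions for illustration:

```python
# Sketch of the car/pedestrian messaging pattern with the kafka-python client.
# Broker address, topic name, and message fields are illustrative assumptions.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "broker:9092"        # hypothetical broker on the edge or in the cloud
TOPIC = "pedestrian-alerts"   # hypothetical topic

# Infrastructure side: publish an alert the car can act on before its own
# sensors could see the pedestrian.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)
producer.send(TOPIC, {"lat": 42.36, "lon": -71.06, "distance_ft": 300})
producer.flush()

# Car side (the "private cloud on wheels"): subscribe and react. At 300 feet,
# the half second of messaging latency Seth mentions leaves plenty of margin.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
for alert in consumer:
    if alert.value["distance_ft"] < 500:
        print("pedestrian ahead, begin braking")
        break
```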
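And a sketch of the "ETL with machine learning applied" idea from the same exchange: scoring a previously trained model on telemetry as it streams through Spark Structured Streaming. The topic, schema, and model path are assumptions, and the loaded pipeline is assumed to include its own feature preparation:

```python
# Sketch of applying a pre-trained model to data in transit with Spark
# Structured Streaming. Requires the spark-sql-kafka connector package;
# broker, topic, schema, and model path are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, DoubleType
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("fast-data-scoring").getOrCreate()

schema = StructType([
    StructField("speed", DoubleType()),
    StructField("wheel_slip", DoubleType()),
    StructField("precip_mm", DoubleType()),
])

# The fire hose: telemetry arriving over Kafka (assumed broker and topic).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "car-telemetry")
       .load())

events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# "ETL where you can apply machine learning as part of that ETL": a pipeline
# trained offline (assumed to include feature assembly) scores each record
# as it streams in, e.g. for navigation or collision-risk purposes.
model = PipelineModel.load("/models/traction-risk")  # hypothetical path
scored = model.transform(events)

query = (scored.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```

Retraining the model on the whole fleet's data would then happen in the cloud, with the refreshed pipeline pushed back down to the cars.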

Published Date: Feb 8, 2017


Namik Hrle, IBM | IBM Think 2018


 

>> Narrator: Live from Las Vegas, it's theCUBE, covering IBM Think 2018, brought to you by IBM. >> Welcome back to theCUBE. We are live on day one of the inaugural IBM Think 2018 event. I'm Lisa Martin with Dave Vellante, and we are in sunny Vegas at the Mandalay Bay, excited to welcome to theCUBE one of the IBM Fellows, Namik Hrle. Welcome to theCUBE. >> Thank you so much. >> So you are not only an IBM Fellow, you're also the IBM analytics technical leadership team chair. Tell us about your role on that technical leadership team. What are some of the things that you're helping to drive? And maybe even give us some of the customer feedback that you're helping to infiltrate into IBM's technical direction. >> Okay, so basically, the technical leadership team is a group of top technical leaders in the IBM analytics group, and we are kind of chartered with evaluating the new technologies, providing guidance to our business leaders on what to invest in, what to de-invest, listening to our customer requirements, listening to how the customers are actually using the technology, and making sure that IBM is timely there when it's needed. And also a very important element of the technical leadership team is to promote innovation, innovative activities, particularly kind of grassroots innovative activities. Meaning helping our technical leaders across analytics, to encourage them to come up with innovation, to present the ideas, to follow up on those, to potentially turn them into projects, and so on. So that's it. >> And guide them, or just sort of send them off to discover? >> As a matter of fact, we should probably mostly be a sounding board, so not necessarily that this is coming from top down, but trying to encourage them, trying to incite them, trying to kind of make the innovative activity interesting, and also, at the same time, make sure that they see that there's something coming out of it. It's not just that they are coming up with ideas and then nothing's happening, but trying also to turn that into reality by working with our business developers, who, by the way, control the resources, right? So, in order to do something like that. >> How much of it is guiding folks who want to go down a certain path that maybe you know has been attempted before in that particular way, so you know it's probably better to go elsewhere? Or do you let them go and make the same mistake? Is there any of that? Like, don't go down that, don't go through that door. >> Well, as you can imagine, it's a human temptation to say, well, you know, I've already tried, already done. But you know, we are really trying not to do that. >> Yeah. >> We are trying not to do that, trying to have an open mind, because in this industry in which we are, there's always a new set of opportunities, and new conditions, and even if you are going to talk about our current topic, like fast data, and so on, I believe that many of these things have been around already, we just didn't know how to actually help, how to support something like that. But now, with the new set of knowledge, we can actually do that. >> So, let's get into the fast data. I mean, it wasn't too long ago, we just asked an earlier guest what inning are we at in IOT? He said the third inning. It wasn't long ago we were in the third inning of Hadoop, and everything was batched, and then all of a sudden, big data changed, everything became streaming, real-time, fast data. What do you mean by fast data? What is it?
What's the state of fast data inside IBM? >> Well, thank you for that question, because when I was preparing a bit for this interview, of course, I wanted first to make sure that we are all on the same page in terms of what fast data actually means, right? And of course, our industry is full of hype and misunderstanding and everything else. And like many other things and concepts, it's actually not a fundamentally new thing. It's just the fact that the current state of technology, and enhancements in the technology, allow us to do something that we couldn't do before. So, the requirements for the fast data value proposition were always there, but right now technology allows us to actually derive real-time insight out of the data, irrespective of the data volume, variety, velocity. And when I just said those three V's, it sounds like big data, right? >> Dave: Yeah. >> And, as a matter of fact, there is a pretty large intersection with big data, but there's a huge difference. And the huge difference is that typically big data is really associated with data at rest, while fast data is really associated with data in motion. So the examples of that particular pattern are all over the place. I mean, you can think of like a click stream and stuff. You can think about ticker financial data, right? You can think about manufacturing IOT data, sensors, logs. And the spectrum of industries that take advantage of that are all over the place. From financial and retail, from manufacturing, from utilities, all the way to advertising, to agriculture, and everything else. So, for example, very often when I talk about fast data, people first jump immediately into, let's say, you know, this is YouTube streaming, or this is Facebook, Twitter kinds of postings, and everything else. While this is true, and certainly there are business cases built on something like that, what interests me more are the huge cases, like for example Airbus, right? With 10,000 sensors in each of the wings, producing 7 terabytes of information per day, which, by the way, cannot be just dumped somewhere like before, and then do some batch processing on it. But you actually have to process that data right there, when it happens, that millisecond, because, you know, the ramifications are pretty, pretty serious, right? Or take for example the opportunity in the utility industry, like in power, electricity, where the distributors and manufacturers really entice people to put this smart metering in place. So they can basically measure the consumption of power, electricity, basically on an hourly basis. And instead of giving you a once-yearly kind of bill, to know the consumption all the time, to react on spikes, to avoid blackouts, and to come up with a totally new set of business models in terms of, you know, offering some special incentives for spending or not spending, adding additional manufacturers, I mean, a fantastic set of use cases. I believe that Gartner said that by 2020, like 80% of businesses will have some sort of situational awareness obligation, which is another word for basically using this kind of capability of event-driven messaging. And I agree with that 100%. >> So it's data, fast data is data that is analyzed in real time. >> Namik: Right. >> Such that you can affect an outcome. >> Namik: Right. >> Before, what, before something bad happens? Before you lose the buyer? Before-- >> All over the place. You know, before fraud happens in financials, right?
Before a manufacturing line breaks, right? Before, you know, airplanes, something happens with the airplane. So there are many, many, many examples of something like that, right? And when we talk about it, what we need to understand, again, is that even the technologies that are needed in order to deliver fast data value propositions are kind of known technologies. I mean, what do you really need? You need very scalable pub/sub messaging systems like Kafka, for example, right? In order to acquire the data. Then you need a system which is typically a streaming system, streams, and you have tons of offerings in the open source space, like, you know, Apache Spark streaming, you have Storm, you have Flink, Apache Flink products, as well as our IBM Streams, typically for really the kind of enterprise quality-of-service delivery. And then, very importantly, and this is something that I hope we will have time to talk about today, you also need to be able to basically absorb that data. And not only do the analytics on the fly, but also store that data and combine the analytics with the data that is historical. And typically for that, if you read what people are kind of suggesting to do, you have also lots of open source technology that can do that, like a Sombra, like some HDFS-based systems, and so on. But what I'm saying is all of them come with this kind of complexity that, yes, you can land data somewhere, but then you need to put it somewhere else in order to do the analytics. And basically, you are introducing latency between data production and data consumption. And this is why I believe that a technology like DB2 Event Store, that we announced just yesterday, is actually something that will become, very interestingly, a very powerful part of the whole fast data story. >> So, let's talk about that a little bit more. Fast data as a term, and thank you for clarifying what it means to IBM, isn't new, but to your point, as technology is evolving, it's opening up new opportunities, much like, it sounds like, kind of the innovation lab that you have within IBM, there might be, as Dave was asking, ideas that people bring that aren't new, maybe they were tried before, but maybe now there are new enabling technologies. Tell us how IBM is enabling organizations, whether they're fast-paced innovative startups or enterprise organizations, to not create that sort of latency and actually achieve the business benefits that fast data can help them achieve, with the technologies that you're announcing at the show. >> Right, right. So again, let's go through these stages that I said every fast data technology and project and solution should really probably have. As I said, first of all you need to have some pub/sub messaging system, and I believe that systems like Kafka are absolutely enough for something like that. >> Dave: Sure. >> Then you need a system that's going to take this data off that fire hose coming from Kafka, which is stream technology, and as I said, there are lots of technologies in the open source, but IBM Streams as a technology is something that has also hundreds of different, basically, models, whether predictive analytics, whether it's prescriptive analytics, whether machine learning, basically kind of AI elements, text to speech,
that you can apply on the data, on the wire, at wire speed. So you need that kind of enterprise quality of service in terms of applying the analytics on the data that is streaming, and then we come to the DB2 Event Store, basically a repository for that fire hose data, where you can put this data in a format in which you can basically, immediately, without any latency between data creation and data consumption, do the analytics on it. That's what we did with our DB2 Event Store. So, not only can we ingest millions of events per second, literally millions and millions of events per second, but we can also store that in a basically open format, which is tremendous value. Remember, any database system in the past basically stores data in its own format. So you have to use the system that created the data in order to consume that data. >> Dave: Sure. >> What DB2 Event Store does is actually, it ingests that data, puts it into a format in which you can use any kind of open source product, like for example Spark analytics, to do the analytics on the data. You could use the Spark machine learning libraries to immediately do kind of machine learning, modeling as well as scoring, on that data. So, I believe that that particular element of Event Store, coupled with a tremendous capability to acquire data, is what makes the real differentiation. >> And it does that how? Through a set of APIs that allow it to be read? >> So, basically, when the data is coming off the hose, you know, off the streams or something like that, what Event Store actually does, it puts the data, it's basically an in-memory database, right? It puts the data in memory. >> Dave: Something else that's been around forever. >> Exactly, something else, yeah. We just have more of it, right? (laughing) And guess what? If it is in memory, it's going to be faster than if it is on disk. What a surprise. >> Yeah. (chuckling) >> So, of course, it puts the data into the memory, and immediately makes it basically available for querying, if you need this data that just came in. But then, kind of asynchronously, it offloads the data into basically Apache Parquet format. Into the columnar store. Basically allowing very powerful analytical capabilities immediately on the data. And again, if you like, you can go to the Event Store to query that data, but you don't have to. You can basically use any kind of tool, like Spark, like the Python or Anaconda stack, to go after the data and do the analytics on it, to build the models on it, and so on. >> And that asynchronous transformation is fast? >> The asynchronous transformation is such that it gives you this data, which we now call historical data, basically in a minute. >> Dave: Okay. >> So it's kind of like minutes. >> So reasonably low latency. >> But what's very important to understand is that actually the union of that data and the data that is in memory, which we, by the way, make transparent, can give you 100% of what we call kind of almost transactional consistency of your queries against the data that is kind of coming in. So, it's really now a hybrid kind of store: the in-memory store, a very fast log, because we're also logging this data in order to have it for high availability across multiple things, because this is highly scalable, I mean, it's highly what we call a web-scale kind of database. And then Parquet format for the open source storing of the data for historic analysis.
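To make the open-format point concrete: because the store offloads asynchronously to Apache Parquet, any Spark session can query the historical side directly. A minimal sketch, assuming a hypothetical storage path and using plain Spark rather than the actual DB2 Event Store client API, applied to the smart-metering example from earlier:

```python
# Sketch of analytics over the Parquet data the event store offloads
# asynchronously. The path is a hypothetical stand-in, and this uses plain
# Spark rather than the actual DB2 Event Store client API.
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, window

spark = SparkSession.builder.appName("event-store-history").getOrCreate()

# Ingested events land in open, columnar Parquet within about a minute,
# so any open source tool can read them without proprietary format lock-in.
history = spark.read.parquet("/eventstore/meter_readings")  # assumed location

# The smart-metering example from earlier: hourly consumption per meter,
# computed over the historical side of the hybrid store.
hourly = (history
          .groupBy("meter_id", window("reading_ts", "1 hour"))
          .agg(avg("kwh").alias("avg_kwh")))

hourly.show()
```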
>> Let's, in our last 30 seconds or so, give us some examples, I know this was just announced, but maybe a customer, genericized, in terms of the business benefits that one of the beta customers is achieving leveraging this technology. >> So, in order for customers to really take advantage of all that, as I said, what I would suggest customers do first of all is to understand where these situations, where these applications, actually make sense for them. Where the data is coming in fire hoses, not in the traditional transactional capabilities, but through the fire hose. Where does it come? And then apply these technologies, as I just said. Acquisition of the data, streaming on the wire, analytics, and then DB2 Event Store as the store of the data. For all that, what you also need, just to tell you, is a kind of messaging runtime, which is typically products like, for example, Akka technology, and that's why we have also entered into a partnership with Lightbend, in order to deliver the entire kind of experience for customers that want to build applications that run on fast data. >> So maybe enabling customers to become more proactive, maybe predictive, eventually? >> To enable customers to take advantage of this tremendously business-relevant data, that is, data that is coming in, is it the click stream? Is it financial data? Is it IOT data? And to combine it with the assets that they already have, coming from transactions, well, that's a powerful combination. That basically they can build totally brand new business models, as well as enhance existing ones, to something that is going to, you know, improve productivity, for example, or improve customer satisfaction, or grow the customer segments, and so on and so forth. >> Well, Namik, thank you so much for coming on theCUBE, and sharing the insight of the announcements. It's pretty cool, Dave, I'm sittin' between you and an IBM Fellow. >> Yeah, that's uh-- >> It's pretty good for a Monday. It's Monday, isn't it? >> Thank you so much. >> Not easy becoming an IBM Fellow, so congratulations on that. >> Thank you so much. >> Lisa: And thanks, again. >> Thank you for having me. >> Lisa: Absolutely, our pleasure. For Dave Vellante, I'm Lisa Martin. We are live at Mandalay Bay in Las Vegas. Nice, sunny day today, where we are on our first day of three days of coverage at IBM Think 2018. Check out our CUBE conversations on thecube.net. Head over to siliconangle.com to find our articles on everything we've done so far at this event and other events, and what we'll be doing for the next few days. Stick around, Dave and I are going to be right back with our next guest after a short break. (innovative music)

Published Date: Mar 19, 2018
