Joel Horwitz, IBM | IBM CDO Summit Spring 2018
(techno music) >> Announcer: Live, from downtown San Francisco, it's theCUBE. Covering IBM Chief Data Officer Strategy Summit 2018. Brought to you by IBM. >> Welcome back to San Francisco everybody, this is theCUBE, the leader in live tech coverage. We're here at the Parc 55 in San Francisco covering the IBM CDO Strategy Summit. I'm here with Joel Horwitz who's the Vice President of Digital Partnerships & Offerings at IBM. Good to see you again Joel. >> Thanks, great to be here, thanks for having me. >> So I was just, you're very welcome- It was just, let's see, was it last month, at Think? >> Yeah, it's hard to keep track, right. >> And we were talking about your new role- >> It's been a busy year. >> the importance of partnerships. One of the things I want to, well let's talk about your role, but I really want to get into, it's innovation. And we talked about this at Think, because it's so critical, in my opinion anyway, that you can attract partnerships, innovation partnerships, startups, established companies, et cetera. >> Joel: Yeah. >> To really help drive that innovation, it takes a team of people, IBM can't do it on its own. >> Yeah, I mean look, IBM is the leader in innovation, as we all know. We're the market leader for patents, that we put out each year, and how you get that technology in the hands of the real innovators, the developers, the longtail ISVs, our partners out there, that's the challenging part at times, and so what we've been up to is really looking at how we make it easier for partners to partner with IBM. How we make it easier for developers to work with IBM. So we have a number of areas that we've been adding, so for example, we've added a whole IBM Code portal, so if you go to developer.ibm.com/code you can actually see hundreds of code patterns that we've created to help really any client, any partner, get started using IBM's technology, and to innovate. 
>> Yeah, and that's critical, I mean you're right, because to me innovation is a combination of invention, which is what you guys do really, and then it's adoption, which is what your customers are all about. You come from the data science world. We're here at the Chief Data Officer Summit, what's the intersection between data science and CDOs? What are you seeing there? >> Yeah, so when I was here last, it was about two years ago in 2015, actually, maybe three years ago, man, time flies when you're having fun. >> Dave: Yeah, the Spark Summit- >> Yeah Spark Technology Center and the Spark Summit, and we were here, I was here at the Chief Data Officer Summit. And it was great, and at that time, I think a lot of the conversation was really not that different than what I'm seeing today. Which is, how do you manage all of your data assets? I think a big part of doing good data science, which is my kind of background, is really having a good understanding of what your data governance is, what your data catalog is, so, you know we introduced the Watson Studio at Think, and actually, what's nice about that, is it brings a lot of this together. So if you look in the market, in the data market, today, you know we used to segment it by a few things, like data gravity, data movement, data science, and data governance. And those are kind of the four themes that I continue to see. And so outside of IBM, I would contend that those are relatively separate kind of tools that are disconnected, in fact Dinesh Nirmal, who's our engineer on the analytic side, Head of Development there, he wrote a great blog just recently, about how you can have some great machine learning, you have some great data, but if you can't operationalize that, then really you can't put it to use. And so it's funny to me because we've been focused on this challenge, and IBM is making the right steps, in my, I'm obviously biased, but we're making some great strides toward unifying the, this tool chain. 
Which is data management, to data science, to operationalizing, you know, machine learning. So that's what we're starting to see with Watson Studio. >> Well, I always push Dinesh on this and like okay, you've got a collection of tools, but are you bringing those together? And he flat-out says no, we developed this, a lot of this from scratch. Yes, we bring in the best of the knowledge that we have there, but we're not trying to just cobble together a bunch of disparate tools with a UI layer. >> Right, right. >> It's really a fundamental foundation that you're trying to build. >> Well, what's really interesting about that, that piece, is that yeah, I think a lot of folks have cobbled together a UI layer, so we formed a partnership, coming back to the partnership view, with a company called Lightbend, who's based here in San Francisco, as well as in Europe, and the reason why we did that, wasn't just because of the fact that Reactive development, if you're not familiar with Reactive, it's essentially Scala, Akka, Play, this whole framework, that basically allows developers to write once, and it kind of scales up with demand. In fact, Verizon actually used our platform with Lightbend to launch the iPhone 10. And they showed dramatic improvements. Now what's exciting about Lightbend, is the fact that application developers are developing with Reactive, but if you turn around, you'll also now be able to operationalize models with Reactive as well. Because it's basically a single platform to move between these two worlds. So what we've continued to see is data science kind of separate from the application world. Really kind of, AI and cloud as different universes. The reality is that for any enterprise, or any company, to really innovate, you have to find a way to bring those two worlds together, to get the most use out of it. >> Furrier always says "Data is the new development kit". He said this I think five or six years ago, and it's barely becoming true.
You guys have tried to make an attempt, and have done a pretty good job, of trying to bring those worlds together in a single platform, what do you call it? The Watson Data Platform? >> Yeah, Watson Data Platform, now Watson Studio, and I think the other, so one side of it is, us trying to, not really trying, but us actually bringing together these disparate systems. I mean we are kind of a systems company, we're IT. But not only that, but bringing our trained algorithms, and our trained models to the developers. So for example, we also did a partnership with Unity, at the end of last year, that's now just reaching some pretty good growth, in terms of bringing the Watson SDK to game developers on the Unity platform. So again, it's this idea of bringing the game developer, the application developer, in closer contact with these trained models, and these trained algorithms. And that's where you're seeing incredible things happen. So for example, Star Trek Bridge Crew, which I don't know how many Trekkies we have here at the CDO Summit. >> A few over here probably. >> Yeah, a couple? They're using our SDK in Unity, to basically allow a gamer to use voice commands through the headset, through a VR headset, to talk to other players in the virtual game. So we're going to see more, I can't really disclose too much what we're doing there, but there's some cool stuff coming out of that partnership. >> Real immersive experience driving a lot of data. Now you're part of the Digital Business Group. I like the term digital business, because we talk about it all the time. Digital business, what's the difference between a digital business and a business? What's the, how they use data. >> Joel: Yeah. >> You're a data person, what does that mean? That you're part of the Digital Business Group? Is that an internal facing thing? An external facing thing? Both? >> It's really both. 
So our Chief Digital Officer, Bob Lord, he has a presentation that he'll give, where he starts out, and he goes, when I tell people I'm the Chief Digital Officer they usually think I just manage the website. You know, if I tell people I'm a Chief Data Officer, it means I manage our data, in governance over here. The reality is that I think these Chief Digital Officer, Chief Data Officer, they're really responsible for business transformation. And so, if you actually look at what we're doing, I think on both sides is we're using data, we're using marketing technology, martech, like Optimizely, like Segment, like some of these great partners of ours, to really look at how we can quickly A/B test, get user feedback, to look at how we actually test different offerings and market. And so really what we're doing is we're setting up a testing platform, to bring not only our traditional offers to market, like DB2, Mainframe, et cetera, but also bring new offers to market, like blockchain, and quantum, and others, and actually figure out how we get better product-market fit. What actually, one thing, one story that comes to mind, is if you've seen the movie Hidden Figures- >> Oh yeah. >> There's this scene where Kevin Costner, I know this is going to look not great for IBM, but I'm going to say it anyways, which is Kevin Costner has like a sledgehammer, and he's like trying to break down the wall to get the mainframe in the room. That's what it feels like sometimes, 'cause we create the best technology, but we forget sometimes about the last mile. You know like, we got to break down the wall. >> Where am I going to put it? >> You know, to get it in the room! So, honestly I think that's a lot of what we're doing. We're bridging that last mile, between these different audiences. So between developers, between ISVs, between commercial buyers. 
Like how do we actually make this technology, not just accessible to large enterprise, which are our main clients, but also to the other ecosystems, and other audiences out there. >> Well so that's interesting Joel, because as a potential partner of IBM, they want, obviously your go-to-market, your massive company, and great distribution channel. But at the same time, you want more than that. You know you want to have a closer, IBM always focuses on partnerships that have intrinsic value. So you talked about offerings, you talked about quantum, blockchain, off-camera talking about cloud containers. >> Joel: Yeah. >> I'd say cloud and containers may be a little closer than those others, but those others are going to take a lot of market development. So what are the offerings that you guys are bringing? How do they get into the hands of your partners? >> I mean, the commonality with all of these, all the emerging offerings, if you ask me, is the distributed nature of the offering. So if you look at blockchain, it's a distributed ledger. It's a distributed transaction chain that's secure. If you look at data, really and we can hark back to say, Hadoop, right before object storage, it's distributed storage, so it's not just storing on your hard drive locally, it's storing on a distributed network of servers that are all over the world and data centers. If you look at cloud, and containers, what you're really doing is not running your application on an individual server that can go down. You're using containers because you want to distribute that application over a large network of servers, so that if one server goes down, you're not going to be hosed. And so I think the fundamental shift that you're seeing is this distributed nature, which in essence is cloud. So I think cloud is just kind of a synonym, in my opinion, for distributed nature of our business. 
>> That's interesting and that brings up, you're right, cloud and Big Data/Hadoop, we don't talk about Hadoop much anymore, but it kind of got it all started, with that notion of leave the data where it is. And it's the same thing with cloud. You can't just stuff your business into the public cloud. You got to bring the cloud to your data. >> Joel: That's right. >> But that brings up a whole new set of challenges, which obviously, you're in a position just to help solve. Performance, latency, physics come into play. >> Physics is a rough one. It's kind of hard to avoid that one. >> I hear your best people are working on it though. Some other partnerships that you want to sort of, elucidate. >> Yeah, no, I mean we have some really great, so I think the key kind of partnership, I would say area, that I would allude to is, one of the things, and you kind of referenced this, is a lot of our partners, big or small, want to work with our top clients. So they want to work with our top banking clients. They want, 'cause these are, if you look at for example, Maersk and what we're doing with them around blockchain, and frankly, talk about innovation, they're innovating containers for real, not virtual containers- >> And that's a joint venture right? >> Yeah, it is, and so it's exciting because, what we're bringing to market is, I also lead our startup programs, called the Global Entrepreneurship Program, and so what I'm focused on doing, and you'll probably see more to come this quarter, is how do we actually bridge that end-to-end? How do you, if you're a startup or a small business, ultimately reach that kind of global business partner level? And so kind of bridging that, that end-to-end. So we're starting to bring out a number of different incentives for partners, like co-marketing, so I'll help startups when they're early, figure out product-market fit.
We'll give you free credits to use our innovative technology, and we'll also bring you into a number of clients, to basically help you not burn all of your cash on creating your own marketing channel. God knows I did that when I was at a start-up. So I think we're doing a lot to kind of bridge that end-to-end, and help any partner kind of come in, and then grow with IBM. I think that's where we're headed. >> I think that's a critical part of your job. Because I mean, obviously IBM is known for its Global 2000, big enterprise presence, but startups, again, fuel that innovation fire. So being able to attract them, which you're proving you can, providing whatever it is, access, early access to cloud services, or like you say, these other offerings that you're producing, in addition to that go-to-market, 'cause it's funny, we always talk about how efficient, capital efficient, software is, but then you have these companies raising hundreds of millions of dollars, why? Because they got to do promotion, marketing, sales, you know, go-to-market. >> Yeah, it's really expensive. I mean, you look at most startups, like their biggest ticket item is usually marketing and sales. And building channels, and so yeah, if you're, you know we're talking to a number of partners who want to work with us because of the fact that, it's not just like, the direct kind of channel, it's also, as you kind of mentioned, there's other challenges that you have to overcome when you're working with a larger company. for example, security is a big one, GDPR compliance now, is a big one, and just making sure that things don't fall over, is a big one. And so a lot of partners work with us because ultimately, a number of the decision makers in these larger enterprises are going, well, I trust IBM, and if IBM says you're good, then I believe you. And so that's where we're kind of starting to pull partners in, and pull an ecosystem towards us. 
Because of the fact that we can take them through that level of certification. So we have a number of free online courses. So if you go to partners, excuse me, ibm.com/partners/learn there's a number of blockchain courses that you can learn today, and we'll actually give you a digital certificate, that's actually certified on our own blockchain, which we're actually a first of a kind to do that, which I think is pretty slick, and it's accredited at some of the universities. So I think that's where people are looking to IBM, and other leaders in this industry, is to help them become experts in their, in this technology, and especially in this emerging technology. >> I love that blockchain actually, because it's such a growing, and interesting, and innovative field. But it needs players like IBM, that can bring credibility, enterprise-grade, whether it's security, or just, as I say, credibility. 'Cause you know, this is, so much of negative connotations associated with blockchain and crypto, but companies like IBM coming to the table, enterprise companies, and building that ecosystem out is in my view, crucial. >> Yeah, no, it takes a village. I mean, there's a lot of folks, I mean that's a big reason why I came to IBM, three, four years ago, was because when I was in start-up land, I used to work for H2O, I worked for Alpine Data Labs, Datameer, back in the Hadoop days, and what I realized was that, it's an opportunity cost. So you can't really drive true global innovation, transformation, in some of these bigger companies because there's only so much that you can really kind of bite off. And so you know at IBM it's been a really rewarding experience because we have done things like for example, we partnered with Girls Who Code, Treehouse, Udacity. So there's a number of early educators that we've partnered with, to bring code to, to bring technology to, that frankly, would never have access to some of this stuff.
Some of this technology, if we didn't form these alliances, and if we didn't join these partnerships. So I'm very excited about the future of IBM, and I'm very excited about the future of what our partners are doing with IBM, because, geez, you know the cloud, and everything that we're doing to make this accessible, is bar none, I mean, it's great. >> I can tell you're excited. You know, spring in your step. Always a lot of energy Joel, really appreciate you coming onto theCUBE. >> Joel: My pleasure. >> Great to see you again. >> Yeah, thanks Dave. >> You're welcome. Alright keep it right there, everybody. We'll be back. We're at the IBM CDO Strategy Summit in San Francisco. You're watching theCUBE. (techno music) (touch-tone phone beeps)
Frederick Reiss, IBM STC - Big Data SV 2017 - #BigDataSV - #theCUBE
>> Narrator: Live from San Jose, California it's theCUBE, covering Big Data Silicon Valley 2017. (upbeat music) >> Big Data SV 2017, day two of our wall to wall coverage of the Strata Hadoop Conference, Big Data SV, really what we call Big Data Week because this is where all the action is going on down in San Jose. We're at the historic Pagoda Lounge in the back of the Fairmont, come on by and say hello, we've got a really cool space and we're excited and never been in this space before, so we're excited to be here. So we got George Gilbert here from Wikibon, we're really excited to have our next guest, he's Fred Reiss, he's the chief architect at IBM Spark Technology Center in San Francisco. Fred, great to see you. >> Thank you, Jeff. >> So I remember when Rob Thomas, we went up and met with him in San Francisco when you guys first opened the Spark Technology Center a couple of years ago now. Give us an update on what's going on there, I know IBM's putting a lot of investment in this Spark Technology Center in the San Francisco office specifically. Give us kind of an update of what's going on. >> That's right, Jeff. Now we're in the new Watson West building in San Francisco on 505 Howard Street, colocated, we have about a 50 person development organization. Right next to us we have about 25 designers and on the same floor a lot of developers from Watson doing a lot of data science, from the Weather Underground, doing weather and data analysis, so it's a really exciting place to be, lots of interesting work in data science going on there. >> And it's really great to see how IBM is taking the core Watson, obviously enabled by Spark and other core open source technology and now applying it, we're seeing Watson for Health, Watson for Autonomous Vehicles, Watson for Marketing, Watson for this, and really bringing that type of machine learning power to all the various verticals in which you guys play.
>> Absolutely, that's been what Watson has been about from the very beginning, bringing the power of machine learning, the power of artificial intelligence to real world applications. >> Jeff: Excellent. >> So let's tie it back to the Spark community. Most folks understand how Databricks builds out the core or does most of the core work for, like, the SQL workload, the streaming and machine learning, and I guess graph is still immature. We were talking earlier about IBM's contributions in helping to build up the machine learning side. Help us understand what the Databricks core technology for machine learning is and how IBM is building beyond that. >> So the core technology for machine learning in Apache Spark comes out, actually, of the machine learning department at UC Berkeley as well as a lot of different members of the community. Some of those community members also work for Databricks. We actually at the IBM Spark Technology Center have made a number of contributions to the core Apache Spark and the libraries, for example recent contributions in neural nets. In addition to that, we also work on a project called Apache System ML, which used to be proprietary IBM technology, but the IBM Spark Technology Center has turned System ML into Apache System ML, it's now an open Apache incubating project that's been moving forward out in the open. You can now download the latest release online and that provides a piece that we saw was missing from Spark and a lot of other similar environments: an optimizer for machine learning algorithms. So in Spark, you have the catalyst optimizer for data analysis, data frames, SQL, you write your queries in terms of those high level APIs and catalyst figures out how to make them go fast.
In System ML, we have an optimizer for high level languages like Spark and Python where you can write algorithms in terms of linear algebra, in terms of high level operations on matrices and vectors, and have the optimizer take care of making those algorithms run in parallel, run at scale, taking account of the data characteristics. Does the data fit in memory, and if so, keep it in memory. Does the data not fit in memory? Stream it from disk. >> Okay, so there was a ton of stuff in there. >> Fred: Yep. >> And if I were to refer to that as so densely packed as to be a black hole, that might come across wrong, so I won't refer to that as a black hole. But let's unpack that, so the, and I meant that in a good way, like high bandwidth, you know. >> Fred: Thanks, George. >> Um, so the traditional Spark, the machine learning that comes with Spark's MLlib, one of its distinguishing characteristics is that the models, the algorithms that are in there, have been built to run on a cluster. >> Fred: That's right. >> And very few have, very few others have built machine learning algorithms to run on a cluster, but as you were saying, you don't really have an optimizer for finding something where a couple of the algorithms would be fit optimally to solve a problem. Help us understand, then, how System ML solves a more general problem for, say, ensemble models and for scale out, I guess I'm, help us understand how System ML fits relative to Spark's MLlib and the more general problems it can solve. >> So, MLlib and a lot of other packages such as Sparkling Water from H2O, for example, provide you with a toolbox of algorithms, and each of those algorithms has been hand tuned for a particular range of problem sizes and problem characteristics. This works great as long as the particular problem you're facing as a data scientist is a good match to that implementation that you have in your toolbox.
What System ML provides is less like having a toolbox and more like having a machine shop. You can, you have a lot more flexibility, you have a lot more power, you can write down an algorithm as you would write it down if you were implementing it just to run on your laptop, and then let the System ML optimizer take care of producing a parallel version of that algorithm that is customized to the characteristics of your cluster, customized to the characteristics of your data. >> So let me stop you right there, because I want to use an analogy that others might find easy to relate to for all the people who understand SQL and scale-out SQL. So, the way you were describing it, it sounds like oh, if I were a SQL developer and I wanted to get at some data on my laptop, I would find it pretty easy to write the SQL to do that. Now, let's say I had a bunch of servers, each with its own database, and I wanted to get data from each database. If I didn't have a scale-out database, I would have to figure out physically how to go to each server in the cluster to get it. What I'm hearing for System ML is it will take that query that I might have written on my one server and it will transparently figure out how to scale that out, although in this case not queries, machine learning algorithms. >> The database analogy is very apt. Just like SQL and query optimization, by allowing you to separate that logical description of what you're looking for from the physical description of how to get at it, lets you have a parallel database with the exact same language as a single machine database. In System ML, because we have an optimizer that separates that logical description of the machine learning algorithm from the physical implementation, we can target a lot of parallel systems, we can also target a large server, and the code, the code that implements the algorithm stays the same. >> Okay, now let's take that a step further.
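To make that "write it down as you would for your laptop" idea concrete, here is a hypothetical sketch in Python, with NumPy standing in for the linear algebra primitives. This illustrates the style only; it is not System ML's actual DML syntax. Linear regression by gradient descent is expressed purely in matrix-vector operations, the kind of logical description an optimizer like System ML's can then turn into a parallel physical plan.

```python
# Hypothetical illustration (not System ML's actual DML syntax):
# linear regression via batch gradient descent, written purely in
# linear algebra. NumPy stands in for the matrix/vector primitives;
# an optimizer like System ML's takes this same logical description
# and chooses how to execute it in parallel and at scale.
import numpy as np

def linear_regression_gd(X, y, iters=500, lr=0.1):
    """Fit w to minimize ||Xw - y||^2 using only matrix/vector ops."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n  # gradient of the squared loss
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w
w = linear_regression_gd(X, y)
print(np.round(w, 2))  # recovers approximately [2.0, -1.0, 0.5]
```

Note that the algorithm code never mentions partitions, clusters, or memory; that separation of the logical algorithm from the physical execution is exactly the database-style optimization being described.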
You refer to matrix math and I think linear algebra and a whole lot of other things that I never quite made it to since I was a humanities major, but when we're talking about those things, my understanding is that those are primitives that Spark doesn't really implement, so that if you wanted to do neural nets, which rely on some of those constructs for high performance, >> Fred: Yes. >> Then, um, that's not built into Spark. Can you get to that capability using System ML? >> Yes. System ML, at its core, provides you as a user with a library of linear algebra primitives, just like a language like R or a library like NumPy gives you matrices and vectors and all of the operations you can do on top of those primitives. And just to be clear, linear algebra really is the language of machine learning. If you pick up a paper about an advanced machine learning algorithm, chances are the specification for what that algorithm does and how that algorithm works is going to be written in the paper literally in linear algebra, and the implementation that was used in that paper is probably written in a language where linear algebra is built in, like R, like NumPy. >> So it sounds to me like Spark has done the work of sort of the blocking and tackling of machine learning to run in parallel. And that's, I mean, to be clear, since we haven't really talked about it, that's important when you're handling data at scale and you want to train, you know, models on very, very large data sets. But it sounds like when we want to go to some of the more advanced machine learning capabilities, the ones that today are making all the noise with, you know, speech to text, text to speech, natural language, understanding those neural network based capabilities are not built into the core Spark MLlib, that, would it be fair to say you could start getting at them through System ML?
>> Yes, System ML is a much better way to do scalable linear algebra on top of Spark than the very limited linear algebra that's built into Spark. >> So alright, let's take the next step. Can System ML be grafted onto Spark in some way, or would it have to be an entirely new API that doesn't integrate with all the other Spark APIs? In a way, that has differentiated Spark, where each API is sort of accessible from every other. Can you tie System ML in or do the Spark guys have to build more primitives into their own sort of engine first? >> A lot of the work that we've done with the Spark Technology Center as part of bringing System ML into the Apache ecosystem has been to build a nice, tight integration with Apache Spark, so you can pass Spark data frames directly into System ML and you can get data frames back. Your System ML algorithm, once you've written it in terms of one of System ML's main scripting languages, just plugs into Spark like all the algorithms that are built into Spark. >> Okay, so that's, that would keep Spark competitive with more advanced machine learning frameworks for a longer period of time, in other words, it wouldn't hit the wall the way it would if it encountered TensorFlow from Google for Google's way of doing deep learning, Spark wouldn't hit the wall once it needed, like, a TensorFlow, as long as it had System ML so deeply integrated the way you're doing it. >> Right, with a system like System ML, you can quickly move into new domains of machine learning.
So for example, this afternoon I'm going to give a talk with one of our machine learning developers, Mike Dusenberry, about our recent efforts to implement deep learning in System ML, like full-scale convolutional neural nets running on a cluster in parallel, processing many gigabytes of images, and we implemented that with very little effort, because we have this optimizer underneath that takes care of a lot of the details of how you get that data into the processing, how you get the data spread across the cluster, how you get the processing moved to the data or vice versa. All those decisions are taken care of in the optimizer; you just write down the linear algebra parts and let the system take care of it. That let us implement deep learning much more quickly than we would have if we had done it from scratch. >> So it's just this ongoing cadence of basically removing the infrastructure management burden from the data scientists and enabling them to concentrate really where their value is, on the algorithms themselves, so they don't have to worry about how many clusters it's running on, and that configuration, the kind of typical DevOps that we see on the regular development side, but now you're really bringing that into the machine learning space. >> That's right, Jeff. Personally, I find all the minutia of making a parallel algorithm work really fascinating, but a lot of people working in data science really see parallelism as a tool. They want to solve the data science problem, and System ML lets you focus on solving the data science problem because the system takes care of the parallelism. >> You guys could go on in the weeds for probably three hours, but we don't have enough coffee, and we're going to set up a follow-up time because you're both in San Francisco.
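The optimizer decisions Fred describes — how data gets spread across the cluster, whether the processing moves to the data — can be sketched in miniature. The cost model below is entirely hypothetical (System ML's real optimizer is far more sophisticated); it just illustrates the kind of size-based plan choice the user never has to write by hand.

```python
# Illustrative sketch (not System ML's actual cost model): an optimizer
# that picks an execution strategy for a matrix multiply based on the
# estimated size of the operands, the kind of decision Fred describes
# the optimizer making automatically.

def choose_plan(rows_a, cols_a, cols_b, mem_budget_bytes=1 << 30):
    """Pick a (hypothetical) strategy for A (rows_a x cols_a) times
    B (cols_a x cols_b), assuming 8-byte doubles."""
    bytes_a = rows_a * cols_a * 8
    bytes_b = cols_a * cols_b * 8
    bytes_out = rows_a * cols_b * 8
    if bytes_a + bytes_b + bytes_out <= mem_budget_bytes:
        return "single-node in-memory"
    if bytes_b <= mem_budget_bytes // 10:
        # Small right-hand side: ship B to every node holding a chunk
        # of A, i.e. move the processing to the data.
        return "broadcast B, map over partitions of A"
    return "full distributed (shuffle-based) multiply"

plans = [
    choose_plan(1_000, 100, 10),               # tiny: fits in memory
    choose_plan(1_000_000_000, 1_000, 10),     # huge A, small B
    choose_plan(1_000_000, 100_000, 100_000),  # everything huge
]
```

The point of a declarative system is that the user writes only the linear algebra; plan decisions like these happen underneath.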
But before we let you go, Fred, as you look forward into 2017, kind of the advances that you guys have done there at the IBM Spark Center in the city, what are the next couple of great hurdles that you're looking to cross, new challenges that are getting you up every morning, that you're excited to come back a year from now and be able to say, wow, these are the one or two things that we were able to take down in 2017? >> We're moving forward on several different fronts this year. On one front, we're helping to get the notebook experience with Spark notebooks consistent across the entire IBM product portfolio. We helped a lot with the rollout of notebooks on Data Science Experience on z, for example, and we're working actively with the Data Science Experience and with the Watson Data Platform. On the other hand, we're contributing to Spark 2.2. There are some exciting features, particularly in SQL, that we're hoping to get into that release, as well as some new improvements to MLlib. We're moving forward with Apache System ML; we just cut version 0.13 of that. We're talking right now on the mailing list about getting System ML out of incubation, making it a full, top-level project. And we're also continuing to help with the adoption of Apache Spark technology in the enterprise. Our latest focus has been on deep learning on Spark. >> Well, I think we found him! Smartest guy in the room. (laughter) Thanks for stopping by, and good luck on your talk this afternoon. >> Thank you, Jeff. >> Absolutely. Alright, he's Fred Reiss, he's George Gilbert, and I'm Jeff Frick. You're watching theCUBE from Big Data SV, part of Big Data Week in San Jose, California. (upbeat music) (mellow music) >> Hi, I'm John Furrier, the cofounder of SiliconANGLE Media, cohost of theCUBE. I've been in the tech business since I was 19, first programming on minicomputers.
Nick Pentreath, IBM STC - Spark Summit East 2017 - #sparksummit - #theCUBE
>> Narrator: Live from Boston, Massachusetts, this is The Cube, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Boston, everybody. Nick Pentreath is here; he's a principal engineer at the IBM Spark Technology Center, based in South Africa. Welcome to The Cube. >> Thank you. >> Great to see you. >> Great to see you. >> So let's see, it's a different time of year here than you're used to. >> I've flown from, I don't know the Fahrenheit equivalent, but 30 degrees Celsius heat and sunshine, to snow and sleet, so. >> Yeah, yeah. So it's a lot chillier there. Wait until tomorrow. But, so we were joking. You probably get the T-shirt for the longest flight here, so welcome. >> Yeah, I actually need the parka, or like a beanie. (all laugh) >> Little better. Long sleeve. So Nick, tell us about the Spark Technology Center, STC is its acronym, and your role there. >> Sure, yeah, thank you. So the Spark Technology Center was formed by IBM a little over a year ago, and its mission is to focus on the Open Source world, particularly Apache Spark and the ecosystem around that, and to really drive forward the community and to make contributions to both the core project and the ecosystem. The overarching goal is to help drive adoption, particularly with enterprise customers, the kind of customers that IBM typically serves, and to harden Spark and to make it really enterprise-ready. >> So why Spark? I mean, we've watched IBM do this now for several years. The famous example that I like to use is Linux. When IBM put $1 billion into Linux, it really went all in on Open Source, and it drove a lot of IBM value, both internally and externally for customers. So what was it about Spark? I mean, you could have made a similar bet on Hadoop. You decided not to, you sort of waited to see that market evolve. What was the catalyst for having you guys all go in on Spark? >> Yeah, good question.
I don't know all the details, certainly, of what were the internal drivers, because I joined STC a little under a year ago, so I'm fairly new. >> Translate the hallway talk, maybe. (Nick laughs) >> Essentially, I think you raise very good parallels to Linux and also Java. >> Absolutely. >> So IBM made these investments in Open Source technologies that proved to be transformational and kind of game-changing. And I think, you know, most people will probably admit within IBM that they maybe missed the boat, actually, on Hadoop, and saw Spark as the successor, and actually saw a chance to really dive into that and kind of almost leapfrog and say, "We're going to back this as the next generation analytics platform and operating system for analytics and big data in the enterprise." >> Well, I don't know if you happened to watch the Super Bowl, but there's a saying that it's sometimes better to be lucky than good. (Nick laughs) And that sort of applies, and so, in some respects, maybe missing the window on Hadoop was not a bad thing for IBM >> Yeah, exactly >> because not a lot of people made a ton of dough on Hadoop, and they're still sort of struggling to figure it out. And now along comes Spark, and you've got this more real-time nature. IBM talks a lot about bringing analytics and transactions together. They've made some announcements about that, and affecting business outcomes in near real time. I mean, that's really what it's all about, and one of your areas of expertise is machine learning. And so, talk about that relationship and what it means for organizations, your mission. >> Yeah, machine learning is a key part of the mission. And you've seen the kind of big data in the enterprise story, starting with the kind of Hadoop and data lakes. And that's evolved; before, we just dumped all of this data into these data lakes and these silos, and maybe we had some Hadoop jobs and so on.
But now we've got all this data we can store, what are we actually going to do with it? So part of that is the traditional data warehousing and business intelligence and analytics, but more and more, we're seeing there's a rich value in this data, and to unlock it, you really need intelligent systems. You need machine learning, you need AI, you need real time decision making that starts transcending the boundaries of all the rule-based systems and human-based systems. So we see machine learning as one of the key tools and one of the key unlockers of value in these enterprise data stores. >> So Nick, perhaps paint us a picture of someone who's advanced enough to be working with machine learning with IBM, and we know that the tool chain's kind of immature. Although IBM, with Data Works or Data First, has a fairly broad end-to-end sort of suite of tools, but what are the early-use cases? And what needs to mature to go into higher volume production apps or higher-value production apps? >> I think the early-use cases for machine learning in general, and certainly at scale, are numerous and they're growing, but classic examples are, let's say, recommendation engines. That's an area that's close to my heart. In my previous life before IBM, I built a startup that had a recommendation engine service targeting online stores and new commerce players and social networks and so on. So this is a great kind of example use case. We've got all this data about, let's say, customer behavior in your retail store or your video-sharing site, and in order to serve those customers better and make more money, if you can make good recommendations about what they should buy, what they should watch, or what they should listen to, that's a classic use case for machine learning and unlocking the data that is there. So that is one of the drivers of some of these systems; players like Amazon are sort of good examples of the recommendation use case.
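The recommendation-engine use case Nick describes can be sketched with a toy item-based recommender: represent each item by the users who interacted with it, and score unseen items by similarity to what the user already has. The data, names, and similarity measure here are purely illustrative.

```python
# A toy item-based recommender in the spirit Nick describes: from user
# behavior (who bought what), recommend items similar to those the user
# already interacted with. All data and choices here are illustrative.
from math import sqrt

purchases = {
    "alice": {"book", "lamp"},
    "bob":   {"book", "pen"},
    "carol": {"lamp", "pen", "book"},
    "dan":   {"pen"},
}

def item_vectors(purchases):
    """Represent each item by the set of users who interacted with it."""
    items = {}
    for user, basket in purchases.items():
        for item in basket:
            items.setdefault(item, set()).add(user)
    return items

def cosine(a, b):
    """Cosine similarity between two binary (set-valued) vectors."""
    return len(a & b) / (sqrt(len(a)) * sqrt(len(b)))

def recommend(user, purchases):
    items = item_vectors(purchases)
    seen = purchases[user]
    scores = {}
    for item, users in items.items():
        if item in seen:
            continue
        # Score unseen items by similarity to the items the user has.
        scores[item] = sum(cosine(users, items[s]) for s in seen)
    return max(scores, key=scores.get) if scores else None

top = recommend("alice", purchases)
```

Production systems obviously work at vastly larger scale with richer signals, which is exactly where the distributed tooling Nick discusses comes in.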
Another is fraud detection, and that is a classic example in financial services, enterprise, which is a kind of staple of IBM's customer base. So these are a couple of examples of the use cases, but the tool sets, traditionally, have been kind of cumbersome. So Amazon built everything from scratch themselves using customized systems, and they've got teams and teams of people. Nowadays, you've got this built into Apache Spark, you've got Spark MLlib, the machine learning library, you've got good models to do that kind of thing. So I think from an algorithmic perspective, there's been a lot of advancement and there's a lot of standardization and almost commoditization of the model side. So what is missing? >> George: Yeah, what else? >> And what are the shortfalls currently? So there's a big difference between the current view, I guess the hype, of machine learning, which is: you've got data, you apply some machine learning, and then you get profit, right? But really, there's a hugely complex workflow that involves this end-to-end story. You've got data coming from various data sources, you have to feed it into one centralized system, transform and process it, extract your features and do your sort of hardcore data science, which is the core piece that everyone sort of thinks about as the only piece, but that's kind of in the middle and it makes up a relatively small proportion of the overall chain. And once you've got that, you do model training and selection testing, and you now have to take that model, that machine-learning algorithm, and you need to deploy it into a real system to make real decisions. And that's not even the end of it, because once you've got that, you need to close the loop, what we call the feedback loop, and you need to monitor the performance of that model in the real world. You need to make sure that it's not deteriorating, that it's adding business value. All of these kinds of things.
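The feedback loop Nick is describing — monitor the deployed model in the real world and flag it when it deteriorates — can be sketched as follows. The model, the sliding-window accuracy check, and the threshold are all illustrative stand-ins, not any particular product's behavior.

```python
# A sketch of closing the feedback loop: score a stream of live events
# against a deployed model and flag retraining when accuracy over a
# sliding window drops below a threshold. All numbers are illustrative.

def train(examples):
    """'Train' a trivial majority-class model from (features, label) pairs."""
    labels = [label for _, label in examples]
    return max(set(labels), key=labels.count)

def monitor(model, stream, window=4, threshold=0.5):
    """Emit 'ok' or 'retrain' per event based on windowed accuracy."""
    recent, events = [], []
    for _, label in stream:
        recent.append(1 if model == label else 0)
        recent = recent[-window:]
        accuracy = sum(recent) / len(recent)
        events.append("retrain" if accuracy < threshold else "ok")
    return events

model = train([(0, "ham"), (1, "ham"), (2, "spam")])   # majority: "ham"
# Live traffic drifts toward "spam", so the stale model degrades.
live = [(3, "ham"), (4, "ham"), (5, "spam"), (6, "spam"),
        (7, "spam"), (8, "spam")]
events = monitor(model, live)
```

The "retrain" events are exactly the signal that feeds back into the front of the workflow Nick outlines.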
So I think that is the real piece of the puzzle that's missing at the moment: delivering this end-to-end story and doing it at scale, securely, enterprise-grade. >> And the business impact of that presumably will be a better-quality experience. I mean, recommendation engines and fraud detection have been around for a while, they're just not that good. Retargeting systems are too little, too late, and fraud detection is kind of cumbersome. Still a lot of false positives. Getting much better, certainly compressing the time. It used to be six months, >> Yes, yes. >> Now it's minutes or seconds, but a lot of false positives still. So, but are you suggesting that by closing that gap, we'll start to see from a consumer standpoint much better experiences? >> Well, I think that's imperative, because if you don't see that from a consumer standpoint, then the mission is failing, because ultimately, it's not magic that you just simply throw machine learning at something and you unlock business value and everyone's happy. You know, there's a human in the loop, there. You have to fulfill the customer's need, you have to fulfill consumer needs, and the better you do that, the more successful your business is. You mentioned the time scale, and I think that's a key piece, here. >> Yeah. >> What makes better decisions? What makes a machine-learning system better? Well, it's better data and more data, and faster decisions. So I think all of those three are coming into play with Apache Spark, the end-to-end story, streaming systems, and the models are getting better and better because they're getting more data and better data. >> So I think we've, the industry has pretty much attacked the time problem. Certainly for fraud detection and recommendation systems the quality issue. Are we close?
I mean, are we talking about 6-12 months before we really sort of start to see a major impact to the consumer and, ultimately, to the company who's providing those services? >> Nick: Well, >> Or is it further away than that, you think? >> You know, it's always difficult to make predictions about timeframes, but I think there's a long way to go. To go from, yeah, as you mentioned, where we are, the algorithms and the models are quite commoditized. The time gap to make predictions is kind of down to this real-time nature. >> Yeah. >> So what is missing? I think it's actually less about the traditional machine-learning algorithms and more about making the systems better and getting better feedback, better monitoring, so improving the end user's experience of these systems. >> Yeah. >> And that's actually, I don't think it's, I think there's a lot of work to be done. I don't think it's a 6-12 month thing, necessarily. I don't think that in 12 months, certainly, you know, everything's going to be perfectly recommended. I think there's areas of active research in the kind of academic fields of how to improve these things, but I think there's a big engineering challenge to bring in more disparate data sources, to improve data quality, to improve these feedback loops, to try and get systems that are serving customer needs better. So improving recommendations, improving the quality of fraud detection systems. Everything from that to medical imaging and cancer detection. I think we've got a long way to go. >> Would it be fair to say that we've done a pretty good job with traditional application lifecycle in terms of DevOps, but we now need the DevOps for the data scientists and their collaborators? >> Nick: Yeah, I think that's >> And where is IBM along that?
>> Yeah, that's a good question, and I think you kind of hit the nail on the head, that the enterprise applied machine learning problem has moved from the kind of academic to the software engineering and actually, DevOps. Internally, someone mentioned the word "train ops," so it's almost like, you know, the machine learning workflow and actually professionalizing and operationalizing that. So recently, IBM, for one, has announced Watson Data Platform and now, Watson Machine Learning. And that really tries to address that problem. So really, the aim is to simplify and productionize these end-to-end machine-learning workflows. So that is the product push that IBM has at the moment. >> George: Okay, that's helpful. >> Yeah, and right. I was at the Watson Data Platform announcement, what they called the Data Works. I think they changed the branding. >> Nick: Yeah. >> It looked like there were numerous components that IBM had in its portfolio that are now strung together, to create that end-to-end system that you're describing. Is that a fair characterization, or is it underplaying, I'm sure it is, the work that went into it? But help us maybe understand that better. >> Yeah, I should caveat it by saying we're fairly focused, very focused, at STC on the Open Source side of things, so my work is predominately within the Apache Spark project and I'm less involved in the data bank. >> Dave: So you didn't contribute specifically to Watson Data Platform? >> Not to the product line, so, you know, >> Yeah, so it's really not an appropriate question for you? >> I wouldn't want to kind of, >> Yeah. >> To talk too deeply about it >> Yeah, yeah, so that, >> Simply because I haven't been involved. >> Yeah, that's, I don't want to push you on that because it's not your wheelhouse, but then, help me understand how you will commercialize the activities that you do, or is that not necessarily the intent?
>> So the intent with STC particularly is that we focus on Open Source, and a core part of that is that, being within IBM, we have the opportunity to interface with other product groups and customer groups. >> George: Right. >> So while we're not directly focused on, let's say, the commercial aspect, we want to effectively leverage the ability to talk to real-world customers and find the use cases, and talk to other product groups that are building this Watson Data Platform and all the product lines and the features, Data Science Experience; it's all built on top of Apache Spark and the platform. >> Dave: So your role is really to innovate? >> Exactly, yeah. >> Leverage Open Source and innovate. >> Both innovate and kind of improve, so improve performance, improve efficiency. When you are operating at the scale of a company such as IBM and other large players, your customers and you as product teams and builders of products will come into contact with all the kind of little issues and bugs >> Right. >> And performance >> Make it better. Problems, yeah. >> And that is the feedback that we take on board, and we try and make it better, not just for IBM and their customers, because it's an Apache project and everyone benefits. So that's really the idea: take all the feedback and learnings from enterprise customers and product groups and centralize that in the Open Source contributions that we make. >> Great. So would it be fair to say you're focusing on making the core Spark, the Spark ML and Spark MLlib machine learning libraries, and the pipeline more robust? >> Yes. >> And if that's the case, we know there needs to be improvements in its ability to serve predictions in real time, like high speed. We know there's a need to take the pipeline and sort of share it with other tools, perhaps, or collaborate with other tool chains. >> Nick: Yeah. >> What are some of the things that the Enterprise customers are looking for along those lines?
Yeah, that's a great question and very topical at the moment. So both from an Open Source community perspective and an Enterprise customer perspective, this is one of, if not the, key missing pieces within the Spark machine-learning community at the moment, and it's one of the things that comes up most often. So it is a missing piece, and we as a community need to work together and decide: is this something that we build within Spark and provide that functionality? Is it something where we try and adopt open standards that will benefit everybody and that provide one standardized format, or way, of serving models? Or is it something where there's a few Open Source projects out there that might serve this purpose, and do we get behind those? So I don't have the answer, because this is ongoing work, but it's definitely one of the most critical kind of blockers, or, let's say, areas that need work at the moment. >> One quick question, then, along those lines. IBM, the first thing IBM contributed to the Spark community was Spark ML, which is, as I understand it, it was an ability to, I think, create an ensemble sort of set of models to do a better job, or create a more, >> So are you referring to System ML, I think it is? >> System ML. >> System ML, yeah, yeah. >> What are they, I forgot. >> Yeah, so, so. >> Yeah, where does that fit? >> System ML started out as an IBM research project, and perhaps the simplest way to describe it is: just as a SQL optimizer takes SQL queries and decides how to execute them in the most efficient way, System ML takes a kind of high-level mathematical language and compiles it down to an execution plan that runs in a distributed system. So in much the same way as your SQL operators allow this very flexible and high-level language, you don't have to worry about how things are done, you just tell the system what you want done.
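Nick's SQL-optimizer analogy can be shown in miniature: the user writes down a matrix product A·B·C, and an optimizer picks the cheapest association order by flop count, much as a query planner picks a join order. This toy planner is illustrative only; System ML's compiler considers far more than this.

```python
# The SQL-optimizer analogy in miniature: given a three-matrix chain
# A*B*C, decide whether to evaluate (A*B)*C or A*(B*C) by comparing
# flop counts. Illustrative sketch, not System ML's actual planner.

def chain_cost(dims, order):
    """Flop cost of a three-matrix chain with shapes
    A: n x k, B: k x m, C: m x p, for a given association order."""
    n, k, m, p = dims
    if order == "left":                 # (A*B) first, then *C
        return n * k * m + n * m * p
    return k * m * p + n * k * p        # (B*C) first, then A*

def plan(dims):
    """Pick the cheaper association, like a tiny query optimizer."""
    left = chain_cost(dims, "left")
    right = chain_cost(dims, "right")
    return ("left", left) if left <= right else ("right", right)

# A: 2x100, B: 100x100, C: 100x1 -- right association is far cheaper,
# and the user never had to think about it.
choice, cost = plan((2, 100, 100, 1))
```

Scaling this idea up to rewriting and distributing whole linear algebra programs is what a declarative ML compiler does.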
System ML aims to do that for mathematical and machine learning problems. So it's now an Apache project; it's been donated to Open Source and it's an incubating project under very active development. And there's a couple of different aspects to it, but that's the high-level goal. The underlying execution engine is Spark. It can run on Hadoop and it can run locally, but really, the main focus is to execute on Spark, and then expose these kind of higher level APIs that are familiar to users of languages like R and Python, for example, to be able to write their algorithms and not necessarily worry about, how do I do large scale matrix operations on a cluster? System ML will compile that down and execute that for them. >> So really quickly, follow up: what that means is, it's a higher level way for people who aren't, sort of, cluster aware to write machine-learning algorithms that are cluster aware? >> Nick: Precisely, yeah. >> That's very, very valuable. When it works. >> When it works, yeah. So it does, again, with the caveat that I'm mostly focused on Spark and not so much the System ML side of things, so I'm definitely not an expert. I don't claim to be an expert in it. But it does, you know, it works at the moment. It works for a large class of machine-learning problems. It's very powerful, but again, it's a young project and there's always work to be done. So exactly the areas that I know they're focusing on are these areas of usability, hardening up the APIs and making them easier to use and easier to access for users coming from the R and Python communities who, again, as you said, are not necessarily experts on distributed systems and cluster awareness, but they know how to write a very complex machine-learning model in R, for example. And it's really trying to enable them with a set of API tools.
So in terms of the underlying engine, there are, I don't know how many, hundreds of thousands, millions of lines of code, and years and years of research that have gone into that, so it's an extremely powerful set of tools. But yes, a lot of work still to be done there, and ongoing, to make it user ready and Enterprise ready, in the sense of making it easier for people to use it and adopt it and to put it into their systems and production. >> So I wonder if we can close, Nick, just a few questions on STC. So the Spark Technology Center in Cape Town, is that a global expertise center? Is STC a virtual sort of IBM community, or? >> I'm the only member based in Cape Town, >> David: Okay. >> So I'm kind of fairly lucky from that perspective, to be able to kind of live at home. The rest of the team is mostly in San Francisco, so there's an office there that's co-located with the Watson West office >> Yeah. >> And Watson teams >> Sure. >> That are based there in Howard Street, I think it is. >> Dave: How often do you get there? >> I'll be there next week. >> Okay. >> So I typically, sort of two or three times a year, I try and get across there >> Right. >> And interface with the team. >> So, >> But we are a fairly, I mean, IBM is obviously a global company, and I've been surprised actually, pleasantly surprised: there are team members pretty much everywhere. Our team has a few scattered around, including me, but in general, when we interface with various teams, they pop up in all kinds of geographical locations, and I think it's great, you know, a huge diversity of people and locations, so. >> Anything, I mean, it's early days here, early day one, but anything you saw in the morning keynotes or things you hope to learn here? Anything that's excited you so far?
I caught a couple of the morning keynotes, but had to dash out to kind of prepare; I'm doing a talk later, actually, on feature hashing for scalable machine learning, so that's at 12:20, please come and see it. >> Dave: A breakout session, it's at what, 12:20? >> 20 past 12:00, yeah. >> Okay. >> So in room 302, I think, >> Okay. >> I'll be talking about that, so I needed to prepare. But I think some of the key exciting things that I have seen, that I would like to go and take a look at, are kind of related to deep learning on Spark. I think that's been a hot topic recently, and it's one of the areas where Spark, perhaps, hasn't been the strongest contender, let's say, but there's some really interesting work coming out of Intel, it looks like. >> They're talking here on The Cube in a couple hours. >> Yeah. >> Yeah. >> I'd really like to see their work. >> Yeah. >> And that sounds very exciting, so yeah. Every time I come to a Spark summit, there are always new projects from the community, various companies, some of them big, some of them startups, that are pushing the envelope, whether it's research projects in machine learning, whether it's adding deep learning libraries, whether it's improving performance for kind of commodity clusters or for single, very powerful single nodes. There's always people pushing the envelope, and that's what's great about being involved in an Open Source community project and being part of those communities, so yeah. That's one of the talks that I would like to go and see. And I think I, unfortunately, had to miss some of the Netflix talks on their recommendation pipeline. That's always interesting to see. >> Dave: Right. >> But I'll have to check them on the video. (laughs) >> Well, there's always another project in Open Source land. Nick, thanks very much for coming on The Cube, and good luck. >> Cool, thanks very much. Thanks for having me. >> Have a good trip, stay warm, hang in there. (Nick laughs) Alright, keep it right there.
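Nick's talk topic, feature hashing for scalable machine learning, is simple to sketch: instead of maintaining a shared dictionary from feature names to vector indices, hash each name into a fixed number of buckets, so the feature space has bounded size and needs no coordination across a cluster. This toy version uses MD5 for a stable hash; Spark's own implementation of the idea is `HashingTF`.

```python
# A minimal version of the hashing trick: map sparse named features into
# a fixed-size vector via a hash function, with no shared feature
# dictionary. Illustrative only.
import hashlib

def hash_features(tokens, num_buckets=8):
    """Hash each token into one of num_buckets slots, accumulating
    counts, like a bounded-size term-frequency vector. Hash collisions
    simply share a slot, which is the trick's accepted trade-off."""
    vec = [0] * num_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).hexdigest()
        idx = int(digest, 16) % num_buckets
        vec[idx] += 1
    return vec

# Deterministic: the same tokens always land in the same buckets,
# on any machine, with no coordination.
v1 = hash_features(["spark", "ml", "spark"])
v2 = hash_features(["spark", "ml", "spark"])
v3 = hash_features(["tensorflow"])
```

Because the vector size is fixed up front, every worker in a cluster can featurize its partition independently, which is what makes the trick scale.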
My buddy George and I will be back with our next guest. We're live. This is The Cube from Spark Summit East, #sparksummit. We'll be right back. (upbeat music) (gentle music)
Rod Smith - IBM Spark Summit 2015 - theCUBE
>> Announcer: From Galvanize in San Francisco, extracting the signal from the noise, it's theCUBE, covering the Apache Spark community event. Brought to you by IBM. Now, your host, John Furrier. >> John: Okay, welcome back everyone, we are live in San Francisco for this special theCUBE presentation at the IBM Spark community event, here live at Galvanize, the San Francisco workspace incubator, a great place for developer education. IBM's big announcement today is their commitment to Spark. They didn't share any numbers, but I'm counting it in the hundreds of millions, to quote Papa Chiana on my call with him on Friday: hundreds of millions of dollars. It's getting late in the day. Rod Smith is our next guest. Rod, welcome to theCUBE. >> Rod: Thank you very much. >> John: You were a catalyst behind Spark at IBM, you worked hard on it. You guys tell a story; what's the story? >> Rod: Well, we worked on big data, and I have a group of folks that go out and work with customers all the time. When we were doing Hadoop, we would do these cool applications where sometimes, on small clusters, in 20 minutes you'd get a result, and a customer would say, "Can you do that in a couple of seconds?" You kind of look around and go, what changed? We did the business problem, and they couldn't tell us, but it's one of those data points in your head that says something's not quite right. What's changing, or what are they trying to tell me that they can't? That's when we started learning that customers were looking for technology they could iterate on quickly, open-ended questions. It wasn't "give me a problem, I compute the output, I'm done." This was, "Oh, there's the journey: I now see some interesting insights, I have other questions." Was something not right? Did the data they got not match their hypothesis? Or was it the expectation that if I can do it fast on Google and find a Thai restaurant down the block, why can't I do it here? Something wasn't right.
>> Rod: What stayed with me was, why can't you tell me what you're really trying to accomplish? What I learned is that as we go through these kinds of digital transformations, in real time, they were thinking about how their business is going to change, and fast. The problem has always been that technologists and vendors like IBM say, "Tell us the problem, we pick out the technology, and you're pretty well stuck with it; it stays that way." They wanted more flexibility: open-ended questions, lots of different data sources, on demand when they had to have them. They wanted to see results along the way, and they would rather have analytics be approximations they could use quickly than after-the-fact and more accurate. >> John: Okay, so when you went through that, it wasn't that they couldn't find a BI person or a data person to talk to. >> Rod: It was fun to try to put the puzzle pieces together, and that's where Spark came into this. >> John: I see a lot of other trends kind of vectoring into that convergence: in-memory databases, flash for the persistence store on the storage side. You were close to all that action. What was the aha moment within IBM: "Hey, this Spark thing is the next Linux, we've got to get out in front of this and help the community go faster," and then kind of a rising tide floats all boats? What was that flash point? >> Rod: We had two of them. One was in our commerce group, where they work on online pricing. There's a standard process which takes about a week: you get data off of a retail site, they analyze it, they correct the analytics, they put it back up again. Takes about a week. But we showed them that with Spark we could do it in about four hours. A week down to four hours, and now they started to think: what do we offer customers now? We have ways to price not just one product but many products. Let's bring in other data: location data, traffic data, weather data,
social data. That kind of exploded internally: this is a big change, this is something we can relate to customers with. >> John: Because of multiple data sources, the need for unification, and speed. >> Rod: And speed. >> John: Speed first, because being first matters. That speed means I can bring in other data sets, and it's time to value. If you're going to be a digital business and look at real time, Netflix and others have really set the standard. Okay, so let's take it to the next level. "Rod, you're crazy, we can't do that, it would disrupt all these other businesses we have." How does that conversation happen within IBM? >> Rod: The way that happens in IBM is, "Rod, you are crazy, and you're going to cause me problems, so please go away." And I don't go away easily. You keep pushing on this, and part of my job is to work with customers: can I show value, so I can tell the product teams, "You need to take this more seriously; I've got currency now." And then, as you just said, the marketplace starts to light up. Spark is on the front page, people are talking about how they're using it. >> John: Well, Hadoop is growing too at the same time. Hadoop seeds the market. >> Rod: It seeds the market. >> John: You're playing with Hadoop, but you see the customer challenges, and you're like, you guys just connect the dots. >> Rod: And then it's back to the customer talking about the problems they want solved, or the solutions they're looking for. So yeah, it takes time, because it's risky, meaning that all of us have quarterlies we're working to. But how do we now make it safer for people in IBM to jump in the water, so that eventually they don't hate me? >> John: So what's your comment when a friend says, "Hey Rod, Linux was great, but it's a different era. Here with cloud and mobile, open source with Apache has evolved to the point where it's very manageable for vendors to be contributors, alongside non-company contributors." How do you see the difference between those two worlds? Because
really this is a Linux moment, but there's no big bad mainframe monopoly out there. There are many, many computer companies, and mainframes are still out there; they're specialized, like the Z systems, which are great. But this is scale-out commodity hardware, and Hadoop now, that's growing. How do you describe that? Because there is a Linux correlation: what Linux was for open source operating systems, this is for distributed analytics. >> Rod: I think part of this is the real-time digital business transformation, and while there is not a bad company out there, Amazon and others have shown how they can be online businesses and use analytics and be very effective. But if I'm a brick-and-mortar company with an online business, how do I do the same thing? Spark starts to really show that they don't have a corner on the market; we can compete. That's the big factor here: it's not one company doing this, it's "I need to be able to compete at the speed of the business." >> John: You have to see that Amazon started kind of post-recession, after the dot-com bubble burst. Web services was just kind of kicking through, if we remember our history lessons, and what happened was they really had no traction at first. They built some building blocks: they made a good decision to integrate two core building blocks, compute and storage, and they built from there. So in a way, you guys can enable companies to have their own Amazon-like experience, because it's a fresh, clean sheet of paper. >> Rod: It is, and I think where Spark is interesting is, like you said, in the verticals. What do I do in retail? What do I do in health care? What are we doing in finance? Very specialized. We've shown with Watson that you can do Watson for cancer research, you can do Watson for cooking, but those are very vertical. Now specialized domain expertise becomes really interesting. That's the big part, and that's the part I really liked about Spark: the community really thought
about solution developers. They stayed in a kind of middle ground: you don't have to be a deep data person or a deep analytics-API person. What's the problem you want to solve, and how can I help you do that? >> John: That's interesting, because most people go at this as speeds-and-feeds software. You look at the solutions more holistically, and you're really talking about customer problems, the so-called outcomes. >> Rod: That's the part I've enjoyed. I want to talk to you about what your problem is; I don't want to talk technology, and I don't want to have to make a technology choice from day one. Spark helps me with that: I don't have to lock in a programming model up front, all those things come together, so we can concentrate on talking to the customer and learning from them what they're trying to accomplish. >> John: I was just going to say, looking at your LinkedIn page, I love this: VP of emerging technologies for 20-some-odd years. You've seen a lot of technologies come, a lot of emerging technologies, and the acceleration of these technologies is only going up. You have a whole lot more in your portfolio to look at today than you did yesterday, or five years ago. Why is Spark so special in the cornucopia of technologies that you've seen coming over the years? >> Rod: It's a good question. As I've done emerging technologies, I've learned that I have to listen to customers very carefully. When I hear those kinds of repeatable business patterns, do I see an economic change, a transformation? That really sticks with me. Sometimes things start out really big, they start out good, and then they fade away, but I always look for technologies that seem to have lots of dimensions to them from a business-value standpoint. That's what attracted me to Spark, and my team, working with
some customers on POCs, we could do them quickly. I really like to get to the point where we as an industry, with notebooks and other tools, can do solutions in less than four hours for a customer. What better thing than to take your employee to lunch and pat them on the back for something you didn't expect for weeks? >> John: One of the exciting things you guys have done is shine the spotlight on Spark, and you've opened up the conversation globally around IBM making a big move. Spark was a little bit of an outlier in the mainstream press. I mean, the press was picking up Spark; Berkeley gave it some credibility, great people behind it. But now it's like, wow, it's going to get the attention of the CXOs out there, and they're going to think, "If IBM's looking at it, it must be relevant," because of the history you guys have with innovation. But they're going to ask you the question I'm going to ask you, which is: it's not baked out yet. Where are we with this? What are you guys going to do? How does IBM work with the community to continue to bake out Spark? Because a lot of people are using it, bringing it in, but it's evolving super fast, and that's going to be the question: is it baked, and how does it get baked faster? >> Rod: I think there are lots of areas. As we just talked about, if I'm doing retail or health care or finance, there are going to be lots of specialized analytics, because what Spark means for me is enabling custom analytics. The second part is, as you think about how you want to look at bigger problems, many times our learning is that once we've got a technology, we try to make everything fit it, rather than starting to separate it by business problems. I think we can do that now, where we can bring to the table technology, learnings, best practices, and solutions. At the end of the day, it's how Spark can be integrated into a business solution for our customers very quickly, and hopefully those
customers see it broadly, from an interoperability standpoint, in what they're going to do. >> John: So the final question I have for you is, what was the biggest learning you've taken away from this process, magnified through this whole journey of taking IBM from being a participant, a citizen in the community early on, as a founding member of Spark? This is back in 2009, so it wasn't like no one knew what was going on, and we've covered Hadoop from the beginning, so we love to watch these ecosystems grow. From the early days to now, today, what was the biggest thing that you learned, magnified out of all the reactions, all the feedback, all the customers? What can you share? >> Rod: For me, when we did a Spark hackathon and 28,000 IBMers showed up with ideas, that told us something. >> John: Twenty-eight thousand? >> Rod: 28,000. >> John: So now you've got 28,000 people who were focused on the customer. >> Rod: They had a thought of how this could be relevant. This is great. This isn't one little vein with a little stream; it's big, and what's big is what we can do for our customers. >> John: When was that? >> Rod: About two months ago. >> John: How did you pull that off? Just an email blast? All the IBMers put it on the message board, a crowd chat? What did you do? >> Rod: Well, one, you put out an email blast. The second one is you put on a webcast to explain to people what you're going to do with it, what you'd like them to do, and how we're setting it up. Then you step back and kind of cross your fingers and hope people show up. And when you invite ten thousand and twenty-eight thousand show up, you kind of know that we're turning a corner as a company in understanding how we can use that. >> John: This also highlights this whole connectedness: apps, the Internet of Things, and people are things too, with their mobile devices. When you have that kind of people close to the action, the creativity is right there on the front
lines, and they don't feel like the work they do is going to be taken over by the machinery. In the old days it was, "I've got to go back through all these hurdles I have to jump." Now they can instantly be there with some solutions, so that's super compelling. The next question is security, and how do you see that weaving in? But first, let me back up before the security question. Last week at Hadoop Summit we were talking with the Hadoop ecosystem, Hortonworks, the ODP conversations, et cetera. But when you looked at it, kind of reading the tea leaves, it was Spark that was stealing the show. The subtext was Spark: all the Spark sessions were packed, the developers were salivating over Spark. >> Rod: I like to hear that. >> John: Why is that? Why are the Hadoop developers salivating over Spark? Is it because they want to go faster? Do they see extensions? Any thoughts? >> Rod: I'd say it two ways. One is, and I did Hadoop for quite a while, I think people thought for a while that Hadoop was going to be an analytics platform, and it kind of went down the path of being a more generalized platform, so you can do more than MapReduce jobs. So there's been this pent-up demand for a real analytics focus, and Spark offered that focus, and the performance side. >> John: I think that's the part where Hadoop sold kind of a false dream, or it didn't materialize fast. >> Rod: I don't think of it as a false dream. >> John: The promise around it, maybe. >> Rod: Well, people set those expectations; I don't think it was all the vendors. What it did with unstructured data, storing data and then being able to act on it, creates some interesting dynamics. I mean, I've worked with customers who started to put data in Hadoop saying, "We're only going to do a year's worth of data," and then putting
in three years of data, because they want to do Monte Carlo simulations against it, not Monty Python. >> John: But the problem, as we were talking about before: for your internal use you can produce interesting innovations in days. >> Rod: That's going to attract audiences, because now they can show their business people what they can do for them. That's what's really driving this. I mean, if a CXO or a CMO says, "Show me what you can do; do segmentation on my population for these products," they want it in minutes, not jobs run over a certain period of time. >> John: I was just talking with the CEOs of DocuSign and Box, and an executive director and EVP of platform at Salesforce. The common thread among those executives was that the new digital transformation has such a dynamic, impactful economic effect. Dr. Sanyal used examples of how Deutsche Telekom literally saved 230 million dollars on one process. >> Rod: Yes, one process. >> John: One process, with analytics. >> Rod: And process improvements. It sounds funny, but it's extremely low-hanging fruit. They haven't had the technology and the economics to be able to support it; now we do. And now you're seeing the solution developer go, "I think I can make a business result faster." If they can show it, then businesses react. >> John: I think that's the beautiful thing about what Hadoop has done. I brought that up earlier, trying to tease that out. The reality we're seeing is that that market is continuing to grow, but there's a world beyond Hadoop. Hortonworks is a public company, IBM is massive, so you've got Hadoop, and then Spark is a beautiful extension that enables so much more. >> Rod: I think Spark will go further, because to me it's another dimension: it's an integration technology. I can have Spark hooked up to legacy systems without Hadoop in there, doing
analytics in there, being an avenue for doing joins on data, doing analytics on unstructured and transactional data, weather data, pulling it all together. And I think, again talking about multi-dimensional, that was hard even five years ago. >> John: With any relational database, that's a nightmare. And I asked about security, so do you want to touch on that? >> Rod: Yeah, okay. Part of what I like about Spark is the technology called resilient distributed datasets, RDDs. I read data from a source and I make it into this RDD, and I can work on it. That gives me a great interaction point; Cassandra, with DataStax, did a really great job on a Spark driver. So think about this in a business, for a DB2 or something: now I know where I can put my security and my governance. I can put those at certain endpoints as I'm reading in my application and writing these things out. So again, back to my point about integration: it's not something that I'm trying to sneak around a business with; I'm integrating, extending its life and its capabilities. >> John: That's right. So I've got to ask you the internal IBM question, my last question. What's the vibe like at IBM? Because I worked at IBM way back in the day, back in the '80s, and the culture's changed so much, but there's still a huge technical group of people at IBM. So I've got to ask you: with all this new cloud innovation, all these new capabilities to do things differently, what's it like for all the technical folks at IBM right now? Because they've got to be like, "Hey, we can now do this." New capabilities are emerging. What's the vibe like, and what are some of the things that are low-hanging fruit that are game-changing? Because low-hanging fruit is game-changing today. >> Rod: What's the vibe internally at IBM? Internally it's very hot. The guys and gals here, you look at cloud computing, look at what we've done with Bluemix. It's
getting great recent press, it's getting great results with customers, back to this time-to-value piece. It's new to us; there was only a small group that started it, so now the rest of IBM is going, "This is really cool, how do we do it?" Now you've got analytics, where competencies are strong, and now you can take on the real-time aspect. >> John: So the vibe is, all those little silos, an identity system here, "I've got to build all the software," now you can go horizontal. >> Rod: Yeah, so that's kind of a new thing, and that's kind of exciting. It's going to be fun to watch. >> John: My final question, I guess my final final question. Rod, it's great to have you on theCUBE: great commentary, great insight. Spark in the cloud is what Databricks announced. What about on-premise? I'm a customer, I want on-prem; I don't necessarily want the cloud. What's next? >> Rod: I think you're going to see hybrid models for cloud, where Spark as a service is there on-prem. One of the really exciting parts to me is, one, the unified programming model, and two, the portability of the analytic models. Let's say I start on-prem because I'm worried about security and other things, and then I want to move it to a cloud service. Well, I don't have to go rewrite it; I can just move the analytics over, from a model standpoint. So I think you're going to see this evolve very fast, as people want to do either on-prem or hybrid or dedicated, because of the integration capabilities and the distributed nature of it. >> John: That's the point. Awesome. Well, I'll let you get the last word on the segment. Share with the folks who aren't watching: what is this all about today? Why is IBM's announcement in San Francisco today so groundbreaking? I know you're part of it, a little bit biased, but share with the folks:
why now, what's this all about, what's going on here? >> Rod: Well, we think the epicenter for Spark innovation is here in San Francisco, with what the AMPLab, Databricks, and others are doing here, and we want to be a part of that. And I think the Spark Technology Center we're setting up is about how we can contribute and learn, and help the community grow. >> John: You brought some food to the party, as I said earlier: the beer, right? You bring the ML. And you've got Napa Valley wine, of course. >> Rod: You've got to go with wine. Well, craft beers are good: North Bay. >> John: Thanks so much for coming on theCUBE, we really appreciate the insight, because it's great color from an expert. IBM here, we're on the ground. This is a theCUBE special presentation, live in San Francisco. We'll be back with more live coverage of the breakouts and the event tonight, the IBM Spark community event, here in San Francisco at the Galvanize workspace education center. We'll be right back.
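Rod's description of RDDs in the interview, reading data from a source into a resilient distributed dataset and then working on it, rests on lazy, recorded transformations: nothing executes until a result is demanded, which is where Spark gets both its speed and its well-defined endpoints for governance. A deliberately tiny pure-Python sketch of that lazy-lineage idea (a teaching toy, not how Spark is actually implemented; `ToyRDD` is an invented name):

```python
class ToyRDD:
    """A tiny stand-in for a Spark RDD: transformations are recorded
    lazily in a lineage list and only run when collect() is called."""

    def __init__(self, data, lineage=None):
        self._data = data
        self._lineage = lineage or []  # ordered (kind, function) pairs

    def map(self, fn):
        # Returns a new dataset; nothing is computed yet.
        return ToyRDD(self._data, self._lineage + [("map", fn)])

    def filter(self, pred):
        return ToyRDD(self._data, self._lineage + [("filter", pred)])

    def collect(self):
        # The "action": replay the whole lineage against the source data.
        result = list(self._data)
        for kind, fn in self._lineage:
            if kind == "map":
                result = [fn(x) for x in result]
            else:
                result = [x for x in result if fn(x)]
        return result


rdd = ToyRDD(range(10))
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
# Nothing has executed yet; collect() triggers the whole pipeline.
print(evens_squared.collect())  # [0, 4, 16, 36, 64]
```

Because the lineage is just recorded functions between the read and the result, a system built this way has natural choke points, the source read and the final write, which is one way to picture Rod's remark about knowing where to put security and governance.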
SUMMARY :
the question I'm going to ask you which
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Deutsche Telekom | ORGANIZATION | 0.99+ |
amazon | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
San Francisco | LOCATION | 0.99+ |
four hours | QUANTITY | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
twenty eight thousand | QUANTITY | 0.99+ |
20 minutes | QUANTITY | 0.99+ |
three years | QUANTITY | 0.99+ |
Papa Chiana | PERSON | 0.99+ |
San Francisco | LOCATION | 0.99+ |
28,000 people | QUANTITY | 0.99+ |
Friday | DATE | 0.99+ |
Rod Smith | PERSON | 0.99+ |
$17 | QUANTITY | 0.99+ |
a year | QUANTITY | 0.99+ |
230 million dollars | QUANTITY | 0.99+ |
hundreds of millions | QUANTITY | 0.99+ |
San Francisco | LOCATION | 0.99+ |
less than four hours | QUANTITY | 0.99+ |
Netflix | ORGANIZATION | 0.99+ |
second part | QUANTITY | 0.99+ |
linux | TITLE | 0.99+ |
san fran | LOCATION | 0.99+ |
Hadoop | TITLE | 0.99+ |
dr. Sanyal | PERSON | 0.99+ |
five years ago | DATE | 0.99+ |
today | DATE | 0.98+ |
yesterday | DATE | 0.98+ |
sixth | QUANTITY | 0.98+ |
hundred millions of dollars | QUANTITY | 0.98+ |
about a week | QUANTITY | 0.98+ |
San Ruby | LOCATION | 0.98+ |
hundreds of millions of years | QUANTITY | 0.98+ |
Watson | TITLE | 0.97+ |
two | QUANTITY | 0.97+ |
28,000 | QUANTITY | 0.97+ |
Hortonworks | ORGANIZATION | 0.96+ |
about a week | QUANTITY | 0.96+ |
about four hours a week | QUANTITY | 0.96+ |
first meeting | QUANTITY | 0.96+ |
Linux | TITLE | 0.96+ |
one | QUANTITY | 0.95+ |
five years ago | DATE | 0.95+ |
about two months ago | DATE | 0.94+ |
Hayden | PERSON | 0.94+ |
hadoop | TITLE | 0.94+ |
two verticals | QUANTITY | 0.94+ |
ORGANIZATION | 0.94+ | |
tonight | DATE | 0.93+ |
two worlds | QUANTITY | 0.93+ |
two thousand nine | QUANTITY | 0.93+ |
one product | QUANTITY | 0.93+ |
Carlo | TITLE | 0.92+ |
spark technology | ORGANIZATION | 0.91+ |
five | QUANTITY | 0.91+ |
IBM Spark Summit 2015 | EVENT | 0.91+ |
second one | QUANTITY | 0.91+ |
first | QUANTITY | 0.9+ |
docusign | ORGANIZATION | 0.89+ |