Jay Limburn, IBM & Julie Lockner, IBM | IBM Think 2019
>> Live from San Francisco, it's theCUBE! Covering IBM Think 2019. Brought to you by IBM. >> Welcome back, live here in San Francisco, it's theCUBE's coverage of IBM Think 2019. I'm John Furrier--Stu Miniman. Stu, four days, we're on our fourth day, the sun's shining, they've shut down Howard Street here at IBM. Big event for IBM, in San Francisco, not Las Vegas. Lot of great cloud action, lot of great AI data developers. Great story, good to see you again. Our next two guests, Julie Lockner, Director, Offering Management, Portfolio Operations at IBM, Data+AI, great to see you. >> Thank you, it's great to see you too, thank you. >> And Jay Limburn, Director of Offering Management, IBM Data+AI, thanks for coming on. >> Hey guys, great to be here. >> So, we've chatted many times at events, the role of data. So, we're religious about data, data flows through our blood, but IBM has put it all together now. All the reorgs are over, everyone's kind of, the table is set for IBM. The data path is clear, it's part of applications. It's feeding the apps. AI's the key workload inside the application. This is now a fully set-up group, give us the update, what's the focus? >> Yeah, it's really exciting because, if you think about it, before, we were called IBM Analytics, and that really is only a part of what we do. Now that we're Data+AI, that means that not only are we responsible for delivering data assets, and technology that supports those data assets to our customers, but infusing AI, not only in the technologies that we have, but also helping them build applications so they can fuse AI into their business processes. >> It's pretty broad, I mean, data's very much a broad swath of things. Analytics, you know, wrangling data, setting things up, cataloging them. Take me through how you guys set this up. How do you present it to the marketplace? How are clients engaged with it? Because it's pretty broad. But it could be, it needs to be specific. Take us through the methodology. >> So, you probably heard a lot of people today talk about the ladder to AI, right? This is IBM's view of how we explain our client's journey towards AI. It really starts at the bottom rung of the ladder, where we've got the collection of information. Collect your data. Once you've collected your data, you move up to the next rung, which is the Organize. And this is really where all the governance stuff comes in. This is how we can provide a view across that data, understand that data, provide trust to that data, and then serve that up to the consumers of that information, so they can actually use that in AI. That's where all the data science capabilities come in, allowing people to actually be able to consume that information. >> So, the bottom set is just really all the hard and heavy lifting that data scientists actually don't want to do. >> And writing algorithms, the collecting, the ingesting of data from any source, that's the bottom? And then, tell me about that next layer up, from the collection-- >> So, Collect is the physical assets or the collection of the data that you're going to be using for AI. If you don't get that foundation right, it doesn't really make sense. You have to have the data first. The piece in the middle that Jay was referring to, that's called Organize, our whole divisions are actually organized around these ladders to AI, so, Collect, Organize, Analyze, Infuse. 
On the Organize side, as Jay was mentioning, it's all about inventorying the data assets, knowing what data you have, then providing data quality rules, governance, compliance-type offerings, that allow organizations to not just know your data, trust your data, but then make it available so you can use your data, and the users are those data scientists, they're the analytics teams, they're the operations organizations that need to be able to build their solutions on top of trusted data. >> So, where does the Catalog fit in? Which level does that come into? >> Yeah, so, think of the Data Catalog as the DNS for data, all right? It's the way in which you can provide a full view of all of your information. Whether it's structured information, unstructured information, data you've got on-prem and data you've got in a cloud somewhere. >> That's in the Organize layer, right? >> That's all in the Organize layer. So, if you can collect that information, you can then provide capabilities that allow you to understand the quality of that data, know where that data's come from, and then, finally, if you serve that up inside a compelling, business-friendly experience, so that a data scientist can go to one place, quickly make a decision on if that's the right data for them, and allow them to go and be productive by building a data science model, then we're really able to move the needle on making those data science organizations efficient, allowing us to build better models to transform their business. >> Yeah, and a big part of that is, if you think about what makes Amazon successful, it's because they know where all their products are, from the vendor, to when it shows up on the doorstep. What the Catalog provides is really the similar capability of, I would call it inventory management of your data assets, where we know where the data came from, its source--in that Collect layer--who's transformed it, who's accessed it, if they're even allowed to see it, so, data privacy policies are part of that, and then being able to just serve up that data to those users. Being able to see that whole end-to-end lineage is a key point, critical point of the ladder to AI. Especially when you start to think about things like bias detection, which is a big part of the Analyze layer. >> But one of the things we've been digging into on theCUBE is, is data the next flywheel of innovation? You know, it used to be I just had my information, many years ago we started talking about, "Okay, I need to be able to access all that other information." We hear things like 80% of the data out there isn't really searchable today. So, how do you see data, data gravity, all those pieces, as the next flywheel of innovation? >> Yeah, I think it's key. I mean, we've talked a lot about how you can't do AI without information architecture. And it's absolutely true. And getting that view of that data in a single location, so it is like the DNS of the internet. So you know exactly where to search, you can get hold of that data, and then you've got tools that give you self-service access to actually get hold of the data without any need of support from IT to get access to it. It's really a key-- >> Yeah, but to the point you were just asking about, data gravity? I mean, being able to do this where the data resides. So, for example, we have a lot of our customers that are going through mergers and acquisitions. Some teams have a lot of data assets that are on-premises, others have large data lakes in AWS or Azure. 
How do you inventory those assets and really have a view of what you have available across that landscape? Part of what we've been focusing on this year is making our technology work across all of those clouds. And having a single view of your assets but knowing where it resides. >> So, Julie, this environment is a bit more complicated than the old data warehousing, or even what we were looking at with big data and Hadoop and all those pieces. >> Isn't that the truth? >> Help explain why we're actually going to be able to get the information, leverage and drive new business value out of data today, when we've struggled so many times in the past. >> Well, I think the biggest thing that's changed is the adoption of DevOps, and when I say adoption of DevOps and things like containerization and Docker containers, Kubernetes, the ability to provision data assets very quickly, no matter where they are, build these very quick value-producing applications based on AI, Artificial Intelligence APIs, is what's allowing us to take advantage of this multi-cloud landscape. If you didn't have that DevOps foundation, you'd still be building ETL jobs in data warehouses, and that was 20 years ago. Today, it's much more about these microservices-based architecture, building up these AI-- >> Well, that's the key point, and the "Fuse" part of the stack, I think, or ladder. Stack? Ladder? >> Ladder. (laughs) >> Ladder to success! Is key, because you're seeing the applications that have data native into the app, where it has to have certain characteristics, whether it's a realtime healthcare app, or retail app, and we had the retail folks on earlier, it's like, oh my god, this now has to be addressable very fast, so, the old fenced-off data warehouse-- "Hey, give me that data!"--pull it over. You need a sub-second latency, or milliseconds. So, this is now a requirement. >> That's right. >> So, how are people getting there? What are some use cases? >> Sure. I'll start with the healthcare 'cause you brought that up. One of the big use cases for technology that we provide is really around taking information that might be realtime, or batch data, and providing the ability to analyze that data very quickly in realtime to the point where you can predict when someone might potentially have a cardiac arrest. And yesterday's keynote that Rob Thomas presented, a demonstration that showed the ability to take data from a wearable device, combine it with data that's sitting in an Amazon... MySQL database, be able to predict who is the most at-risk of having a potential cardiac arrest! >> That's me! >> And then present that to a call center of cardiologists. So, this company that we work with, iCure, really took that entire stack, Organize, Collect, Organize, Analyze, Infuse, and built an application in a matter of six weeks. Now, that's the most compelling part. We were able to build the solution, inventory their data assets, tie it to the industry model, healthcare industry model, and predict when someone might potentially-- >> Do you have that demo on you? The device? >> Of course I do. I know, I know. So, here is, this is called a BraveHeart Life Sensor. And essentially, it's a Bluetooth device. I know! If you put it on! (laughs) >> If I put it on, it'll track... Biometric? It'll start capturing information about your heart, ECG, and on Valentine's Day, right? My heart to yours, happy Valentine's Day to my husband, of course. 
The ability to be able to capture all this data here on the device, stream it to an AI engine that can then immediately classify whether or not someone has an anomaly in their ECG signal. You couldn't do that without having a complete ladder to AI capability. >> So, realtime telemetry from the heart. So, I see timing's important if you're about to have a heart attack. >> Yeah. >> Pretty important. >> And that's a great example of, you mentioned the speed. It's all about being able to capture that data in whatever form it's coming in, understand what that data is, know if you can trust that data, and then put it in the hands of the individuals that can do something valuable with the analysis from that data. >> Yeah, you have to able to trust it. Especially-- >> So, you brought up earlier bias in data. So, I want to bring that up in context of this. This is just one example of wearables, Fitbits, all kinds of things happening. >> New sources of tech, yeah. >> In healthcare, retail, all kinds of edge, realtime, is bias of data. And the other one's privacy because now you have a new kind of data source going into the cloud. And then, so, this fits into what part of the ladder? So, the ladder needs a secure piece. >> Tell me about that. >> Yeah, it does. So, that really falls into that Organize piece of that ladder, the governance aspects around it. If you're going to make data available for self-service, you've got to still make sure that that data's protected, and that you're not going to go and break any kind of regulatory law around that data. So, we actually can use technology now to understand what that data is, whether it contains sensitive information, credit card numbers, and expose that information out to those consumers, yet still masking the key elements that should be protected. And that's really important, because data science is a hugely inefficient business. Data scientists are spending too much time looking for information. And worse than that, they actually don't have all the information available that they need, because certain information needs to be protected. But what we can do now is expose information that wasn't previously available, but protect just the key parts of that information, so we're still ensuring it's safe. >> That's a really key point. It's the classic iceberg, right? What you see: "Oh, data science is going to "change the game of our business!" And then when they realize what's underneath the water, it's like, all this set-up, incompatible data, dirty data, data cleaning, and then all of a sudden it just doesn't work, right? This is the reality. Are you guys seeing this? Do you see that? >> Yeah, absolutely. I think we're only just really at the beginning of a crest of a wave, here. I think organizations know they want to get to AI, the ladder to AI really helps explain and it helps to understand how they can get there. And we're able then to solve that through our technology, and help them get there and drive those efficiencies that they need. >> And just to add to that, I mean, now that there's more data assets available, you can't manually classify, tag and inventory all that data, determine whether or not it contains sensitive data. And that's where infusing machine learning into our products has really allowed our customers to automate the process. 
I mentioned, the only way that we were able to deploy this application in six weeks, is because we used a lot of the embedded machine learning to identify the patient data that was considered sensitive, tag it as patient data, and then, when the data scientists were actually building the models in that same environment, it was masked. So, they knew that they had access to the data, but they weren't allowed to see it. It's perfectly--especially with HIMSS' conference this week as well! You were talking about this there. >> Great use case with healthcare. >> Love to hear you speak about the ecosystem being built around this. Everything, open APIs, I'm guessing? >> Oh, yeah. What kind of partners are-- >> Jay, talk a little bit-- >> Yeah, so, one of the key things we're doing is ensuring that we're able to keep this stuff open. We don't want to curate a proprietary system. We're already big supporters of open source, as you know, in IBM. One of the things that we're heavily-invested in is our open metadata strategy. Open metadata is part of the open source ODPi Foundation. Project Egeria defines a standard for common metadata interchange. And what that means is that, any of these metadata systems that adopt this standard can freely share and exchange metadata across that landscape, so that wherever your data is, whichever systems it's stored in, wherever that metadata is harvested, it can play part of that network and share that metadata across those systems. >> I'd like to get your thoughts on something, Julie. You've been on the analyst side, you're now at IBM. Jay, if you can weigh in on this too, that'd be great. We, here, we see all the trends and go to all the events and one of the things that's popping up that's clear within the IBM ecosystem because you guys have a lot of business customers, is that a new kind of business app developer's coming in. And we've seen data science highlight the citizen data scientist, so if data is code, part of the application, and all the ladder stuff kind of falls into place, that means we're going to see new kinds of applications. So, how are you guys looking at, this is kind of a, not like the cloud-native, hardcore DevOps developer. It's the person that says, "Hey, I can innovate "a business model." I see a business model innovation that's not so much about building technology, it's about using insight and a unique... Formula or algorithm, to tweak something. That's not a lot of programming involved. 'Cause with Cloud and Cloud Private, all these back end systems, that's an ecosystem partner opportunity for you guys, but it's not your classic ISV. So, there's a new breed of business apps that we see coming, your thoughts on this? >> Yeah, it's almost like taking business process optimization as a discipline, and turning it into micro-applications. You want to be able to leverage data that's available and accessible, be able to insert that particular Artificial Intelligence machine learning algorithm to optimize that business process, and then get out of the way. Because if you try to reinvent your entire business process, culture typically gets in the way of some of these things. >> I thought, as an application value, 'cause there's value creation here, right? >> Absolutely. >> You were talking about, so, is this a new kind of genre of developer, or-- >> It really is, I mean... If you take the citizen data scientist, an example that you mentioned earlier. It's really about lowering the entry point to that technology. 
How can you allow individuals with lower levels of skills to actually get in and be productive and create something valuable? It shouldn't be just a practice that's held away for the hardcore developer anymore. It's about lowering the entry point with the set of tools. One of the things we have in Watson Studio, for example, our data science platform, is just that. It's about providing wizards and walkthroughs to allow people to develop productive use models very easily, without needing hardcore coding skills. >> Yeah, I also think, though, that, in order for these value-added applications to be built, the data has to be business-ready. That's how you accelerate these application development life cycles. That's how you get the new class of application developers productive, is making sure that they start with a business-ready foundation. >> So, how are you guys going to go after this new market? What's the marketing strategy? Again, this is like, forward-pioneering kind of things happening. What's the strategy, how are you going to enable this, what's the plan? >> Well, there's two parts of it. One is, when Jay was mentioning the Open Metadata Repository Services, our key strategy is embedding Catalog everywhere and anywhere we can. We believe that having that open metadata exchange allows us to open up access to metadata across these applications. So, really, that's first and foremost, is making sure that we can catalog and inventory data assets that might not necessarily be in the IBM Cloud, or in IBM products. That's really the first step. >> Absolutely. The second step, I would say, is really taking all of our capabilities, making them, from the ground up, microservices-enabled, delivering them through Docker containers and making sure that they can port across whatever cloud deployment model our customers want to be able to execute on. And being able to optimize the runtime engines, whether it's data integration, data movement, data virtualization, based on data gravity, that you had mentioned-- >> So, something like a whole new developer program opportunity to bring to the market. >> Absolutely. I mean, there is, I think there is a huge opportunity for, from an education perspective, to help our customers build these applications. But it starts with understanding the data assets, understanding what they can do with it, and using self-service-type tools that Jay was referring to. >> And all of that underpinned with the trust. If you don't trust your data, the data scientist is not going to know whether or not they're using the right thing. >> So, the ladder's great. Great way for people to figure out where they are, it's like looking in the mirror, on the organization. How early is this? What inning are we in? How do you guys see the progression? How far along are we? Obviously, you have some data, examples, some people are doing it end-to-end. What's the maturity look like? What's the uptake? >> Go ahead, Jay. >> So, I think we're at the beginning of a crest of a wave. As I say, there's been a lot of discussion so far, even if you compare this year's conference to last year's. A lot of the discussion last year was, "What's possible with AI?" This year's conference is much more about, "What are we doing with AI?" And I think we're now getting to the point where people can actually start to be productive and really start to change their business through that. 
>> Yeah and, just to add to that, I mean, the ladder to AI was introduced last year, and it has gained so much adoption in the marketplace and our customers, they're actually organizing their business that way. So, the Collect divisions are the database teams, are now expanding to Hadoop and Cloudera, and Hortonworks and Mongo. They're organizing their data governance teams around the Organize pillar, where they're doing things like data integration, data replication. So, I feel like the maturity of this ladder to AI is really enabling our customers to achieve it much faster than-- >> I was talking to Dave Vellante about this, and we're seeing that, you know, we've been covering IBM since, it's the 10th year of theCUBE, all ten years. It's been, watching the progression. The past couple of years has been setting the table, everyone seems to be pumping, it makes sense, everything's hanging together, it's in one group. Data's not one, "This group, that group," it's all, Data, AI, all Analytics, all Watson. Smart, and the ladder just allows you to understand where a customer is, and then-- >> Well, and also, we mentioned the emphasis on open source. It allows our customers to take an inventory of, what do they have, internally, with IBM assets, externally, open source, so that they can actually start to architect their information architecture, using the same kind of analogy. >> And an opportunity for developers too, great. Julie, thanks for coming on. Jay, appreciate it. >> Thank you so much for the opportunity, happy Valentine's Day! Happy Valentine's Day, we're theCUBE. I'm John Furrier, Stu Miniman here, live in San Francisco at the Moscone Center, and the whole street's shut down, Howard Street. Huge event, 30,000 people, we'll be back with more Day Four coverage after this short break.
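As a concrete footnote to the automated tagging and masking Lockner and Limburn describe above, here is a minimal, illustrative sketch — not IBM's implementation — of classifying columns as sensitive and masking them before a data scientist sees the table. The regex rules stand in for the ML-based classification they mention, and the column names, patterns, and hashing scheme are assumptions made for this example.

```python
# Illustrative sketch only -- not IBM's implementation. It mimics the idea of
# automatically classifying columns as sensitive and masking them before a
# data scientist sees the data. Column names, the regex rules, and the
# masking scheme are assumptions made for the example.
import re
import hashlib

# Toy "catalog" rules: regex-based classifiers standing in for the ML-based
# classification described in the interview.
CLASSIFIERS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def classify_column(values):
    """Return the tag whose pattern matches most values, or None."""
    best_tag, best_hits = None, 0
    for tag, pattern in CLASSIFIERS.items():
        hits = sum(1 for v in values if pattern.match(str(v)))
        if hits > best_hits:
            best_tag, best_hits = tag, hits
    # Require a majority of values to match before tagging the column.
    return best_tag if best_hits > len(values) / 2 else None

def mask(value):
    """One-way mask: data scientists can join on it but cannot read it."""
    return hashlib.sha256(str(value).encode()).hexdigest()[:12]

def serve_table(rows):
    """Tag each column, then mask the sensitive ones before serving."""
    columns = rows[0].keys()
    tags = {c: classify_column([r[c] for r in rows]) for c in columns}
    masked = [
        {c: (mask(r[c]) if tags[c] else r[c]) for c in columns}
        for r in rows
    ]
    return tags, masked

if __name__ == "__main__":
    patients = [
        {"name": "A. Smith", "ssn": "123-45-6789", "email": "a@example.com", "hr_bpm": 61},
        {"name": "B. Jones", "ssn": "987-65-4321", "email": "b@example.com", "hr_bpm": 88},
    ]
    tags, masked = serve_table(patients)
    print(tags)       # e.g. {'name': None, 'ssn': 'ssn', 'email': 'email', 'hr_bpm': None}
    print(masked[0])  # ssn and email columns are hashed, the rest pass through
```

The design point is the one made in the interview: the data scientist can still join on and model with the masked columns without ever seeing the raw values.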
Rob Thomas, IBM | Change the Game: Winning With AI
>> Live from Times Square in New York City, it's The Cube covering IBM's Change the Game: Winning with AI, brought to you by IBM. >> Hello everybody, welcome to The Cube's special presentation. We're covering IBM's announcements today around AI. IBM, as The Cube does, runs sessions and programs in conjunction with Strata, which is down at the Javits, and we're here with Rob Thomas, who's the General Manager of IBM Analytics. Long time Cube alum, Rob, great to see you. >> Dave, great to see you. >> So you guys got a lot going on today. We're here at the Westin Hotel, you've got an analyst event, you've got a partner meeting, you've got an event tonight, Change the Game: Winning with AI at Terminal 5, check that out, ibm.com/WinWithAI, go register there. But Rob, let's start with what you guys have going on, give us the rundown. >> Yeah, it's a big week for us, and like many others, it's great when you have Strata, a lot of people in town. So, we've structured a week where, today, we're going to spend a lot of time with analysts and our business partners, talking about where we're going with data and AI. This evening, we've got a broadcast, it's called Winning with AI. What's unique about that broadcast is it's all clients. We've got clients on stage doing demonstrations, how they're using IBM technology to get to unique outcomes in their business. So I think it's going to be a pretty unique event, which should be a lot of fun. >> So this place, it looks like a cool event, a venue, Terminal 5, it's just up the street on the west side highway, probably a mile from the Javits Center, so definitely check that out. Alright, let's talk about, Rob, we've known each other for a long time, we've seen the early Hadoop days, you guys were very careful about diving in, you kind of let things settle and watched very carefully, and then came in at the right time. But we saw the evolution of so-called Big Data go from a phase of really reducing investments, cheaper data warehousing, and what that did is allowed people to collect a lot more data, and kind of get ready for this era that we're in now. But maybe you can give us your perspective on the phases, the waves that we've seen of data, and where we are today and where we're going.
>> So four phases, basically, the sort of cheap data store, the BI data warehouse modernization, self-service analytics, a big part of that is data science and data science collaboration, you guys have a lot of investments there, and then new business models with AI automation running on top. Where are we today? Would you say we're kind of in-between BI/DW modernization and on our way to self-service analytics, or what's your sense? >> I'd say most are right in the middle between BI data warehousing and self-service analytics. Self-service analytics is hard, because it requires you, sometimes to take a couple steps back, and look at your data. It's hard to provide self-service if you don't have a data catalog, if you don't have data security, if you haven't gone through the processes around data governance. So, sometimes you have to take one step back to go two steps forward, that's why I see a lot of people, I'd say, stuck in the middle right now. And the examples that you're going to see tonight as part of the broadcast are clients that have figured out how to break through that wall, and I think that's pretty illustrative of what's possible. >> Okay, so you're saying that, got to maybe take a step back and get the infrastructure right with, let's say a catalog, to give some basic things that they have to do, some x's and o's, you've got the Vince Lombardi played out here, and also, skillsets, I imagine, is a key part of that. So, that's what they've got to do to get prepared, and then, what's next? They start creating new business models, imagining this is where the cheap data officer comes in and it's an executive level, what are you seeing clients as part of digital transformation, what's the conversation like with customers? >> The biggest change, the great thing about the times we live in, is technology's become so accessible, you can do things very quickly. We created a team last year called Data Science Elite, and we've hired what we think are some of the best data scientists in the world. Their only job is to go work with clients and help them get to a first success with data science. So, we put a team in. Normally, one month, two months, normally a team of two or three people, our investment, and we say, let's go build a model, let's get to an outcome, and you can do this incredibly quickly now. I tell clients, I see somebody that says, we're going to spend six months evaluating and thinking about this, I was like, why would you spend six months thinking about this when you could actually do it in one month? So you just need to get over the edge and go try it. >> So we're going to learn more about the Data Science Elite team. We've got John Thomas coming on today, who is a distinguished engineer at IBM, and he's very much involved in that team, and I think we have a customer who's actually gone through that, so we're going to talk about what their experience was with the Data Science Elite team. Alright, you've got some hard news coming up, you've actually made some news earlier with Hortonworks and Red Hat, I want to talk about that, but you've also got some hard news today. Take us through that. >> Yeah, let's talk about all three. First, Monday we announced the expanded relationship with both Hortonworks and Red Hat. This goes back to one of the core beliefs I talked about, every enterprise is modernizing their data and application of states, I don't think there's any debate about that. 
We are big believers in Kubernetes and containers as the architecture to drive that modernization. The announcement on Monday was, we're working closer with Red Hat to take all of our data services as part of Cloud Private for Data, which are basically microservices for data, and we're running those on OpenShift, and we're starting to see great customer traction with that. And where does Hortonworks come in? Hadoop has been the outlier on moving to microservices and containers, we're working with Hortonworks to help them make that move as well. So, it's really about the three of us getting together and helping clients with this modernization journey. >> So, just to remind people, you remember ODPI, folks? It was all this kerfuffle about, why do we even need this? Well, what's interesting to me about this triumvirate is, well, first of all, Red Hat and Hortonworks are hardcore open source, IBM's always been a big supporter of open source. You three got together and you're proving now the productivity for customers of this relationship. You guys don't talk about this, but Hortonworks had to acknowledge, on its public earnings call, that the relationship with IBM drove many, many seven-figure deals, which, obviously, means that customers are getting value out of this, so it's great to see that come to fruition, and it wasn't just a Barney announcement a couple years ago, so congratulations on that. Now, there's this other news that you guys announced this morning, talk about that. >> Yeah, two other things. One is, we announced a relationship with Stack Overflow. 50 million developers go to Stack Overflow a month, it's an amazing environment for developers that are looking to do new things, and we're sponsoring a community around AI. Back to your point before, you said, is there a skills gap in enterprises, there absolutely is, I don't think that's a surprise. Data science, AI developers, not every company has the skills they need, so we're sponsoring a community to help drive the growth of skills in and around data science and AI. So things like Python, R, Scala, these are the languages of data science, and it's a great relationship with us and Stack Overflow to build a community to get things going on skills. >> Okay, and then there was one more. >> Last one's a product announcement. This is one of the most interesting product announcements we've had in quite a while. Imagine this, you write a SQL query, and the traditional approach is, I've got a server, I point it at that server, I get the data, it's pretty limited. We're announcing technology where I write a query, and it can find data anywhere in the world. I think of it as wide-area SQL. So it can find data on an automotive device, a telematics device, an IoT device, it could be a mobile device, we think of it as SQL the whole world. You write a query, you can find the data anywhere it is, and we take advantage of the processing power on the edge. The biggest problem with IoT is, it's been the old mantra of, go find the data, bring it all back to a centralized warehouse, that makes it impossible to do it in real time. We're enabling real time because we can write a query once, find data anywhere, this is technology we've had in preview for the last year. We've been working with a lot of clients to prove out use cases to do it, and we're integrating it as a capability inside of IBM Cloud Private for Data. So if you buy IBM Cloud Private for Data, it's there. 
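For readers who want something concrete to hang the "write one query, find data anywhere" idea on, here is a minimal sketch of the pattern Thomas describes. It is illustrative only, not the announced product: it pushes the same SQL statement down to several local SQLite files standing in for data sources in different clouds or on the edge, and the source names, table schema, and threshold are assumptions made for the example.

```python
# Illustrative sketch only -- not the announced IBM product. It shows the
# general "write one query, run it wherever the data lives" idea by pushing
# the same SQL down to several local SQLite files standing in for data
# sources in different clouds or on the edge.
import sqlite3

SOURCES = {
    "on_prem": "onprem.db",
    "aws": "aws.db",
    "edge_gateway": "edge.db",
}

def setup(path, readings):
    """Create a toy telemetry table in one 'source'."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE IF NOT EXISTS telemetry (device TEXT, heart_rate INTEGER)")
    con.execute("DELETE FROM telemetry")
    con.executemany("INSERT INTO telemetry VALUES (?, ?)", readings)
    con.commit()
    con.close()

def federated_query(sql):
    """Run the same query against every source and union the results,
    tagging each row with where it came from. The filtering happens at
    each source; only matching rows are collected."""
    rows = []
    for name, path in SOURCES.items():
        con = sqlite3.connect(path)
        rows += [(name, *r) for r in con.execute(sql)]
        con.close()
    return rows

if __name__ == "__main__":
    setup("onprem.db", [("pump-1", 72), ("pump-2", 140)])
    setup("aws.db", [("wearable-9", 165)])
    setup("edge.db", [("wearable-3", 58)])
    # One query, evaluated at each source; only matching rows come back.
    for row in federated_query("SELECT device, heart_rate FROM telemetry WHERE heart_rate > 120"):
        print(row)
```

The point of the pattern is the data-gravity argument made in the interview: the predicate runs where the data sits, so only the matching rows move across the network instead of hauling everything back to a central warehouse.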
Interesting, so when you've been around as long as I have, long enough to see some of the pendulum swings, and it's clearly a pendulum swing back toward decentralization and the edge, but the key is, from what you just described, is you're sort of redefining the boundary, so I presume it's the edge, any Cloud, or on premises, where you can find that data, is that correct? >> Yeah, so it's multi-Cloud. I mean, look, every organization is going to be multi-Cloud, like 100%, that's going to happen, and that could be private, it could be multiple public Cloud providers, but the key point is, data on the edge is not just limited to what's in those Clouds. It could be anywhere that you're collecting data. And, we're enabling an architecture which performs incredibly well, because you take advantage of processing power on the edge, where you can get data anywhere that it sits. >> Okay, so, then, I'm setting up a Cloud, I'll call it a Cloud architecture, that encompasses the edge, where essentially, there are no boundaries, and you're bringing security. We talked about containers before, we've been talking about Kubernetes all week here at a Big Data show. And then of course, Cloud, and what's interesting, I think many of the Hadoop distro vendors kind of missed Cloud early on, and then now are sort of saying, oh wow, it's a hybrid world and we've got a part to play, you guys obviously made some moves, a couple billion dollar moves, to do some acquisitions and get hardcore into Cloud, so that becomes a critical component. You're not just limiting your scope to the IBM Cloud. You're recognizing that it's a multi-Cloud world, that's what customers want to do. Your comments. >> It's multi-Cloud, and it's not just the IBM Cloud, I think the most predominant Cloud that's emerging is every client's private Cloud. Every client I talk to is building out a containerized architecture. They need their own Cloud, and they need seamless connectivity to any public Cloud that they may be using. This is why you see such a premium being put on things like data ingestion, data curation. It's not popular, it's not exciting, people don't want to talk about it, but the biggest inhibitor, to this AI point, comes back to data curation, data ingestion, because if you're dealing with multiple Clouds, suddenly your data's in a bunch of different spots. >> Well, so you're basically, and we talked about this a lot on The Cube, you're bringing the Cloud model to the data, wherever the data lives. Is that the right way to think about it? >> I think organizations have spoken, set aside what they say, look at their actions. Their actions say, we don't want to move all of our data to any particular Cloud, we'll move some of our data. We need to give them seamless connectivity so that they can leave their data where they want, we can bring Cloud-Native architecture to their data, we could also help move their data to a Cloud-Native architecture if that's what they prefer. >> Well, it makes sense, because you've got physics, latency, you've got economics, moving all the data into a public Cloud is expensive and just doesn't make economic sense, and then you've got things like GDPR, which says, well, you have to keep the data, certain laws of the land, if you will, that say, you've got to keep the data in whatever it is, in Germany, or whatever country. So those sort of edicts dictate how you approach managing workloads and what you put where, right? Okay, what's going on with Watson? Give us the update there. 
I get a lot of questions, people trying to peel back the onion of what exactly it is. So, I want to make that super clear here. Watson is a few things, start at the bottom. You need a runtime for models that you've built. So we have a product called Watson Machine Learning, runs anywhere you want, that is the runtime for how you execute models that you've built. Anytime you have a runtime, you need somewhere where you can build models, you need a development environment. That is called Watson Studio. So, we had a product called Data Science Experience, we've evolved that into Watson Studio, connecting in some of those features. So we have Watson Studio, that's the development environment, Watson Machine Learning, that's the runtime. Now you move further up the stack. We have a set of APIs that bring in human features, vision, natural language processing, audio analytics, those types of things. You can integrate those as part of a model that you build. And then on top of that, we've got things like Watson Applications, we've got Watson for call centers, doing customer service and chatbots, and then we've got a lot of clients who've taken pieces of that stack and built their own AI solutions. They've taken some of the APIs, they've taken some of the design time, the studio, they've taken some of the Watson Machine Learning. So, it is really a stack of capabilities, and where we're driving the greatest productivity, this is in a lot of the examples you'll see tonight for clients, is clients that have bought into this idea of, I need a development environment, I need a runtime, where I can deploy models anywhere. We're getting a lot of momentum on that, and then that raises the question of, well, do I have explainability, do I have trust and transparency, and that's another thing that we're working on. >> Okay, so there's an API-oriented architecture, exposing all these services makes it very easy for people to consume. Okay, so we've been talking all week at Cube NYC, is Big Data and AI, is this old wine in a new bottle? I mean, it's clear, Rob, from the conversation here, there's a lot of substantive innovation, and early adoption, anyway, of some of these innovations, but a lot of potential going forward. Last thoughts? >> What people have to realize is AI is not magic, it's still computer science. So it actually requires some hard work. You need to roll up your sleeves, you need to understand how I get from point A to point B, you need a development environment, you need a runtime. I want people to really think about this, it's not magic. I think for a while, people have gotten the impression that there's some magic button. There's not, but if you put in the time, and it's not a lot of time, you'll see the examples tonight, most of them have been done in one or two months, there's great business value in starting to leverage AI in your business. >> Awesome, alright, so if you're in this city or you're at Strata, go to ibm.com/WinWithAI, register for the event tonight. Rob, we'll see you there, thanks so much for coming back. >> Yeah, it's going to be fun, thanks Dave, great to see you. >> Alright, keep it right there everybody, we'll be back with our next guest right after this short break, you're watching The Cube.
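For readers who want something concrete to hang the studio/runtime split on, here is a minimal, illustrative sketch — not the Watson APIs themselves — of the pattern Thomas outlines: a model is built and persisted in a development step, and a separate runtime loads it and serves scores. The file name, feature, and threshold rule are assumptions made for the example; in a real deployment the runtime would sit behind a REST scoring endpoint rather than a local function call.

```python
# Illustrative sketch only -- not the Watson APIs. It mirrors the split
# described above: a development ("studio") step where a model is built and
# persisted, and a separate runtime step where the saved model is loaded
# and scored.
import json
import statistics

MODEL_FILE = "risk_model.json"

def build_model(training_rows):
    """'Studio' step: fit a trivial model (mean + 2 std-dev threshold)
    and persist it so any runtime can pick it up."""
    rates = [r["heart_rate"] for r in training_rows]
    model = {
        "feature": "heart_rate",
        "threshold": statistics.mean(rates) + 2 * statistics.pstdev(rates),
    }
    with open(MODEL_FILE, "w") as f:
        json.dump(model, f)
    return model

def load_runtime(path=MODEL_FILE):
    """'Runtime' step: load the persisted model and return a scoring
    function; a real deployment would expose this over HTTP."""
    with open(path) as f:
        model = json.load(f)
    def score(row):
        return {"at_risk": row[model["feature"]] > model["threshold"]}
    return score

if __name__ == "__main__":
    build_model([{"heart_rate": v} for v in (62, 70, 68, 75, 64, 71)])
    score = load_runtime()
    print(score({"heart_rate": 69}))   # {'at_risk': False}
    print(score({"heart_rate": 160}))  # {'at_risk': True}
```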
Alan Gates, Hortonworks | Dataworks Summit 2018
(techno music) >> (announcer) From Berlin, Germany it's theCUBE covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. We're here on day two of DataWorks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm lead analyst for Big Data Analytics in the Wikibon team of SiliconANGLE Media. And who we have here today, we have Alan Gates whose one of the founders of Hortonworks and Hortonworks of course is the host of DataWorks Summit and he's going to be, well, hello Alan. Welcome to theCUBE. >> Hello, thank you. >> Yeah, so Alan, so you and I go way back. Essentially, what we'd like you to do first of all is just explain a little bit of the genesis of Hortonworks. Where it came from, your role as a founder from the beginning, how that's evolved over time but really how the company has evolved specifically with the folks on the community, the Hadoop community, the Open Source community. You have a deepening open source stack with you build upon with Atlas and Ranger and so forth. Gives us a sense for all of that Alan. >> Sure. So as I think it's well-known, we started as the team at Yahoo that really was driving a lot of the development of Hadoop. We were one of the major players in the Hadoop community. Worked on that for, I was in that team for four years. I think the team itself was going for about five. And it became clear that there was an opportunity to build a business around this. Some others had already started to do so. We wanted to participate in that. We worked with Yahoo to spin out Hortonworks and actually they were a great partner in that. Helped us get than spun out. And the leadership team of the Hadoop team at Yahoo became the founders of Hortonworks and brought along a number of the other engineering, a bunch of the other engineers to help get started. And really at the beginning, we were. It was Hadoop, Pig, Hive, you know, a few of the very, Hbase, the kind of, the beginning projects. So pretty small toolkit. And we were, our early customers were very engineering heavy people, or companies who knew how to take those tools and build something directly on those tools right? >> Well, you started off with the Hadoop community as a whole started off with a focus on the data engineers of the world >> Yes. >> And I think it's shifted, and confirm for me, over time that you focus increasing with your solutions on the data scientists who are doing the development of the applications, and the data stewards from what I can see at this show. >> I think it's really just a part of the adoption curve right? When you're early on that curve, you have people who are very into the technology, understand how it works, and want to dive in there. So those tend to be, as you said, the data engineering types in this space. As that curve grows out, you get, it comes wider and wider. There's still plenty of data engineers that are our customers, that are working with us but as you said, the data analysts, the BI people, data scientists, data stewards, all those people are now starting to adopt it as well. And they need different tools than the data engineers do. They don't want to sit down and write Java code or you know, some of the data scientists might want to work in Python in a notebook like Zeppelin or Jupyter but some, may want to use SQL or even Tablo or something on top of SQL to do the presentation. Of course, data stewards want tools more like Atlas to help manage all their stuff. 
So that does drive us to one, put more things into the toolkit so you see the addition of projects like Apache Atlas and Ranger for security and all that. Another area of growth, I would say is also the kind of data that we're focused on. So early on, we were focused on data at rest. You know, we're going to store all this stuff in HDFS and as the kind of data scene has evolved, there's a lot more focus now on a couple things. One is data, what we call data-in-motion for our HDF product where you've got in a stream manager like Kafka or something like that >> (James) Right >> So there's processing that kind of data. But now we also see a lot of data in various places. It's not just oh, okay I have a Hadoop cluster on premise at my company. I might have some here, some on premise somewhere else and I might have it in several clouds as well. >> K, your focus has shifted like the industry in general towards streaming data in multi-clouds where your, it's more stateful interactions and so forth? I think you've made investments in Apache NiFi so >> (Alan) yes. >> Give us a sense for your NiFi versus Kafka and so forth inside of your product strategy or your >> Sure. So NiFi is really focused on that data at the edge, right? So you're bringing data in from sensors, connected cars, airplane engines, all those sorts of things that are out there generating data and you need, you need to figure out what parts of the data to move upstream, what parts not to. What processing can I do here so that I don't have to move upstream? When I have a error event or a warning event, can I turn up the amount of data I'm sending in, right? Say this airplane engine is suddenly heating up maybe a little more than it's supposed to. Maybe I should ship more of the logs upstream when the plane lands and connects that I would if, otherwise. That's the kind o' thing that Apache NiFi focuses on. I'm not saying it runs in all those places by my point is, it's that kind o' edge processing. Kafka is still going to be running in a data center somewhere. It's still a pretty heavy weight technology in terms of memory and disk space and all that so it's not going to be run on some sensor somewhere. But it is that data-in-motion right? I've got millions of events streaming through a set of Kafka topics watching all that sensor data that's coming in from NiFi and reacting to it, maybe putting some of it in the data warehouse for later analysis, all those sorts of things. So that's kind o' the differentiation there between Kafka and NiFi. >> Right, right, right. So, going forward, do you see more of your customers working internet of things projects, is that, we don't often, at least in the industry of popular mind, associate Hortonworks with edge computing and so forth. Is that? >> I think that we will have more and more customers in that space. I mean, our goal is to help our customers with their data wherever it is. >> (James) Yeah. >> When it's on the edge, when it's in the data center, when it's moving in between, when it's in the cloud. All those places, that's where we want to help our customers store and process their data. Right? So, I wouldn't want to say that we're going to focus on just the edge or the internet of things but that certainly has to be part of our strategy 'cause it's has to be part of what our customers are doing. >> When I think about the Hortonworks community, now we have to broaden our understanding because you have a tight partnership with IBM which obviously is well-established, huge and global. 
Give us a sense for as you guys have teamed more closely with IBM, how your community has changed or broadened or shifted in its focus or has it? >> I don't know that it's shifted the focus. I mean IBM was already part of the Hadoop community. They were already contributing. Obviously, they've contributed very heavily on projects like Spark and some of those. They continue some of that contribution. So I wouldn't say that it's shifted it, it's just we are working more closely together as we both contribute to those communities, working more closely together to present solutions to our mutual customer base. But I wouldn't say it's really shifted the focus for us. >> Right, right. Now at this show, we're in Europe right now, but it doesn't matter that we're in Europe. GDPR is coming down fast and furious now. Data Steward Studio, we had the demonstration today, it was announced yesterday. And it looks like a really good tool for the main, the requirements for compliance which is discover and inventory your data which is really set up a consent portal, what I like to refer to. So the data subject can then go and make a request to have my data forgotten and so forth. Give us a sense going forward, for how or if Hortonworks, IBM, and others in your community are going to work towards greater standardization in the functional capabilities of the tools and platforms for enabling GDPR compliance. 'Cause it seems to me that you're going to need, the industry's going to need to have some reference architecture for these kind o' capabilities so that going forward, either your ecosystem of partners can build add on tools in some common, like the framework that was laid out today looks like a good basis. Is there anything that you're doing in terms of pushing towards more Open Source standardization in that area? >> Yes, there is. So actually one of my responsibilities is the technical management of our relationship with ODPI which >> (James) yes. >> Mandy Chessell referenced yesterday in her keynote and that is where we're working with IBM, with ING, with other companies to build exactly those standards. Right? Because we do want to build it around Apache Atlas. We feel like that's a good tool for the basis of that but we know one, that some people are going to want to bring their own tools to it. They're not necessarily going to want to use that one platform so we want to do it in an open way that they can still plug in their metadata repositories and communicate with others and we want to build the standards on top of that of how do you properly implement these features that GDPR requires like right to be forgotten, like you know, what are the protocols around PIII data? How do you prevent a breach? How do you respond to a breach? >> Will that all be under the umbrella of ODPI, that initiative of the partnership or will it be a separate group or? >> Well, so certainly Apache Atlas is part of Apache and remains so. What ODPI is really focused up is that next layer up of how do we engage, not the programmers 'cause programmers can gage really well at the Apache level but the next level up. We want to engage the data professionals, the people whose job it is, the compliance officers. The people who don't sit and write code and frankly if you connect them to the engineers, there's just going to be an impedance mismatch in that conversation. >> You got policy wonks and you got tech wonks so. They understand each other at the wonk level. >> That's a good way to put it. 
And so that's where ODPI is really coming is that group of compliance people that speak a completely different language. But we still need to get them all talking to each other as you said, so that there's specifications around. How do we do this? And what is compliance? >> Well Alan, thank you very much. We're at the end of our time for this segment. This has been great. It's been great to catch up with you and Hortonworks has been evolving very rapidly and it seems to me that, going forward, I think you're well-positioned now for the new GDPR age to take your overall solution portfolio, your partnerships, and your capabilities to the next level and really in terms of in an Open Source framework. In many ways though, you're not entirely 100% like nobody is, purely Open Source. You're still very much focused on open frameworks for building fairly scalable, very scalable solutions for enterprise deployment. Well, this has been Jim Kobielus with Alan Gates of Hortonworks here at theCUBE on theCUBE at DataWorks Summit 2018 in Berlin. We'll be back fairly quickly with another guest and thank you very much for watching our segment. (techno music)
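To make the edge-versus-data-center split Gates describes earlier in the conversation more concrete, here is an illustrative sketch of the edge pattern only — it is not Apache NiFi code. It samples telemetry lightly in normal operation and automatically ships more data upstream when a warning threshold is crossed; the temperature threshold, sampling rates, and buffer size are assumptions made for the example.

```python
# Illustrative sketch only -- not Apache NiFi itself. It captures the edge
# pattern described in the interview: sample telemetry lightly in normal
# operation, and turn up the amount of data shipped upstream when a warning
# threshold is crossed.
from collections import deque

class EdgeShipper:
    def __init__(self, warn_temp=90.0, normal_rate=10, alert_rate=1):
        self.warn_temp = warn_temp
        self.normal_rate = normal_rate   # ship 1 of every N readings normally
        self.alert_rate = alert_rate     # ship every reading while alerting
        self.buffer = deque(maxlen=100)  # recent readings kept for context
        self.count = 0
        self.alerting = False

    def ingest(self, reading):
        """Return the list of readings to forward upstream for this event."""
        self.count += 1
        self.buffer.append(reading)
        if reading["temp"] >= self.warn_temp and not self.alerting:
            self.alerting = True
            # On the first warning, also flush recent context upstream.
            return list(self.buffer)
        if reading["temp"] < self.warn_temp:
            self.alerting = False
        rate = self.alert_rate if self.alerting else self.normal_rate
        return [reading] if self.count % rate == 0 else []

if __name__ == "__main__":
    shipper = EdgeShipper()
    readings = [{"engine": "e1", "temp": t} for t in (70, 72, 71, 95, 97, 96, 80, 74)]
    for r in readings:
        shipped = shipper.ingest(r)
        if shipped:
            print(f"shipping {len(shipped)} reading(s): {shipped}")
```

The upstream side of this split — the Kafka topics and downstream analytics Gates mentions — would consume whatever the edge decides to forward, which is the division of labor the interview describes.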
Jamie Engesser, Hortonworks & Madhu Kochar, IBM - DataWorks Summit 2017
>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2017, brought to you by Hortonworks. (digitalized music) >> Welcome back to theCUBE. We are live at day one of the DataWorks Summit, in the heart of Silicon Valley. I'm Lisa Martin with theCUBE; my co-host George Gilbert. We're very excited to be joined by our two next guests. Going to be talking about a lot of the passion and the energy that came from the keynote this morning and some big announcements. Please welcome Madhu Kochar, VP of analytics and product development and client success at IBM, and Jamie Engesser, VP of product management at Hortonworks. Welcome guys! >> Thank you. >> Glad to be here. >> First time on theCUBE, George and I are thrilled to have you. So, in the last six to eight months doing my research, there's been announcements between IBM and Hortonworks. You guys have been partners for a very long time, and announcements on technology partnerships with servers and storage, and presumably all of that gives Hortonworks Jamie, a great opportunity to tap into IBM's enterprise install base, but boy today? Socks blown off with this big announcement between IBM and Hortonworks. Jamie, kind of walk us through that, or sorry Madhu I'm going to ask you first. Walk us through this announcement today. What does it mean for the IBM-Hortonworks partnership? Oh my God, what an exciting, exciting day right? We've been working towards this one, so three main things come out of the announcement today. First is really the adoption by Hortonworks of IBM data sciences machine learning. As you heard in the announcement, we brought the machine learning to our mainframe where the most trusted data is. Now bringing that to the open source, big data on Hadoop, great right, amazing. Number two is obviously the whole aspects around our big sequel, which is bringing the complex-query analytics, where it brings all the data together from all various sources and making that as HDP and Hadoop and Hortonworks and really adopting that amazing announcement. Number three, what we gain out of this humongously, obviously from an IBM perspective is the whole platform. We've been on this journey together with Hortonworks since 2015 with ODPI, and we've been all champions in the open source, delivering a lot of that. As we start to look at it, it makes sense to merge that as a platform, and give to our clients what's most needed out there, as we take our journey towards machine learning, AI, and enhancing the enterprise data warehousing strategy. >> Awesome, Jamie from your perspective on the product management side, what is this? What's the impact and potential downstream, great implications for Hortonworks? >> I think there's two things. I think Hortonworks has always been very committed to the open source community. I think with Hortonworks and IBM partnering on this, number one is it brings a much bigger community to bear, to really push innovation on top of Hadoop. That innovation is going to come through the community, and I think that partnership drives two of the biggest contributors to the community to do more together. So I think that's number one is the community interest. The second thing is when you look at Hadoop adoption, we're seeing that people want to get more and more value out of Hadoop adoption, and they want to access more and more data sets, to number one get more and more value. We're seeing the data science platform become really fundamental to that. 
They're also seeing the extension to say, not only do I need data science to get and add new insights, but I need to aggregate more data. So we're also seeing the notion of, how do I use big sequel on top of Hadoop, but then I can federate data from my mainframe, which has got some very valuable data on it, DB2 instances, and the rest of the data repositories out there. So now we get a better federation model, to allow our customers to access more of the data that they can make better business decisions on, and they can use data science on top of that to get new learnings from that data. >> Let me build on that. Let's say that I'm a Telco customer, and the two of you come together to me and say, we don't want to talk to you about Hadoop. We want to talk to you about solving a problem where you've got data in applications and many places, including inaccessible stuff. You have a limited number of data scientists, and the problem of cleaning all the data. Even if you build models, the challenge of integrating them with operational applications. So what do the two of you tell me, the Telco customer? >> Yeah, so maybe I'll go first. So the Telco, the main use case or the main application, as I've been talking to many of the largest Telco companies here in the U.S. and even outside of the U.S., is all about their churn rate. They want to know when the calls are dropping, why are they dropping, why are the clients going to the competition and such? There's so much data. The data is just streaming and they want to understand that. I think you bring the data science experience and machine learning to that data, and as I said, it doesn't matter now where the data resides. Hadoop, mainframes, wherever, we can bring that data. You can do a transformation of that, clean up the data. The quality of the data is there so that you can start feeding that data into the models, and that's when the models learn. The more data it is, the better it is, so they train, and then you can really drive the insights out of it. Now the data science framework which is available, it's like a team sport. You can bring in many other data scientists into the organization who could have different analyst reports to go render for or provide results into. So being a team sport, being a collaboration, bringing that together with clean data, I think it's going to change the world. I think the business side can have instant value from the data they're going to see. >> Let me just test the edge conditions on that. Some of that data is streaming and you might apply the analytics in real time. Some of it is, I think as you were telling us before, sort of locked up as dark data. The question is how much of that data, the streaming stuff and the dark data, how much do you have to land in a Hadoop repository versus how much do you just push the analytics out to it and have it inform a decision? >> Maybe I can take a first thought on it. I think there's a couple things in that. There's the learnings, and then how do I execute the learnings? I think the first step of it is, I tend to land the data, and going to the Telecom churn model, I want to see all the touch points. So I want to see the person that came through the website. He went into the store, he called into us, so I need to aggregate all that data to get a better view of what's the chain of steps that happened for somebody to churn? Once I end up diagnosing that, go through the data science of that, to learn the models that are being executed on that data, and that's the data at rest. 
What I want to do is build the model out so that now I can take that model, and I can prescriptively run it in this stream of data. So I know that that customer just hung up off the phone, now he walked in the store and we can sense that he's in the store because we just registered that he's asking about his billing details. The system can now dynamically diagnose by those two activities that this is a churn high-rate, so notify that teller in the store that there's a chance of him rolling out. If you look at that, that required the machine learning and data science side to build the analytical model, and it required the data-flow management and streaming analytics to consume that model to make a real-time insight out of it, to ultimately stop the churn from happening. Let's just give the customer a discount at the end of the day. That type of stuff; so you need to marry those two. >> It's interesting, you articulated that very clearly. Although then the question I have is now not on the technical side, but on the go-to market side. You guys have to work very very closely, and this is calling at a level that I assume is not very normal for Hortonworks, and it's something that is a natural sales motion for IBM. >> So maybe I'll first speak up, and then I'll let you add some color to that. When I look at it, I think there's a lot of natural synergies. IBM and Hortonworks have been partnered since day one. We've always continued on the path. If you look at it, and I'll bring up community again and open source again, but we've worked very well in the community. I think that's incubated a really strong and fostered a really strong relationship. I think at the end of the day we both look at what's going to be the outcome for the customer and working back from that, and we tend to really engage at that level. So what's the outcome and then how do we make a better product to get to that outcome? So I think there is a lot of natural synergies in that. I think to your point, there's lots of pieces that we need to integrate better together, and we will join that over time. I think we're already starting with the data science experience. A bunch of integration touchpoints there. I think you're going to see in the information governance space, with Atlas being a key underpinning and information governance catalog on top of that, ultimately moving up to IBM's unified governance, we'll start getting more synergies there as well and on the big sequel side. I think when you look at the different pods, there's a lot of synergies that our customers will be driving and that's what the driving factors, along with the organizations are very well aligned. >> And VPF engineering, so there's a lot of integration points which were already identified, and big sequel is already working really well on the Hortonworks HDP platform. We've got good integration going, but I think more and more on the data science. I think in end of the day we end up talking to very similar clients, so going as a joined go-to market strategy, it's a win-win. Jamie and I were talking earlier. I think in this type of a partnership, A our community is winning and our clients, so really good solutions. >> And that's what it's all about. Speaking of clients, you gave a great example with Telco. When we were talking to Rob Thomas and Rob Bearden earlier on in the program today. 
They talked about the data science conversation is at the C-suite, so walk us through an example of whether it's a Telco or maybe a healthcare organization, what is that conversation that you're having? How is a Telco helping foster what was announced today and this partnership? >> Madhu: Do you want to take em? >> Maybe I'll start. When we look in a Telco, I think there's a natural revolution, and when we start looking at that problem of how does a Telco consume and operate data science at a larger scale? So at the C-suite it becomes a people-process discussion. There's not a lot of tools currently that really help the people and process side of it. It's kind of an artist capability today in the data science space. What we're trying to do is, I think I mentioned team sport, but also give the tooling to say there's step one, which is we need to start learning and training the right teams and the right approach. Step two is start giving them access to the right data, etcetera to work through that. And step three, giving them all the tooling to support that, and tooling becomes things like TensorFlow etcetera, things like Zeppelin, Jupiter, a bunch of the open source community evolved capabilities. So first learn and training. The second step in that is give them the access to the right data to consume it, and then third, give them the right tooling. I think those three things are helping us to drive the right capabilities out of it. But to your point, elevating up to the C-suite. It's really they think people-process, and I think giving them the right tooling for their people and the right processes to get them there. Moving data science from an art to a science, is I would argue at a top level. >> On the client success side, how instrumental though are your clients, like maybe on the Telco side, in actually fostering the development of the technology, or helping IBM make the decision to standardize on HDP as their big data platform? >> Oh, huge, huge, a lot of our clients, especially as they are looking at the big data. Many of them are actually helping us get committers into the code. They're adding, providing; feet can't move fast enough in the engineering. They are coming up and saying, "Hey we're going to help" "and code up and do some code development with you." They've been really pushing our limits. A lot of clients, actually I ended up working with on the Hadoop site is like, you know for example. My entire information integration suite is very much running on top of HDP today. So they are saying, OK what's next? We want to see better integration. So as I called a few clients yesterday saying, "Hey, under embargo this is something going to get announced." Amazing, amazing results, and they're just very excited about this. So we are starting to get a lot of push, and actually the clients who do have large development community as well. Like a lot of banks today, they write a lot of their own applications. We're starting to see them co-developing stuff with us and becoming the committers. >> Lisa: You have a question? >> Well, if I just were to jump in. How do you see over time the mix of apps starting to move from completely custom developed, sort of the way the original big data applications were all written, down to the medal-ep in MapReduce. For shops that don't have a lot of data scientists, how are we going to see applications become more self-service, more pre-packaged? >> So maybe I'll give a little bit of perspective. 
Right now I think IBM has got really good synergies on what I'll call vertical solutions to vertical organizations, financial, etcetera. I would say Hortonworks has taken a more horizontal approach. We're more of a platform solution. An example of one where it's kind of marrying the two, is if you move up the stack from Hortonworks as a platform to the next level up, which is Hortonworks as a solution. One of the examples that we've invested heavily in is cybersecurity, and in an Apache project called Metron. Less about Metron and more about cybersecurity. People want to solve a problem. They want to defend against an attacker immediately, and what that means is we need to give them out-of-the-box models to detect a lot of common patterns. What we're doing there, is we're investing in some of the data science and pre-packaged models to identify attack vectors and then try to resolve that or at least notify you that there's a concern. It's an example of the data science behind it, pre-packaging that data science to solve a specific problem. That's in the cybersecurity space, and that case happens to be horizontal, where Hortonworks' strength is. I think in the IBM case, there's a lot more vertical apps that we can apply to. Fraud, adjudication, etcetera. >> So it sounds like we're really just hitting the tip of the iceberg here, with the potential. We want to thank you both for joining us on theCUBE today, sharing your excitement about this deepening, expanding partnership between Hortonworks and IBM. Madhu and Jamie, thank you so much for joining George and I today on theCUBE. >> Thank you. >> Thank you Lisa and George. >> Appreciate it. >> Thank you. >> And for my co-host George Gilbert, I am Lisa Martin. You're watching us live on theCUBE, from day one of the DataWorks Summit in Silicon Valley. Stick around, we'll be right back. (digitalized music)
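To ground the Telco churn pattern Jamie and Madhu walk through, here is a minimal PySpark sketch of the two halves they describe: fit a model on historical, aggregated touchpoint data at rest, then reuse the saved pipeline to score fresh interactions. The paths, column names, and feature set are hypothetical, and a real model would involve far more feature engineering and validation than this.

```python
"""Minimal sketch of the churn pattern: train on data at rest, reuse the
fitted pipeline to score new interactions. All paths and columns are assumed."""
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("telco-churn-sketch").getOrCreate()

# Historical view of each subscriber: web visits, store visits, support calls,
# dropped calls, plus a churned label. This is the "data at rest" side.
history = spark.read.parquet("/data/telco/touchpoints_history")

features = ["web_visits", "store_visits", "support_calls", "dropped_calls"]
assembler = VectorAssembler(inputCols=features, outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="churned")

# Fit and persist the whole pipeline so it can be reused elsewhere.
model = Pipeline(stages=[assembler, lr]).fit(history)
model.write().overwrite().save("/models/churn_lr")

# Later, the saved pipeline is reloaded and applied to fresh interactions.
# The same transform() call also works on a structured-streaming DataFrame,
# which is how the "score the event as the customer walks into the store"
# step would be wired up in practice.
scorer = PipelineModel.load("/models/churn_lr")
todays_events = spark.read.parquet("/data/telco/touchpoints_today")

scored = scorer.transform(todays_events)
scored.select("customer_id", "probability", "prediction") \
      .filter("prediction = 1.0") \
      .show(20, truncate=False)
```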
Steve Roberts, IBM– DataWorks Summit Europe 2017 #DW17 #theCUBE
>> Narrator: Covering DataWorks Summit, Europe 2017, brought to you by Hortonworks. >> Welcome back to Munich everybody. This is The Cube. We're here live at DataWorks Summit, and we are the live leader in tech coverage. Steve Roberts is here as the offering manager for big data on power systems for IBM. Steve, good to see you again. >> Yeah, good to see you Dave. >> So we're here in Munich, a lot of action, good European flavor. It's my second European, formerly Hadoop Summit, now DataWorks. What's your take on the show? >> I like it. I like the size of the venue. It's the ability to interact and talk to a lot of the different sponsors and clients and partners, so the ability to network with a lot of people from a lot of different parts of the world in a short period of time, so it's been great so far and I'm looking forward to building upon this and towards the next DataWorks Summit in San Jose. >> Terri Virnig VP in your organization was up this morning, had a keynote presentation, so IBM got a lot of love in front of a fairly decent sized audience, talking a lot about the sort of ecosystem and that's evolving, the openness. Talk a little bit about open generally at IBM, but specifically what it means to your organization in the context of big data. >> Well, I am from the power systems team. So we have an initiative that we have launched a couple years ago called Open Power. And Open Power is a foundation of participants innovating from the power processor through all aspects, through accelerators, IO, GPUs, advanced analytics packages, system integration, but all to the point of being able to drive open power capability into the market and have power servers delivered not just through IBM, but through a whole ecosystem of partners. This compliments quite well with the Apache, Hadoop, and Spark philosophy of openness as it relates to software stack. So our story's really about being able to marry the benefits of open ecosystem for open power as it relates to the system infrastructure technology, which drives the same time to innovation, community value, and choice for customers as it relates to a multi-vendor ecosystem and coupled with the same premise as it relates to Hadoop and Spark. And of course, IBM is making significant contributions to Spark as part of the Apache Spark community and we're a key active member, as is Hortonworks with the ODPi organization forwarding the standards around Hadoop. So this is a one, two combo of open Hadoop, open Spark, either from Hortonworks or from IBM sitting on the open power platform built for big data. No other story really exists like that in the market today, open on open. >> So Terri mentioned cognitive systems. Bob Picciano has recently taken over and obviously has some cognitive chops, and some systems chops. Is this a rebranding of power? Is it sort of a layer on top? How should we interpret this? >> No, think of it more as a layer on top. So power will now be one of the assets, one of the sort of member family of the cognitive systems portion on IBM. System z can also be used as another great engine for cognitive in certain clients, certain use cases where they want to run cognitive close to the data and they have a lot of data sitting on System z. So power systems as a server really built for big data and machine learning, in particular our S822LC for high performance computing. This is a server which is landing very well in the deep learning, machine learning space. 
It offers the Tesla P100 GPU, and with the NVIDIA NVLink technology can offer up to 2.8x bandwidth benefits CPU to GPU over what would be available through a PCIe Intel combination today. So this drives immediate value, because it's not just that you're exploiting GPUs, you of course need to move your data quickly from the processor to the GPU. >> So I was going to ask you actually, sort of what makes power so well suited for big data and cognitive applications, particularly relative to Intel alternatives. You touched on that. IBM talks a lot about Moore's Law starting to hit its peak, that innovation is going to come from other places. I love that narrative 'cause it's really combinatorial innovation that's going to lead us in the next 50 years, but can we stay on that thread for a bit? What makes power so substantially unique, uniquely suited and qualified to run cognitive systems and big data? >> Yeah, it actually starts with even more of the fundamentals of the power processors. The power processor has eight threads per core, in contrast to Intel's two threads per core. So this just means being able to parallelize your workloads, and workloads that come up in the cognitive space, whether you're running complex queries and need to drive SQL over a lot of parallel pipes, or you're running iterative computation over the same data set as when you're doing model training, these can all benefit from highly parallelized workloads, which can benefit from this 4x thread advantage. But of course to do this, you also need large, fast memory, and we have six times more cache per core versus Broadwell, so this just means you have a lot of memory close to the processor, driving that throughput that you require. And then on top of that, now we get to the ability to add accelerators, and unique accelerators such as, as I mentioned, the NVIDIA NVLink scenario for GPU, or using open CAPI as an approach to attach FPGA or flash to get processor-memory access speeds, but with an attached acceleration device. And so this is economies of scale in terms of being able to offload specialized compute processing to the right accelerator at the right time, so you can drive way more throughput. The upper bounds for driving workload through individual nodes, and being able to balance your IO and compute on an individual node, are far superior with the power system server. >> Okay, so multi-threaded, giant memories, and this open CAPI gives you primitive level access I guess to a memory extension, instead of having to-- >> Yeah, pluggable accelerators through this high speed memory extension. >> Instead of going through what I often call the horrible storage stack, aka SCSI. And so that's cool, some good technology discussion there. What's the business impact of all that? What are you seeing with clients? >> Well, the business impact is not everyone is going to start with souped-up accelerated workloads, but they're going to get there. So part of the vision that clients need to understand, to begin to get more insights from their data, is that it's hard to predict where your workloads are going to go. So you want to start with a server that provides you some of that upper room for growth. You don't want to keep scaling out horizontally by having to add nodes every time you need to add storage or more compute capacity.
So firstly, it's the flexibility, being able to bring versatile workloads onto a node or a small number of nodes and be able to exploit some of these memory advantages, acceleration advantages, without necessarily having to build large scale out clusters. Ultimately, it's about improving time to insights. So with accelerators and with large memory, running workloads on similarly configured clusters, you're simply going to get your results faster. For example, in a recent benchmark we did with a representative set of TPC-DS queries on Hortonworks running on Linux and power servers, we were able to drive 70% more queries per hour over a comparable Intel configuration. So this is just getting more work done on what is now similarly priced infrastructure. 'Cause the power family is a broad family that now includes 1U, 2U, scale out servers, along with our 192-core powerhouses for the enterprise grade. So we can directly price compete on a scale out box, but we offer a lot more flexible choice as clients want to move up in the workload stack or to bring accelerators to the table as they start to experiment with machine learning. >> So if I understand that right, I can turn two knobs. I can do the same amount of work for less money, TCO play. Or, for the same amount of money, I can do more work. >> Absolutely >> Is that fair? >> Absolutely, now in some cases, especially in the Hadoop space, the size of your cluster is somewhat gated by how much storage you require. And if you're using the classic scale up storage model, you're going to have so many nodes no matter what, 'cause you can only put so much storage on the node. So in that case, >> You're scaling storage. >> Your clusters can look the same, but you can put a lot more workload on that cluster, or you can bring in a solution like IBM Spectrum Scale, our Elastic Storage Server, which allows you to essentially pull that storage off the nodes, put it in a storage appliance, and at that point, you now have high speed access to storage, 'cause of course the network bandwidth has increased to the point that the performance benefit of local storage is no longer really a driving factor for a classic Hadoop deployment. You can get that high speed access in a storage appliance mode, with the resiliency, at far less cost, 'cause you don't need 3x replication, you just have about a 30% overhead for the software erasure coding. And now with your compute nodes, you can really choose and scale those nodes just for your workload purposes. So you're not bound by the number of nodes equaling total storage required divided by storage per node, which is the classic how-big-is-my-cluster calculation. That just doesn't work if you get over 10 nodes, 'cause now you're just starting to get to the point where you're wasting something, right? You're either wasting storage capacity or typically you're wasting compute capacity, 'cause you're over provisioned on one side or the other.
You can still have classic POSIX applications or you may have new object based applications, and you can, with a single copy of the data, one virtual file system, which could also be geographically distributed, serve both Hadoop and non-Hadoop workloads, so you're then saving additional replicas of the data from being required by being able to onboard that onto a common data layer. >> So that's a return on asset play. You got an asset that's more fungible across the application portfolio. You can get more value out of it. You don't have to dedicate it to this one workload and then over provision for another one when you've got extra capacity sitting there. >> It's a TCO play, but it's also a time saver. It's going to get you time to insight faster 'cause you don't have to keep moving that data around. The time you spend copying data is time you should be spending getting insights from the data, so having a common data layer removes that delay. >> Okay, 'cause it's HDFS ready I don't have to essentially move data from my existing systems into this new stovepipe. >> Yeah, we just present it through the HDFS API as it lands in the file system from the original application. >> So now, all this talk rings of flexibility, agility, etc, what about cloud? How does cloud fit into this strategy? What are you guys doing with your colleagues and cohorts at Bluemix, aka SoftLayer. You don't use that term anymore, but we do. When we get our bill it says SoftLayer still, but at any rate, you know what I'm talking about. The cloud with IBM, how does it relate to what you guys are doing in power systems? >> Well, the cloud is still... really, the born-on-the-cloud philosophy of the IBM software analytics team is still very much the motto. So as you see in the data science experience, which was launched last year, born in the cloud, all our analytics packages, whether it be our BigInsights software or our business intelligence software like Cognos, our future generations are landing first in the cloud. And of course we have our whole arsenal of Watson based analytics and APIs available through the cloud. So what we're now seeing as well is we're taking those born-in-the-cloud offerings, but now also offering a lot of those in an on-premise model. So they can also participate in the hybrid model, so data science experience is now coming on premise, we're showing it at the booth here today. Bluemix has an on-premise version as well, and the same software library, BigInsights, Cognos, SPSS, are all available for on prem deployment. So power is still the ideal place for hosting your on prem data and to run your analytics close to the data, and now we can federate that through hybrid access to these elements running in the cloud. So the focus is really the cloud applications being able to leverage the power and System z based data through high speed connectors, and being able to build hybrid configurations where you're running your analytics where they make the most sense based upon your performance requirements, data security and compliance requirements. And a lot of companies, of course, are still not comfortable putting all their jewels in the cloud, so typically there's going to be a mix and match. We are expanding the footprint for cloud based offerings both in terms of power servers offered through SoftLayer, but also through other cloud providers; Nimbix is a partner we're working with right now who actually is offering our Power AI package. 
Power AI is a package of open source deep learning frameworks, packaged by IBM, optimized for Power, in an easily deployed package with IBM support available. And that could be deployed on premise on a power server, but it's also available on a pay-per-drink basis through the Nimbix cloud. >> All right, we covered a lot of ground here. We talked strategy, we talked strategic fit, which I guess is sort of an adjunct to strategy, we talked a little bit about the competition and where you differentiate, some of the deployment models, like cloud, other bits and pieces of your portfolio. Can we talk specifically about the announcements that you have here at this event, just maybe summarize for us? >> Yeah, no absolutely. As it relates to IBM, and Hadoop, and Spark, we really have the full stack support, the rich analytics capabilities that I was mentioning, deep insight, prescriptive insights, streaming analytics with IBM Streams, Cognos Business Intelligence, so this set of technologies is available for both IBM's Hadoop stack and Hortonworks' Hadoop stack today. Our BigInsights and IOP offering, the next release, the 4.3 release, is now out for technical preview and will be available for both Linux on Intel and Linux on power towards the end of this month, so that's kind of one piece of new Hadoop news at the analytics layer. As it relates to power systems, as Hortonworks announced this morning, HDP 2.6 is now available for Linux on power, so we've been partnering closely with Hortonworks to ensure that we have an optimized story for HDP running on power system servers, per the data point I shared earlier with the 70% improvement in queries per hour. At the storage layer, we have a work in progress with Hortonworks to certify the Spectrum Scale file system, which really now unlocks the ability to offer this converged storage alternative to the classic Hadoop model. Spectrum Scale actually supports and provides advantages in a classic Hadoop model with local storage, or it can provide the flexibility of offering the same sort of multi-application support in a scale-out model for storage. It also has the ability to form part of a storage appliance that we call Elastic Storage Server, which is a combination of power servers and high density storage enclosures, SSD, spinning disk, or flash, depending on the configuration, and that certification will now have that as an available storage appliance, which could underpin either IBM Open Platform or HDP as a Hadoop data lake. But as I mentioned, not just for Hadoop, really for building a common data plane behind mixed analytics workloads that reduces your TCO through a converged storage footprint, but more importantly, provides you that flexibility of not having to create data copies to support multiple applications. >> Excellent, IBM opening up its portfolio to the open source ecosystem. You guys have always had, well not always, but in the last 20 years, major, major investments in open source. They continue on, we're seeing it here. Steve, people are filing in. The evening festivities are about to begin. >> Steve: Yeah, yeah, the party will begin shortly. >> Really appreciate you coming on The Cube, thanks very much. >> Thanks a lot Dave. >> You're welcome. >> Great to talk to you. >> All right, keep it right there everybody. John and I will be back with a wrap up right after this short break, right back.
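As a rough illustration of the replication-versus-erasure-coding point Steve makes above, here is a back-of-the-envelope comparison using the two figures quoted in the conversation: 3x HDFS replication versus roughly 30% overhead for software erasure coding. The usable capacity is an assumed number, and real overheads depend on the erasure-code geometry, failure domains, and reserve space, so treat this purely as a sizing sketch.

```python
"""Back-of-the-envelope sizing sketch using the figures quoted above:
3x replication versus ~30% erasure-coding overhead. Inputs are assumed."""

def raw_capacity_tb(usable_tb, replication=None, ec_overhead=None):
    """Raw TB required to hold usable_tb of data."""
    if replication is not None:
        return usable_tb * replication          # e.g. 3 full copies of every block
    return usable_tb * (1.0 + ec_overhead)      # e.g. data plus ~30% parity

if __name__ == "__main__":
    usable = 500.0  # TB the cluster must actually serve (assumed figure)

    replicated = raw_capacity_tb(usable, replication=3.0)
    erasure = raw_capacity_tb(usable, ec_overhead=0.30)

    print(f"usable data         : {usable:8.0f} TB")
    print(f"3x replication      : {replicated:8.0f} TB raw")
    print(f"~30% erasure coding : {erasure:8.0f} TB raw")
    print(f"raw capacity saved  : {replicated - erasure:8.0f} TB "
          f"({100 * (replicated - erasure) / replicated:.0f}%)")
```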
Shaun Connolly, Hortonworks - DataWorks Summit Europe 2017 - #DW17 - #theCUBE
>> Announcer: Covering DataWorks Summit Europe 2017, brought to you by Hortonworks. >> Welcome back everyone. Live here in Munich, Germany for theCUBE's special presentation of Hortonworks Hadoop Summit, now called DataWorks 2017. I'm John Furrier, with my co-host Dave Vellante; our next guest is Shaun Connolly, Vice President of Corporate Strategy, Chief Strategy Officer. Shaun great to see you again. >> Thanks for having me guys. Always a pleasure. >> Super exciting. Obviously we're always pontificating on the status of Hadoop, and Hadoop is dead, long live Hadoop, but rumors of its demise are greatly exaggerated. The reality is that there are no major shifts in the trends, other than the fact that the amplification with AI and machine learning has upleveled the narrative around data to mainstream; big data has been written on gen one of Hadoop, DevOps, culture, open-source. Starting with Hadoop you guys certainly have been way out in front of all the trends. How you guys have been rolling out the products. But it's now with IoT and AI as that sizzle, the future of self-driving cars, smart cities, you're starting to really see demand for comprehensive solutions that involve data-centric thinking. Okay, that's one. Two, open-source continues to dominate: MuleSoft went public, you guys went public years ago, Cloudera filed their S-1. A crop of public companies that are open-source, haven't seen that since Red Hat. >> Exactly. '99 is when Red Hat went public. >> Data-centric, big megatrend with open-source powering it, you couldn't be happier for the stars lining up. >> Yeah, well we definitely placed our bets on that. We went public in 2014 and it's nice to see that graduating class of Talend and MuleSoft, Cloudera coming out. That just, I think, helps socialize the movement of enterprise open-source, whether it's for on-prem or powering cloud solutions pushed out to the edge, and technologies that are relevant in IoT. That's the wave. We had a panel earlier today where Dahl Jeppe from Centric of British Gas was talking about his ... the digitization of energy and virtual power plant notions. He can't achieve that without open-source powering and fueling that. >> And the thing about it is just kind of ... For me personally, being my age in this generation of the computer industry since I was 19, to see open-source go mainstream the way it is, it just gets better every time, but it really is the thousand-flowers-bloom strategy. Throwing the seeds out there of innovation. I want to ask you as a strategy question, you guys from a performance standpoint, I would say, kind of got hammered in the public market. Cloudera's valuation privately is 4.1 billion, you guys are close to 700 million. Certainly Cloudera's going to get a haircut, it looks like. The public market is based on the multiples from Dave and I's intro, but there's so much value being created. Where's the value for you guys as you look at the horizon? You're talking about white spaces that are really developing with use cases that are creating value. The practitioners in the field creating value, real value for customers. >> So you covered some of the trends, but I'll translate 'em into how the customers are deploying. Cloud computing and IoT are somewhat related. One is a centralization, the other is decentralization, so it actually calls for a connected data architecture as we refer to it. We're working with a variety of IoT-related use cases. Coca-Cola East Japan spoke at Tokyo Summit about beverage replenishment analytics. 
Getting vending machine analytics from vending machines even on Mount Fuji. And optimizing their flow-through of inventory in just-in-time delivery. That's an IoT-related to run on Azure. It's a cloud-related story and it's a big data analytics story that's actually driving better margins for the business and actually better revenues cuz they're getting the inventory where it needs to be so people can buy it. Those are really interesting use cases that we're seeing being deployed and it's at this convergence of IoT cloud and big data. Ultimately that leads to AI, but I think that's what we're seeing the rise of. >> Can you help us understand that sort of value chain. You've got the edge, you got the cloud, you need something in-between, you're calling it connected data platform. How do you guys participate in that value chain? >> When we went public our primary workhorse platform was Hortonworks Data Platform. We had first class cloud services with Azure HDInsight and Hortonworks Data Cloud for AWS, curated cloud services pay-as-you-go, and Hortonworks DataFlow, I call as our connective tissue, it manages all of your data motion, it's a data logistics platform, it's like FedEx for data delivery. It goes all the way out to the edge. There's a little component called Minify, mini and ify, which does secure intelligent analytics at the edge and transmission. These smart manufacturing lines, you're gathering the data, you're doing analytics on the manufacturing lines, and then you're bringing the historical stuff into the data center where you can do historical analytics across manufacturing lines. Those are the use cases that are connect the data archives-- >> Dave: A subset of that data comes back, right? >> A subset of the data, yep. The key events of that data it may not be full of-- >> 10%, half, 90%? >> It depends if you have operational events that you want to store, sometimes you may want to bring full fidelity of that data so you can do ... As you manufacture stuff and when it got deployed and you're seeing issues in the field, like Western Digital Hard Drives, that failure's in the field, they want that data full fidelity to connect the data architecture and analytics around that data. You need to ... One of the terms I use is in the new world, you need to play it where it lies. If it's out at the edge, you need to play it there. If it makes a stop in the cloud, you need to play it there. If it comes into the data center, you also need to play it there. >> So a couple years ago, you and I were doing a panel at our Big Data NYC event and I used the term "profitless prosperity," I got the hairy eyeball from you, but nonetheless, we talked about you guys as a steward of the industry, you have to invest in open-source projects. And it's expensive. I mean HDFS itself, YARN, Tez, you guys lead a lot of those initiatives. >> Shaun: With the community, yeah, but we-- >> With the community yeah, but you provided contributions and co-leadership let's say. You're there at the front of the pack. How do we project it forward without making forward-looking statements, but how does this industry become a cashflow positive industry? >> Public companies since end of 2014, the markets turned beginning at 2016 towards, prior to that high growth with some losses was palatable, losses were not palatable. That his us, Splunk, Tableau most of the IT sector. That's just the nature of the public markets. 
As more public open-source, data-driven companies will come in I think it will better educate the market of the value. There's only so much I can do to control the stock price. What I can from a business perspective is hit key measures from a path to profitability. The end of Q4 2016, we hit what we call the just-to-even or breakeven, which is a stepping stone. On our earnings call at the end of 2016 we ended with 185 million in revenue for the year. Only five years into this journey, so that's a hard revenue growth pace and we basically stated in Q3 or Q4 of 17, we will hit operating cashflow neutrality. So we are operating business-- >> John: But you guys also hit a 100 million at record pace too, I believe. >> Yeah, in four years. So revenue is one thing, but operating margins, like if you look at our margins on our subscription business for instance, we've got 84% margin on that. It's a really nice margin business. We can make that better margins, but that's a software margin. >> You know what's ironic, we were talking about Red Hat off camera. Here's Red Hat kicking butt, really hitting all cylinders, three billion dollars in bookings, one would think, okay hey I can maybe project forth some of these open-source companies. Maybe the flip side of this, oh wow we want it now. To your point, the market kind of flipped, but you would think that Red Hat is an indicator of how an open-source model can work. >> By the way Red Hat went public in 99, so it was a different trajectory, like you know I charted their trajectory out. Oracle's trajectory was different. They didn't even in inflation adjusted dollars they didn't hit a 100 million in four years, I think it was seven or eight years or what have you. Salesforce did it in five. So these SaaS models and these subscription models and the cloud services, which is an area that's near and dear to my heart. >> John: Goes faster. >> You get multiple revenue streams across different products. We're a multi-products cloud service company. Not just a single platform. >> So we were actually teasing this out on our-- >> And that's how you grow the business, and that's how Red Hat did it. >> Well I want to get your thoughts on this while we're just kind of ripping live here because Dave and I were talking on our intro segment about the business model and how there's some camouflage out there, at least from my standpoint. One of the main areas that I was kind of pointing at and trying to poke at and want to get your reaction to is in the classic enterprise go-to-market, you have sales force expansive, you guys pay handsomely for that today. Incubating that market, getting the profitability for it is a good thing, but there's also channels, VARs, ISVs, and so on. You guys have an open-source channel that kind of not as a VAR or an ISV, these are entrepreneurs and or businesses themselves. There's got to be a monetization shift there for you guys in the subscription business certainly. When you look at these partners, they're co-developing, they're in open-source, you can almost see the dots connecting. Is this new ecosystem, there's always been an ecosystem, but now that you have kind of a monetization inherently in a pure open distribution model. >> It forces you to collaborate. IBM was on stage talking about our system certified on the Power Systems. Many may look at IBM as competitive, we view them as a partner. Amazon, some may view them as a competitor with us, they've been a great partner in our for AWS. 
So it forces you to think about how do you collaborate around deeply engineered systems and value and we get great revenue streams that are pulled through that they can sell into the market to their ecosystems. >> How do you vision monetizing the partners? Let's just say Dave and I start this epic idea and we create some connective tissue with your orchestrator called the Data Platform you have and we start making some serious bang. We make a billion dollars. Do you get paid on that if it's open-source? I mean would we be more subscriptions? I'm trying to see how the tide comes in, whose boats float on the rising tide of the innovation in these white spaces. >> Platform thinking is you provide the platform. You provide the platform for 10x value that rides atop that platform. That's how the model works. So if you're riding atop the platform, I expect you and that ecosystem to drive at least 10x above and beyond what I would make as a platform provider in that space. >> So you expect some contributions? >> That's how it works. You need a thousand flowers to be running on the platform. >> You saw that with VMware. They hit 10x and ultimately got to 15 or 16, 17x. >> Shaun: Exactly. >> I think they don't talk about it anymore. I think it's probably trading the other way. >> You know my days at JBoss Red Hat it was somewhere between 15 to 20x. That was the value that was created on top of the platforms. >> What about the ... I want to ask you about the forking of the Hadoop distros. I mean there was a time when everybody was announcing Hadoop distros. John Furrier announced SiliconANGLE was announcing Hadoop distro. So we saw consolidation, and then you guys announced the ODP, then the ODPI initiative, but there seems to be a bit of a forking in Hadoop distros. Is that a fair statement? Unfair? >> I think if you look at how the Linux market played out. You have clearly Red Hat, you had Conicho Ubuntu, you had SUSE. You're always going to have curated platforms for different purposes. We have a strong opinion and a strong focus in the area of IoT, fast analytic data from the edge, and a centralized platform with HDP in the cloud and on-prem. Others in the market Cloudera is running sort of a different play where they're curating different elements and investing in different elements. Doesn't make either one bad or good, we are just going after the markets slightly differently. The other point I'll make there is in 2014 if you looked at the then chart diagrams, there was a lot of overlap. Now if you draw the areas of focus, there's a lot of white space that we're going after that they aren't going after, and they're going after other places and other new vendors are going after others. With the market dynamics of IoT, cloud and AI, you're going to see folks chase the market opportunities. >> Is that dispersity not a problem for customers now or is it challenging? >> There has to be a core level of interoperability and that's one of the reasons why we're collaborating with folks in the ODPI, as an example. There's still when it comes to some of the core components, there has to be a level of predictability, because if you're an ISV riding atop, you're slowed down by death by infinite certification and choices. So ultimately it has to come down to just a much more sane approach to what you can rely on. >> When you guys announced ODP, then ODPI, the extension, Mike Olson wrote a blog saying it's not necessary, people came out against it. Now we're three years in looking back. Was he right or not? 
>> I think the ODPI takeaway this year is that there's more we can do above and beyond the Hadoop platform. It's expanded to include SQL and other things recently, so there's been some movement on this spec, but frankly, you talk to John Mertic at ODPI, you talk to SAS and others, I think we want to be a bit more aggressive in the areas that we go after and try and drive there from a standardization perspective. >> We had Wei Wang on earlier--
>> As you get into the connective tissue, I love that example, with Kubernetes and containers, you've got developers, big open-source participation, and all the pieces you have, and you start to see some coalescing around cloud native. How do you guys look at that conversation? >> I view container platforms, whether they're container services running on a cloud or what have you, as the new lightweight rail that everything will ride atop. The cloud currently plays a key role in that; I think that's going to be the de facto way, particularly if you go to cloud-first models, particularly for delivery. You need that packaging notion, and you need the agility of updates that it's going to provide. I think Red Hat as a partner has been doing great things on hardening that, making it secure. There are others in the ecosystem, as well as the cloud providers. All three cloud providers actually are investing in it. >> John: So it's good for your business? >> It removes friction from deployment ... and I ride atop that new rail. It can't get here soon enough from my perspective. >> So I want to ask about clouds. You were talking about the Microsoft shift; personally I think Microsoft realized, holy cow, we could actually make a lot of money if we're selling hardware services, and we can make more money if we're selling the full stack. It was sort of an epiphany, and Amazon seems to be doing the same thing. You mentioned earlier, you know, Amazon is a great partner, even though a lot of people look at them as a competitor; it seems like Amazon, Azure, etc. are building out their own big data stacks and offering them as a service. People say that's a threat to you guys. Is it a threat, is it a tailwind, or is it just what it is? >> This is why I bring up that, industry-wide, we always have waves of centralization and decentralization. They're playing out simultaneously right now with cloud and IoT. The fact of the matter is that you're going to have multiple clouds, on-prem data, and data at the edge. That's the problem I'm looking to facilitate and solve. I don't view them as competitors, I view them as partners, because we need to collaborate; there's a value chain in the flow of the data, and some of it's going to be running through and on those platforms. >> The cloud's not going to solve the edge problem. Too expensive. It's just physics. >> So I think that's where things need to go. I think that's why we talk about this notion of connected data. I don't talk about hybrid cloud computing, that's for compute. I talk about how you connect to your data, how you know where your data is, and whether you're getting the right value out of the data by playing it where it lies. >> I think IoT has been a sweet trend for the big data industry. It really accelerates the value proposition of the cloud too, because now you have a connected network; you can have your cake and eat it too, central and distributed. >> There are different dynamics in the US versus Europe, as an example. In the US we're definitely seeing cloud adoption that's independent of IoT. Here in Europe, I would argue the smart mobility initiatives, the smart manufacturing initiatives, and the connected grid initiatives are bringing cloud in, so it's IoT and cloud, and that's opening up the cloud opportunity here. >> Interesting. So on the prospects for Hortonworks, cashflow positive in Q4, you guys have made a public statement; any other thoughts you want to share?
>> Just continue to grow the business, focus on these customer use cases, get them to talk about them at things like DataWorks Summit, and then, the more the merrier; the more data-oriented, open-source-driven companies that can graduate in the public markets, I think that's awesome. I think it will just help the industry. >> Operating in the open, with full transparency-- >> Shaun: On the business and the code. (laughter) >> Welcome to the party, baby. This is theCUBE here at DataWorks 2017 in Munich, Germany. Live coverage, I'm John Furrier with Dave Vellante. Stay with us. More great coverage coming after this short break. (upbeat music)
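To ground the "connected data" idea from the discussion above, playing the data where it lies across on-prem and cloud environments, here is a minimal PySpark sketch. It assumes Spark with the HDFS and S3A connectors available on the classpath; the cluster address, bucket, table layout, and column names are hypothetical, and the choice of Spark is an illustration rather than a specific Hortonworks implementation.

```python
# Minimal sketch of the "connected data" idea: one analytic job that reads data
# where it lies -- on-prem HDFS and cloud object storage -- instead of bulk-copying
# everything to a single location. Paths, bucket names, and columns are hypothetical;
# assumes the S3A connector is available on the Spark classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("connected-data-sketch").getOrCreate()

# On-prem data lake (HDFS), e.g. historical transactions.
on_prem = spark.read.parquet("hdfs://nn.corp.example.com:8020/warehouse/transactions")

# Cloud data lake (S3 via the s3a connector), e.g. clickstream landed by an
# acquired business unit.
in_cloud = spark.read.parquet("s3a://acme-clickstream/events/")

# Join in place and aggregate, without moving either dataset wholesale.
joined = on_prem.join(in_cloud, on="customer_id", how="inner")
summary = joined.groupBy("region").count()
summary.show()

spark.stop()
```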