Rob Thomas, IBM | Big Data NYC 2017

>> Voiceover: Live from midtown Manhattan, it's theCUBE! Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors.
>> Okay, welcome back everyone, live in New York City. This is theCUBE's coverage of, eighth year doing Hadoop World now, evolved into Strata Hadoop, now called Strata Data. It's had many incarnations, but O'Reilly Media is running their event in conjunction with Cloudera, mainly an O'Reilly Media show. We do our own show called Big Data NYC here with our community, with theCUBE bringing you the best interviews, the best people, entrepreneurs, thought leaders, experts, to get the data and try to project the future and help users find the value in data. My next guest is Rob Thomas, who is the General Manager of IBM Analytics, theCUBE alumni, been on multiple times, successfully executing in the San Francisco Bay area. Great to see you again.
>> Yeah John, great to see you, thanks for having me.
>> You know, IBM has really been interesting through its own transformation, and a lot of people will throw IBM in that category, but you guys have been transforming, okay, and the scoreboard has yet to show, in my mind, what's truly happening. Because if you still look at this industry, we're only eight years into what Hadoop evolved into now as a large data set, but the analytics game just seems to be getting started, with the cloud now coming over the top. You're starting to see a lot of cloud conversations in the air. Certainly there's a lot of AI washing, you know, AI this, but it's machine learning and deep learning at the heart of it as innovation, but a lot more work on the analytics side is coming. You guys are at the center of that. What's the update? What's your view of this analytics market?
>> Most enterprises struggle with complexity. That's the number one problem when it comes to analytics. It's not imagination, it's not willpower, in many cases it's not even investment, it's just complexity.
We are trying to make data really simple to use, and the way I would describe it is we're moving from a world of products to platforms. Today, if you want to go solve a data governance problem, you're typically integrating 10, 15 different products. And the burden then is on the client. So, we're trying to make analytics a platform game. And my view is an enterprise has to have three platforms if they're serious about analytics. They need a data management platform for managing all types of data, public, private cloud. They need unified governance, so governance of all types of data, and they need a data science platform, machine learning. If a client has those three platforms, they will be successful with data. And what I see now is really mixed. We've got 10 products that do that, five products that do this, but it has to be integrated in a platform.
>> You as IBM, or the customer has these tools?
>> Yeah, when I go see clients, that's what I see is data...
>> John: Disparate data log.
>> Yeah, they have disparate tools and so we are unifying what we deliver from a product perspective to this platform concept.
>> You guys announced an integrated analytic system, got to see my notes here, I want to get into that in a second, but interesting you bring up the word platform because, you know, platforms have always been kind of reserved for the big supplier, but you're talking about customers having a platform, not a supplier delivering a platform per se, 'cause this is where the integration thing becomes interesting. We were joking yesterday on theCUBE here, kind of ad hoc, conceptually, like the world has turned into a tool shed. I mean everyone has a tool shed or knows someone that has a tool shed where you have the tools in the back and they're rusty. And so, this brings up the tool conversation, there's too many tools out there that try to be platforms.
>> Rob: Yes.
>> And if you have too many tools, you're not really doing the platform game right.
And complexity also turns into, when you bought a hammer, it turned into a lawn mower. Right, so a lot of these companies have been groping and trying to iterate what their tool was into something else it wasn't built for. So, as the industry evolves, that's natural Darwinism if you will, they will fall to the wayside. So talk about that dynamic, because you still need tooling...
>> Rob: Yes.
>> ...but tooling will be a function of the work, as Peter Burris would say, so talk about how a customer really gets that platform out there without sacrificing the tooling that they may have bought or want to get rid of.
>> Well, so think about the enterprise today. What the data architecture looks like is, I've got this box that has this software on it, to use your terms, has these types of tools on it, and it's isolated, and if you want a different set of tooling, okay, move that data to this other box where we have the other tooling. So, it's very isolated in terms of how platforms have evolved, or technology platforms today. When I talk about an integrated platform, we are big contributors to Kubernetes. We're making that foundational in terms of what we're doing on Private Cloud and Public Cloud. If you move to that model, suddenly what was a bunch of disparate tools are now microservices against a common architecture. And so it totally changes the nature of the data platform in an enterprise. It's a much more fluid data layer. The term I use sometimes is you have data as a service now, available to all your employees. That's totally different than, I want to do this project, so step one, make room in the data center, step two, bring in a server. It's a much more flexible approach, so that's what I mean when I say platform.
>> So operationalizing it is a lot easier than just going down the linear path of provisioning.
All right, so let's bring up the complexity issue, because integrated and unified are two different concepts that kind of mean the same thing depending on how you look at it. When you look at the data integration problem, you've got all this complexity around governance, it's a lot of moving parts of data. How does a customer actually execute without compromising the integrity of the policies that they need to have in place? So in other words, what are the baby steps that someone can take, that customers take through what you guys are doing with them, how do they get into the game, how do they take steps towards the outcome? They might not have the big money to push it all at once, they might want to take a risk management approach.
>> I think there's a clear recipe for doing this right, and we have experience of doing it well and doing it not so well, so over time we've gotten, I'd say, a pretty good perspective on that. My view is very simple: data governance has to start with a catalog. And the analogy I use is, you have to do for data what libraries do for books. Think about a library: the first thing you do with books is a card catalog. You basically itemize everything, you know exactly where it sits. If you've got multiple copies of the same book, you can distinguish between which one is which. As books get older they go to archives, to microfilm or something like that. That's what you have to do with your data.
>> On the front end.
>> On the front end. And it starts with a catalog. And the reason I say that is, I see some organizations that start with, hey, let's go start ETL, I'll create a new warehouse, create a new Hadoop environment. That might be the right thing to do, but without having a basis of what you have, which is the catalog, that's where I think clients need to start.
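(Editor's illustration, not part of the interview.) The catalog-first idea described here (itemize every dataset, know where each copy sits, tell copies apart, move old ones to an archive) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the names `DataCatalog`, `CatalogEntry`, and `copy_id`, and the example locations, are hypothetical and chosen for illustration. They do not represent IBM's catalog product or any real API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CatalogEntry:
    """One copy of a dataset, like one copy of a book on a shelf (hypothetical)."""
    name: str            # logical dataset name, e.g. "customers"
    location: str        # where this copy physically sits
    copy_id: int         # distinguishes multiple copies of the same dataset
    last_updated: date
    archived: bool = False


class DataCatalog:
    """A toy card catalog for data; an illustration, not a real product API."""

    def __init__(self):
        self._entries = {}  # dataset name -> list of CatalogEntry

    def register(self, name, location, last_updated):
        # Itemize a copy and give it an id unique within its dataset.
        copies = self._entries.setdefault(name, [])
        entry = CatalogEntry(name, location, len(copies) + 1, last_updated)
        copies.append(entry)
        return entry

    def locate(self, name):
        # Return every active (non-archived) copy of a dataset.
        return [e for e in self._entries.get(name, []) if not e.archived]

    def archive_older_than(self, cutoff):
        # Mark copies last updated before the cutoff as archived.
        archived = []
        for copies in self._entries.values():
            for entry in copies:
                if not entry.archived and entry.last_updated < cutoff:
                    entry.archived = True
                    archived.append(entry)
        return archived


catalog = DataCatalog()
catalog.register("customers", "warehouse://sales/customers", date(2017, 9, 1))
catalog.register("customers", "hdfs://lake/raw/customers", date(2015, 3, 1))
catalog.archive_older_than(date(2016, 1, 1))
print([e.location for e in catalog.locate("customers")])
# -> ['warehouse://sales/customers']
```

The point of the sketch is the ordering: the catalog exists before any ETL or new warehouse work, so later steps can ask it what data exists and where.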
>> Well, I would just add one more level of complexity just to kind of reinforce, first of all I agree with you, but here's another example that would reinforce this step. Let's just say you write some machine learning and some algorithms and a new policy from the government comes down. Hey, you know, we're dealing with Bitcoin differently or whatever, some GDPR kind of thing happens where someone gets hacked and a new law comes out. How do you inject that policy? You've got to rewrite the code, so I'm thinking that if you do this right, you don't have to do a lot of rewriting of applications, the library or the catalog will handle it. Is that right, am I getting that right?
>> That's right, 'cause then you have a baseline, is what I would describe it as. It's codified in the form of a data model or in the form of an ontology for how you're looking at unstructured data. You have a baseline, so then as changes come, you can easily adjust to those changes. Where I see clients struggle is if you don't have that baseline, then you're constantly trying to change things on the fly and that makes it really hard to get to this...
>> Well, really hard, expensive, they have to rewrite apps.
>> Exactly.
>> Rewrite algorithms and machine learning things that were built probably by people that maybe left the company, who knows, right? So the consequences are pretty grave, I mean, pretty big.
>> Yes.
>> Okay, so let's go back to something that you said yesterday. You were on theCUBE yesterday with Hortonworks CEO Rob Bearden and you were commenting about AI, or AI washing. You said, quote, "You can't have AI without IA." A play on letters there, sequence of letters, which was really an interesting comment, we kind of referenced it pretty much all day yesterday. Information architecture is the IA and AI is the artificial intelligence, basically saying if you don't have some sort of architecture, AI really can't work.
Which really means models have to be understood, with the learning machine kind of approach. Expand more on that, 'cause that was I think a fundamental thing that we're seeing at the show this week in New York: a model for the models. Who trains the machine learning? Machines got to learn somewhere too, so there's learning for the learning machines. This is a real complex data problem and a half. If you don't set up the architecture it may not work, explain.
>> So, there are two big problems enterprises have today. One is trying to operationalize data science and machine learning at scale, the other one is getting to the cloud, but let's focus on the first one for a minute. The reason clients struggle to operationalize this at scale is because they start a data science project and they build a model for one discrete data set. Problem is, that only applies to that data set, you can't pick it up and move it somewhere else. So this idea of data architecture, just to kind of follow through, whether it's the catalog or how you're managing your data across multiple clouds, becomes fundamental, because ultimately you want to be able to provide machine learning across all your data, because machine learning is about predictions and it's hard to do really good predictions on a subset. But the pre-req is an information architecture that comprehends the fact that you're going to build models and you want to train those models. As new data comes in, you want to keep the training process going. And that's the biggest challenge I see clients struggling with. So they'll have success with their first ML project, but then the next one becomes progressively harder because now they're trying to use more data and they haven't prepared their architecture for that.
>> Great point. Now, switching to data science. You've spoken many times with us on theCUBE about data science, we know you're passionate about it, you guys are doing a lot of work on that.
We've observed, and Jim Kobielus and I were talking yesterday, there's too much work still on the data science guy's plate. They're still doing a lot of what I call sys admin like work, not the right word, but like administrative building and wrangling. They're not doing enough data science, and there are enough proof points now to show that data science actually impacts business, whether it's the military having data intelligence to execute something, to selling something at the right time, or even for work or play or consume, or we use, all the proof is out there. So why aren't we going faster, why aren't the data scientists more effective, what is it going to take for data science to have a seamless environment that works for them? They're still doing a lot of wrangling and they're still getting down in the weeds. Is that just the role they have, or how does it get easier for them? That's the big catch.
>> That's not the role. So they're a victim of their architecture to some extent, and that's why they end up spending 80% of their time on data prep, data cleansing, that type of thing. Look, I think we solved that. That's why, when we introduced the integrated analytic system this week, the whole idea was to get rid of all the data prep that you need, because you land the data in one place, and machine learning and data science are built into that. So everything that the data scientist struggles with today goes away. We can federate to data on cloud, on any cloud, we can federate to data that's sitting inside Hortonworks, so it looks like one system, but machine learning is built into it from the start. So we've eliminated the need for all of that data movement, for all that data wrangling, 'cause we organized the data, we built the catalog, and we've made it really simple. And so if you go back to the point I made, so one issue is clients can't apply machine learning at scale, the other one is they're struggling to get to the cloud.
I think we've nailed those problems, 'cause now with a click of a button, you can scale this to part of the cloud.
>> All right, so how does the customer get their hands on this? Sounds like it's a great tool, you're saying it's leading edge. We'll take a look at it, certainly I'll do a review on it with the team, but how do I get it, how do I get a hold of this? What do I do, download it, you guys supply it to me, is it some open source, how do your customers and potential customers engage with this product?
>> However they want to, but I'll give you some examples. So, we have an analytic system built on Spark, you can bring the whole box into your data center and right away you're ready for data science. That's one way. Somebody like you, you're going to want to go get the containerized version, you go download it on the web and you'll be up and running instantly with a highly performing warehouse integrated with machine learning and data science built on Spark using Apache Jupyter. Any developer can go use that and get value out of it. You can also say, I want to run it on my desktop.
>> And that's free?
>> Yes.
>> Okay.
>> There's a trial version out there.
>> That's the open source, yeah, that's the free version.
>> There's also a version on public cloud, so if you don't want to download it, you want to run it outside your firewall, you can go run it on IBM cloud on the public cloud so...
>> Just your cloud, Amazon?
>> No, not today.
>> John: Just IBM cloud, okay, I got it.
>> So there's a variety of ways that you can go use this and I think what you'll find...
>> But you have a freemium model that people can get started on, so they'll download it to their data center, is that also free too?
>> Yeah, absolutely.
>> Okay, so all the base stuff is free.
>> We also have a desktop version too, so you can download...
>> What URL can people look at for this?
>> Go to datascience.ibm.com, that's the best place to start a data science journey.
>> Okay, multi-cloud, Common Cloud is what people are calling it, you guys have a Common SQL engine. What is this product, how does it relate to the whole multi-cloud trend? Customers are looking for multiple clouds.
>> Yeah, so Common SQL is the idea of integrating data wherever it is, whatever form it's in, ANSI SQL compliant, so what you would expect for a SQL query and the type of response you get back, you get that back with Common SQL no matter where the data is. Now when you start thinking multi-cloud you introduce a whole other bunch of factors. Network, latency, all those types of things, so what we talked about yesterday with the announcement of Hortonworks Dataplane, which is kind of extending the YARN environment across multi-clouds, that's something we can plug in to. So, I think, let's be honest, the multi-cloud world is still pretty early.
>> John: Oh, really early.
>> Our focus is delivery...
>> I don't think it really exists actually.
>> I think...
>> It's multiple clouds but no one's actually moving workloads across all the clouds, I haven't found any.
>> Yeah, I think it's hard for latency reasons today. We're trying to deliver an outstanding...
>> But people are saying, I mean this is head room I got, but people are saying, I'd love to have a preferred future of multi-cloud even though they're kind of getting their own shops in order, retrenching, and re-platforming, but that's not a bad ask. I mean, I'm a user, I want to move, if I don't like IBM's cloud or I got a better service, I can move around here. If Amazon is too expensive I want to move to IBM, you've got product differentiation, I might want to be in your cloud. So again, this is the customer's mindset, right. If you have something really compelling on your cloud, do I have to go all in on IBM cloud to run my data? You shouldn't have to, right?
>> I agree, yeah, I don't think any enterprise will go all in on one cloud.
I think it's delusional for people to think that, so you're going to have this world. So the reason, when we built IBM Cloud Private, we did it on Kubernetes was we said, that can be a substrate if you will, that provides a level of standards across multiple cloud type environments.
>> John: And it's got some traction too, so it's a good bet there.
>> Absolutely.
>> Rob, final word, just talk about the personas who you now engage with from IBM's standpoint. I know you have a lot of great developer stuff going on, you've done some great work, you've got a free product out there, but you've still got to make money, you've got to provide value to IBM. Who are you selling to, what's the main thing, you've got multiple stakeholders, could you just clarify the stakeholders that you're serving in the marketplace?
>> Yeah, I mean, the emerging stakeholder that we speak with more and more than we used to is chief marketing officers, who have real budgets for data and data science and are trying to change how they're performing their job. That's a major stakeholder, CTOs, CIOs, any C level,
>> Chief data officer.
>> Chief data officer. You know, chief data officers, honestly, it's a mixed bag. Some organizations, they're incredibly empowered and they're driving the strategy. Others, they're figureheads, and so you've got to know how the organizations do it.
>> A puppet for the CFO or something.
>> Yeah, exactly.
>> Or ops.
>> A puppet? (chuckles) So, you got to, you know.
>> Well, they're not really driving it, they're not changing it. It's not like we're mandated to go do something, they're maybe governance police or something.
>> Yeah, and in some cases that's true. In other cases, they drive the data architecture, the data strategy, and that's somebody that we can engage with right away and help them out so...
>> Any events you got coming up? Things happening in the marketplace that people might want to participate in?
I know you guys do a lot of stuff out in the open, events where they can connect with IBM, things going on?
>> So we do, so we're doing a big event here in New York on November first and second where we're rolling out a lot of our new data products and cloud products, so that's one coming up pretty soon. The biggest thing we've changed this year is, there's such a craving from clients for education that we've started doing what we're calling Analytics University, where we actually go to clients and we'll spend a day or two days, go really deep on open languages, open source. That's become kind of a new focus for us.
>> A lot of re-skilling going on too with the transformation, right?
>> Rob: Yes, absolutely.
>> All right, Rob Thomas here, General Manager IBM Analytics, inside theCUBE. CUBE alumni, breaking it down, giving his perspective. He's got two books out there, The Data Revolution was the first one.
>> Big Data Revolution.
>> Big Data Revolution, and the new one is Every Company is a Tech Company. Love that title, which is true, check it out on Amazon. Rob Thomas, Big Data Revolution, first book, and then second book is Every Company is a Tech Company. It's theCUBE live from New York. More coverage after the short break. (theCUBE jingle) (calm soothing music)

Published Date : Oct 2 2017
