Kickoff | theCUBE NYC 2018

>> Live from New York, it's theCUBE covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. (techy music) >> Hello, everyone, welcome to this CUBE special presentation here in New York City for CUBENYC. I'm John Furrier with Dave Vellante. This is our ninth year covering the big data industry, starting with Hadoop World and evolving over the years. This is our ninth year, Dave. We've been covering Hadoop World, Hadoop Summit, Strata Conference, Strata Hadoop. Now it's called Strata Data, I don't know what Strata O'Reilly's going to call it next. As you all know, theCUBE has been present at the creation of the Hadoop big data ecosystem. We're here for our ninth year, certainly a lot's changed. AI's the center of the conversation, and certainly we've seen some horses come in, some haven't come in, and trends have emerged, some gone away, your thoughts. Nine years covering big data. >> Well, John, I remember fondly, vividly, the call that I got. I was in Dallas at a Storage Networking World show and you called and said, "Hey, we're doing Hadoop World, get over there," and of course, Hadoop, big data, was the new, hot thing. I told everybody, "I'm leaving." Most of the people said, "What's Hadoop?" Right, so we came, we started covering, it was people like Jeff Hammerbacher, Amr Awadallah, Doug Cutting, who invented Hadoop, Mike Olson, you know, head of Cloudera at the time, and people like Abi Mehda, who at the time was at B of A, and some of the things we learned then that were profound-- >> Yeah. >> As much as Hadoop is sort of on the back burner now and people really aren't talking about it, some of the things that are profound about Hadoop, really, were the idea, the notion of bringing five megabytes of code to a petabyte of data, for example, or the notion of no schema on write. You know, put it into the database and then figure it out. >> Unstructured data. >> Right. >> Object storage. 
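The "no schema on write" idea mentioned above can be illustrated with a minimal sketch, assuming raw JSON lines as the stored data. The records, the `read_with_schema` helper, and the `spend_schema` mapping are all hypothetical names invented for illustration; this is not how Hadoop itself is implemented, only the general pattern of storing data untouched and deciding on a schema when reading.

```python
import json

# Raw records are stored as-is; nothing is validated or reshaped at write time.
raw_lines = [
    '{"user": "alice", "amount": "12.50", "ts": "2018-09-12"}',
    '{"user": "bob", "amount": 7}',
    '{"user": "carol", "note": "no amount recorded"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: select fields, coerce types, default if absent."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else default)
            for field, (cast, default) in schema.items()
        }

# The reader, not the writer, decides which fields matter and what type they have.
spend_schema = {"user": (str, ""), "amount": (float, 0.0)}
rows = list(read_with_schema(raw_lines, spend_schema))
total = sum(row["amount"] for row in rows)
```

A different reader could apply a different schema to the same raw lines, which is the flexibility the speakers are describing.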
>> And so, that created a state of innovation, of funding. We were talking last night about, you know, many, many years ago at this event this time of the year, concurrent with Strata you would have VCs all over the place. There really aren't a lot of VCs here this year, not a lot of VC parties-- >> Mm-hm. >> As there used to be, so that somewhat waned, but some of the things that we talked about back then, we said that the big money in big data is going to be made by the practitioners, not by the vendors, and that's proved true. I mean... >> Yeah. >> The big three Hadoop distro vendors, Cloudera, Hortonworks, and MapR, you know, Cloudera's $2.5 billion valuation, you know, not bad, but it's not a $30, $40 billion value company. The other thing we said is there will be no Red Hat of big data. You said, "Well, the only Red Hat of big data might be Red Hat," and so, (chuckles) that's basically proved true. >> Yeah. >> And so, I think if we look back we always talked about Hadoop and big data being a reduction, the ROI was a reduction on investment. >> Yeah. >> It was a way to have a cheaper data warehouse, and that's essentially-- >> Well, what did we get right and wrong? I mean, let's look at some of the trends. I mean, first of all, I think we got pretty much everything right, as you know. We tend to make the calls pretty accurately with theCUBE. Got a lot of data, we look, we have the analytics in our own system, plus we have the research team digging in, so you know, we pretty much get, do a good job. I think one thing that we predicted was that Hadoop certainly would change the game, and that did. We also predicted that there wouldn't be a Red Hat for Hadoop, that was a prediction. The other prediction was that we said Hadoop won't kill data warehouses, it didn't, and then data lakes came along. You know my position on data lakes. >> Yeah. >> I've always hated the term. 
I always liked data ocean because I think it was much more fluidity of the data, so I think we got that one right and data lakes still doesn't look like it's going to be panning out well. I mean, most people that deploy data lakes, it's really either not a core thing or as part of something else and it's turning into a data swamp, so I think the data lake piece is not panning out the way people thought it would. I think one thing we did get right, also, is that data would be the center of the value proposition, and it continues and remains to be, and I think we're seeing that now, and we said data's the development kit back in 2010 when we said data's going to be part of programming. >> Some of the other things, our early data, and we went out and we talked to a lot of practitioners, who were hard to find in the early days. They were just a select few, I mean, other than inside of Google and Yahoo! But what they told us is that things like SQL and the enterprise data warehouse were key components of their big data strategy, so to your point, you know, it wasn't going to kill the EDW, but it was going to surround it. The other thing we called was cloud. Four years ago our data showed clearly that much of this work, the modeling, the big data wrangling, et cetera, was being done in the cloud, and Cloudera, Hortonworks, and MapR, none of them at the time really had a cloud strategy. Today that's all they're talking about is cloud and hybrid cloud. >> Well, it's interesting, I think it was like four years ago, I think, Dave, when we actually were riffing on the notion of, you know, Cloudera's name. It's called Cloudera, you know. If you spell it out, in Cloudera we're in a cloud era, and I think we were very aggressive at that point. I think Amr Awadallah even made a comment on Twitter. He was like, "I don't understand where you guys are coming from." 
We were actually saying at the time that Cloudera should actually leverage more cloud at that time, and they didn't. They stayed on their IPO track and they had to because they had everything bet on Impala and this data model that they had being the business model, and then they went public, but I think clearly cloud is now part of Cloudera's story, and I think that's a good call, and it's not too late for them. It never was too late, but you know, Cloudera has executed. I mean, if you look at what's happened with Cloudera, they were the only game in town. When we started theCUBE we were in their office, as most people know in this industry, that we were there with Cloudera when they had like 17 employees. I thought Cloudera was going to run the table, but then what happened was Hortonworks came out of Yahoo! That, I think, changed the game, and I think that competitive battle between Hortonworks and Cloudera, in my opinion, changed the industry, because if Hortonworks did not come out of Yahoo! Cloudera would've had an uncontested run. I think the landscape of the ecosystem would look completely different had Hortonworks not competed, because you think about, Dave, they had that competitive battle for years. The Hortonworks-Cloudera battle, and I think it changed the industry. I think it could've been a different outcome. If Hortonworks wasn't there, I think Cloudera probably would've taken Hadoop and made it so much more, and I think they would've gotten more done. >> Yeah, and I think the other point we have to make here is complexity really hurt the Hadoop ecosystem, and it was just bespoke, new projects coming out all the time, and you had Cloudera, Hortonworks, and maybe to a lesser extent MapR, doing a lot of the heavy lifting, particularly, you know, Hortonworks and Cloudera. 
They had to invest a lot of their R&D in making these systems work and integrating them, and you know, complexity just really broke the back of the Hadoop ecosystem, and so then Spark came in, everybody said, "Oh, Spark's going to basically replace Hadoop." You know, yes and no, the people who got Hadoop right, you know, embraced it and they still use it. Spark definitely simplified things, but now the conversation has turned to AI, John. So, I got to ask you, I'm going to use your line on you in kind of the ask-me-anything segment here. AI, is it same wine, new bottle, or is it really substantively different in your opinion? >> I think it's substantively different. I don't think it's the same wine in a new bottle. I'll tell you... Well, it's kind of, it's like the bad wine... (laughs) Is going to be kind of blended in with the good wine, which is now AI. If you look at this industry, the big data industry, if you look at what O'Reilly did with this conference. I think O'Reilly really has not done a good job with the conference of big data. I think they blew it, I think that they made it a, you know, monetization, closed system when the big data business could've been all about AI in a much deeper way. I think AI is subordinate to cloud, and you mentioned cloud earlier. If you look at all the action within the AI segment, Diane Greene talking about it at Google Next, Amazon, AI is a software layer substrate that will be underpinned by the cloud. 
Cloud will drive more action, you need more compute, that drives more data, more data drives the machine learning, machine learning drives the AI, so I think AI is always going to be dependent upon cloud ends or some sort of high compute resource base, and all the cloud analytics are feeding into these AI models, so I think cloud takes over AI, no doubt, and I think this whole ecosystem of big data gets subsumed under either an AWS, VMworld, Google, and Microsoft Cloud show, and then also I think specialization around data science is going to go off on its own. So, I think you're going to see the breakup of the big data industry as we know it today. Strata Hadoop, Strata Data Conference, that thing's going to crumble into multiple, fractured ecosystems. >> It's already starting to be forked. I think the other thing I want to say about Hadoop is that it actually brought such great awareness to the notion of data, putting data at the core of your company, data and data value, the ability to understand how data at least contributes to the monetization of your company. AI would not be possible without the data. Right, and we've talked about this before. You call it the innovation sandwich. The innovation sandwich, last decade, last three decades, has been Moore's law. The innovation sandwich going forward is data, machine intelligence applied to that data, and cloud for scale, and that's the sandwich of innovation over the next 10 to 20 years. >> Yeah, and I think data is everywhere, so this idea of being a categorical industry segment is a little bit off, I mean, although I know data warehouse is kind of its own category and you're seeing that, but I don't think it's like a Magic Quadrant anymore. Every quadrant has data. >> Mm-hm. >> So, I think data's fundamental, and I think that's why it's going to become a layer within a control plane of either cloud or some other system, I think. I think that's pretty clear, there's no, like, one. 
You can't buy big data, you can't buy AI. I think you can have AI, you know, things like TensorFlow, but it's going to be a completely... Every layer of the stack is going to be impacted by AI and data. >> And I think the big players are going to infuse their applications and their databases with machine intelligence. You're going to see this, you're certainly, you know, seeing it with IBM, the sort of Watson heavy lift. Clearly Google, Amazon, you know, Facebook, Alibaba, and Microsoft, they're infusing AI throughout their entire set of cloud services and applications and infrastructure, and I think that's good news for the practitioners. People aren't... Most companies aren't going to build their own AI, they're going to buy AI, and that's how they close the gap between the sort of data haves and the data have-nots, and again, I want to emphasize that the fundamental difference, to me anyway, is having data at the core. If you look at the top five companies in terms of market value, US companies, Facebook maybe not so much anymore because of the fake news, though Facebook will be back with its two billion users, but Apple, Google, Facebook, Amazon, who am I... And Microsoft, those five have put data at the core and they're the most valuable companies in the stock market from a market cap standpoint, why? Because it's a recognition that that intangible value of the data is actually quite valuable, and even though banks and financial institutions are data companies, their data lives in silos. So, these five have put data at the center, surrounded it with human expertise, as opposed to having humans at the center and having data all over the place. So, how do they, how do these companies close the gap? How do the companies in the flyover states close the gap? 
The way they close the gap, in my view, is they buy technologies that have AI infused in it, and I think the last thing I'll say is I see cloud as the substrate, and AI, and blockchain and other services, as the automation layer on top of it. I think that's going to be the big tailwind for innovation over the next decade. >> Yeah, and obviously the theme of machine learning drives a lot of the conversations here, and that's essentially never going to go away. Machine learning is the core of AI, and I would argue that AI truly doesn't even exist yet. It's machine learning really driving the value, but to put a validation on the fact that cloud is going to be driving AI business is some of the terms in popular conversations we're hearing here in New York around this event and topic, CUBENYC and Strata Conference, is you're hearing Kubernetes and blockchain, and you know, these automation, AI operation kind of conversations. That's an IT conversation, (chuckles) so you know, that's interesting. You've got IT, really, with storage. You've got to store the data, so you can't not talk about workloads and how the data moves with workloads, so you're starting to see data and workloads kind of be tossed in the same conversation, that's a cloud conversation. That is all about multi-cloud. That's why you're seeing Kubernetes, a term I never thought I would be saying at a big data show, but Kubernetes is going to be key for moving workloads around, of which there's data involved. (chuckles) Instrumenting the workloads, data inside the workloads, data driving data. This is where AI and machine learning's going to play, so again, cloud subsumes AI, that's the story, and I think that's going to be the big trend. >> Well, and I think you're right, now. 
I mean, that's why you're hearing the messaging of hybrid cloud from the big distro vendors, and the other thing is you're hearing from a lot of the NoSQL database guys, they're bringing ACID compliance, they're bringing enterprise-grade capability, so you're seeing the world is hybrid. You're seeing those two worlds come together, so... >> Their worlds, it's getting leveled in the playing field out there. It's all about enterprise, B2B, AI, cloud, and data. That's theCUBE bringing you the data here. New York City, CUBENYC, that's the hashtag. Stay with us for more coverage live in New York after this short break. (techy music)

Published Date : Sep 12 2018


ENTITIES

Entity	Category	Confidence
Apple	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Diane Greene	PERSON	0.99+
Google	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
John	PERSON	0.99+
Alibaba	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Jeff Hammerbacher	PERSON	0.99+
$30	QUANTITY	0.99+
New York	LOCATION	0.99+
2010	DATE	0.99+
IBM	ORGANIZATION	0.99+
Doug Cutting	PERSON	0.99+
Mike Olson	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Dallas	LOCATION	0.99+
O'Reilly	ORGANIZATION	0.99+
Yahoo	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
five	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Abi Mehda	PERSON	0.99+
John Furrier	PERSON	0.99+
New York City	LOCATION	0.99+
$2.5 billion	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
MapR	ORGANIZATION	0.99+
Amr Awadallah	PERSON	0.99+
$40 billion	QUANTITY	0.99+
17 employees	QUANTITY	0.99+
VMworld	ORGANIZATION	0.99+
Today	DATE	0.99+
Impala	ORGANIZATION	0.99+
Nine years	QUANTITY	0.99+
four years ago	DATE	0.98+
last night	DATE	0.98+
last decade	DATE	0.98+
Strata Data Conference	EVENT	0.98+
Strata Conference	EVENT	0.98+
Hadoop Summit	EVENT	0.98+
ninth year	QUANTITY	0.98+
Four years ago	DATE	0.98+
two worlds	QUANTITY	0.97+
five companies	QUANTITY	0.97+
today	DATE	0.97+
Strata Hadoop	EVENT	0.97+
Hadoop World	EVENT	0.96+
CUBE	ORGANIZATION	0.96+
Google Next	ORGANIZATION	0.95+
Twitter	ORGANIZATION	0.95+
this year	DATE	0.95+
Spark	ORGANIZATION	0.95+
US	LOCATION	0.94+
CUBENYC	EVENT	0.94+
Strata O'Reilly	ORGANIZATION	0.93+
next decade	DATE	0.93+

Nenshad Bardoliwalla & Stephanie McReynolds | BigData NYC 2017

>> Live from midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by Silicon Angle Media and its ecosystem sponsors. (upbeat techno music) >> Welcome back, everyone. Live here in New York, Day Three coverage, winding down for three days of wall to wall coverage theCUBE covering Big Data NYC in conjunction with Strata Data, formerly Strata Hadoop and Hadoop World, all part of the Big Data ecosystem. Our next guest is Nenshad Bardoliwalla, Co-Founder and Chief Product Officer of Paxata, a hot startup in the space. A lot of kudos. Of course, they launched on theCUBE in 2013 three years ago when we started theCUBE as a separate event from O'Reilly. So, great to see the success. And Stephanie McReynolds, you've been on multiple times, VP of Marketing at Alation. Welcome back, good to see you guys. >> Thank you. >> Happy to be here. >> So, winding down, so great kind of wrap-up segment here in addition to the partnership that you guys have. So, let's first talk about before we get to the wrap-up of the show and kind of bring together the week here and kind of summarize everything. Tell us about the partnership you guys have. Paxata, you guys have been doing extremely well. Congratulations. Prakash was talking on theCUBE. Great success. You guys worked hard for it. I'm happy for you. But partnering is everything. Ecosystem is everything. Alation, their collaboration with data. That's their ethos. They're very user-centric. >> Nenshad: Yes. >> From the founders. Seemed like a good fit. What's the deal? >> It's a very natural fit between the two companies. When we started down the path of building new information management capabilities it became very clear that the market had strong need for both finding data, right? What do I actually have? I need an inventory, especially if my data's in Amazon S3, my data is in Azure Blob storage, my data is on-premise in HDFS, my data is in databases, it's all over the place. And I need to be able to find it. 
And then once I find it, I want to be able to prepare it. And so, one of the things that really drove this partnership was the very common interests that both companies have. And number one, pushing user experience. I love the Alation product. It's very easy to use, it's very intuitive, really it's a delightful thing to work with. And at the same time they also share our interests in working in these hybrid multicloud environments. So, what we've done and what we announced here at Strata is actually this bi-directional integration between the products. You can start in Alation and find a data set that you want to work with, see what collaboration or notes or business metadata people have created and then say, I want to go see this in Paxata. And in a single click you can then actually open it up in Paxata and profile that data. Vice versa you can also be in Paxata and prepare data, and then with a single click push it back, and then everybody who works with Alation actually now has knowledge of where that data is. So, it's a really nice synergy. >> So, you pushed the user data back to Alation, cause that's what they care a lot about, the cataloging and making the user-centric view work. So, you provide, it's almost a flow back and forth. It's a handshake if you will to data. Am I getting that right? >> Yeah, I mean, the idea's to keep the analyst or the user of that data, data scientist, even in some cases a business user, keep them in the flow of their work as much as possible. But give them the advantage of understanding what others in the organization have done with that data prior and allow them to transform it, and then share that knowledge back with the rest of the community that might be working with that data. >> John: So, give me an example. I like your Excel spreadsheet concept cause that's obvious. People know what Excel spreadsheet is so. So, it's Excel-like. That's an easy TAM to go after. All Microsoft users might not get that Azure thing. 
But this one, just take me through a use case. >> So, I've got a good example. >> Okay, take me through. >> It's very common in a data lake for your data to be compressed. And when data's compressed, to a user it looks like a black box. So, if the data is compressed in Avro or Parquet, or it's even in JSON format, a business user has no idea what's in that file. >> John: Yeah. >> So, what we do is we find the file for them. It may have some comments on that file of how that data's been used in past projects that we infer from looking at how others have used that data in Alation. >> John: So, you put metadata around it. >> We put a whole bunch of metadata around it. It might be comments that people have made. It might be-- >> Annotations, yeah. >> --actual observations, annotations. And the great thing that we can do with Paxata is open that Avro file or Parquet file, open it up so that you can actually see the data elements themselves. So, all of a sudden, the business user has access without having to use a command line utility or understand anything about compression, and how you open that file up-- >> John: So, as Paxata's spitting out these nuggets of value back to you, you're kind of understanding it, translating it to the user. And they get to do their thing, you get to do your thing, right? >> It's making an Avro or a Parquet file as easy to use as Excel, basically. Which is great, right? >> It's awesome. >> Now, you've enabled a whole new class of people who can use that. >> Well, and people just get turned off when it's anything like jargon, or like, "What is that? I'm afraid it's phishing. Click on that and oh!" >> Well, the scary thing is that in a data lake environment, in a lot of cases people don't even label the files with extensions. They're just files. (Stephanie laughs) So, what started-- >> It's like getting your pictures like DS, JPEG. It's like what? >> Exactly. >> Right. 
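The problem of files with no extensions can be made concrete with a small sketch. This is not how Alation or Paxata identify files (the transcript does not say); it only shows one common technique, checking a file's leading "magic" bytes. The `sniff_format` function and the sample file names are hypothetical, and the JSON check is a rough guess rather than a real parse.

```python
import os
import tempfile

# Well-known leading bytes for a few formats found in data lakes.
MAGIC_PREFIXES = [
    (b"PAR1", "parquet"),    # Parquet files begin (and end) with PAR1
    (b"Obj\x01", "avro"),    # Avro object container files
    (b"\x1f\x8b", "gzip"),   # gzip-compressed streams
]

def sniff_format(path):
    """Guess a file's format from its first bytes, ignoring its name."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for prefix, name in MAGIC_PREFIXES:
        if head.startswith(prefix):
            return name
    # JSON has no magic number; an opening brace or bracket is only a hint.
    if head.lstrip()[:1] in (b"{", b"["):
        return "json (probably)"
    return "unknown"

# Demo with unlabeled files, as described in the conversation.
with tempfile.TemporaryDirectory() as workdir:
    samples = {
        "part-0000": b"PAR1" + b"\x00" * 8,
        "part-0001": b"Obj\x01" + b"\x00" * 8,
        "part-0002": b'  {"id": 1}',
        "part-0003": b"plain text",
    }
    detected = {}
    for name, payload in samples.items():
        path = os.path.join(workdir, name)
        with open(path, "wb") as handle:
            handle.write(payload)
        detected[name] = sniff_format(path)
```

Knowing the container format is only the first step; actually showing the rows to a business user still requires a reader for that format.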
>> So, you're talking about unlabeled-- >> If you looked on your laptop, and if you didn't have JPEG or DOC or PPT. Okay, I don't know what this file is. Well, what you have in the data lake environment is that you have thousands of these files that people don't really know what they are. And so, with Alation we have the ability to get all the value around the curation of the metadata, and how people are using that data. But then somebody says, "Okay, but I understand that this file exists. What's in it?" And then with Click to Profile from Alation you're immediately taken into Paxata. And now you're actually looking at what's in that file. So, you can very quickly go from this looks interesting to let me understand what's inside of it. And that's very powerful. >> Talk about Alation. 'Cause I had the CEO on, also their lead investor Greg Sands from Costanoa Ventures. They're a pretty amazing team but it's kind of out there. No offense, it's kind of a compliment actually. (Stephanie laughs) >> Stephanie: Keep going. >> They got a symbolic system Stanford guy, who's like super-smart. >> Nenshad: Yeah. >> They're on something that's really unique but it's almost too simple to be. Like, wait a minute! Google for the data, it's an awesome opportunity. How do you describe Alation to people who say, "Hey, what's this Alation thing?" >> Yeah, so I think that the best way to describe it is it's the browser for all of the distributed data in the enterprise. Sorry, so it's both the catalog, and the browser that sits on top of it. It sounds very simple. Conceptually it's very simple but they have a lot of richness in what they're able to do behind the scenes in terms of introspecting what type of work people are doing with data, and then taking that knowledge and actually surfacing it to the end user. 
So, for example, they have very powerful scenarios where they can watch what people are doing in different data sources, and then based on that information actually bubble up how queries are being used or the different patterns that people are doing to consume data with. So, what we find really exciting is that this is something that is very complex under the covers. Which Paxata is as well being built upon Spark. But they have put in the hard engineering work so that it looks simple to the end user. And that's the exact same thing that we've tried to do. >> And that's the hard problem. Okay, Stephanie back ... That was a great example by the way. Can't wait to have our little analyst breakdown of the event. But back to Alation for you. So, how do you talk about, you've been VP of Marketing of Alation. But you've been around the block. You know B2B, tech, big data. So, you've seen a bunch of different, you've worked at Trifacta, you worked at other companies, and you've seen a lot of waves of innovation come. What's different about Alation that people might not know about? How do you describe the difference? Because it sounds easy, "Oh, it's a browser! It's a catalog!" But it's really hard. Is it the tech that's the secret? Is it the approach? How do you describe the value of Alation? >> I think what's interesting about Alation is that we're solving a problem that since the dawn of the data warehouse has not been solved. And that is how to help end users really find and understand the data that they need to do their jobs. A lot of our customers talk about this-- >> John: Hold on. Repeat that. 'Cause that's like a key thing. What problem hasn't been solved since the data warehouse? >> To be able to actually find and fully understand, understand to the point of trust the data that you want to use for your analysis. And so, in the world of-- >> John: That sounds so simple. >> Stephanie: In the world of data warehousing-- >> John: Why is it so hard? 
>> Well, because in the world of data warehousing business people were told what data they should use. Someone in IT decided how to model the data, came up with a KPI calculation, and told you as a business person, you as a CEO, this is how you're going to monitor your business. >> John: Yeah. >> What business person wants to be told that by an IT guy, right? >> Well, it was bounded by IT. >> Right. >> Expression and discovery should be unbounded. Machine learning can take care of a lot of bounded stuff. I get that. But like, when you start to get into the discovery side of it, it should be free. >> Well, no offense to the IT team, but they were doing their best to try to figure out how to make this technology work. >> Well, just look at the cost of goods sold for storage. I mean, how many EMC drives? Expensive! IT was not cheap. >> Right. >> Not even 10, 15, 20 years ago. >> So, now when we have more self-service access to data, and we can have more exploratory analysis. What data science really introduced and Hadoop introduced was this ability on-demand to be able to create these structures, you have this more iterative world of how you can discover and explore datasets to come to an insight. The only challenge is, without simplifying that process, a business person is still lost, right? >> John: Yeah. >> Still lost in the data. >> So, we simply call that a catalog. But a catalog is much more-- >> Index, catalog, anthology, there's other words for it, right? >> Yeah, but I think it's interesting because the concept of a catalog as an inventory has been around forever in this space. But the concept of a catalog that learns from others' behavior with that data, this concept of Behavior I/O that Aaron talked about earlier today. The fact that behavior of how people query data as an input and that input then informs a recommendation as an output is very powerful. And that's where all the machine learning and A.I. comes to work. 
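The "behavior in, recommendation out" idea can be sketched in a few lines. To be clear, this is an assumption-laden toy, not Alation's Behavior I/O implementation, which the transcript does not describe: the `query_log` data and the `recommend` function are hypothetical, and the ranking is plain co-usage counting rather than machine learning.

```python
from collections import Counter, defaultdict

# Input: observed behavior, here (user, table queried) pairs from a made-up log.
query_log = [
    ("ana", "sales.orders"), ("ana", "sales.customers"),
    ("ben", "sales.orders"), ("ben", "sales.customers"),
    ("ben", "finance.refunds"),
    ("cy", "sales.orders"), ("cy", "finance.refunds"),
    ("dee", "hr.headcount"),
]

def recommend(log, table, top_n=2):
    """Output: tables most often used by the same people who queried `table`."""
    tables_by_user = defaultdict(set)
    for user, used in log:
        tables_by_user[user].add(used)
    counts = Counter()
    for used in tables_by_user.values():
        if table in used:
            counts.update(used - {table})
    return [name for name, _ in counts.most_common(top_n)]

suggestions = recommend(query_log, "finance.refunds", top_n=1)
```

In this toy log, both people who queried `finance.refunds` also queried `sales.orders`, so that table is suggested first; a table nobody else touches, such as `hr.headcount`, yields no suggestions.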
It's hidden underneath that concept of Behavior I/O but that's the real innovation that drives this rich catalog: how can we make active recommendations to a business person who doesn't have to understand the technology but they know how to apply that data to making a decision. >> Yeah, that's key. Behavior and textual information has always been the two flywheels in analysis whether you're talking search engine or data in general. And I think what I like about the trends here at Big Data NYC this weekend. We've certainly been seeing it at the hundreds of CUBE events we've gone to over the past 12 months and more is that people are using data differently. Not only say differently, there's baselining, foundational things you got to do. But the real innovators have a twist on it that give them an advantage. They see how they can use data. And the trend is collective intelligence of the customer seems to be big. You guys are doing it. You're seeing patterns. You're automating the data. So, it seems to be this flywheel of some data, get some collective data. What's your thoughts and reactions. Are people getting it? Are people doing it by accident or on purpose? Did people just fall on their head? Or you see, "Oh, I just backed into this?" >> I think that the companies that have emerged as the leaders in the last 15 or 20 years, Google being a great example, Amazon being a great example. These are companies whose entire business models were based on data. They've generated out-sized returns. They are the leaders on the stock market. And I think that many companies have awoken to the fact that data is a monetizable asset to be turned into information either for analysis, to be turned into information for generating new products that can then be resold on the market. 
The leading edge companies have figured that out, and are adopting technologies like Alation, like Paxata, to get a competitive advantage in the business processes where they know they can make a difference inside of the enterprise. So, I don't think it's a fluke at all. I think that most of these companies are being forced to go down that path because they have been shown the way in terms of the digital giants that are currently ruling the enterprise tech world. >> All right, what's your thoughts on the week this week so far on the big trends? What are obvious, obviously A.I., don't need to talk about A.I., but what were the big things that came out of it? And what surprised you that didn't come out from a trends standpoint buzz here at Strata Data and Big Data NYC? What were the big themes that you saw emerge and didn't emerge what was the surprise? Any surprises? >> Basically, we're seeing in general the maturation of the market finally. People are finally realizing that, hey, it's not just about cool technology. It's not about what distribution or package. It's about can you actually drive return on investment? Can you actually drive insights and results from the stack? And so, even the technologists that we were talking with today throughout the course of the show are starting to talk about it's that last mile of making the humans more intelligent about navigating this data, where all the breakthroughs are going to happen. Even in places like IOT, where you think about a lot of automation, and you think about a lot of capability to use deep learning to maybe make some decisions. There's still a lot of human training that goes into that decision-making process and having agency at the edge. And so I think this acknowledgement that there should be balance between human input and what the technology can do is a nice breakthrough that's going to help us get to the next level. >> What's missing? 
What do you see that people missed that is super-important, that wasn't talked much about? Is there anything that jumps out at you? I'll let you think about it. Nenshad, you have something now. >> Yeah, I would say I completely agree with what Stephanie said, which is that we are seeing the market mature. >> John: Yeah. >> And there is a compelling force to now justify business value for all the investments people have made. The science experiment phase of the big data world is over. People now have to show a return on that investment. I think that being said though, this is my sort of way of being a little more provocative. I still think there's way too much emphasis on data science and not enough emphasis on the average business analyst who's doing work in the Fortune 500. >> It should be kind of the same thing. I mean, with data science you're just more of an advanced analyst maybe. >> Right. But the idea that every person who works with data is suddenly going to understand different types of machine learning models, and what's the right way to do hyperparameter tuning, and other words that I could throw at you to show that I'm smart. (laughter) >> You guys have a vision with the Excel thing. I could see how you see that perspective because you see a future. I just think we're not there yet because I think the data scientists are still handcuffed and hamstrung by the fact that they're doing too much provisioning work, right? >> Yeah. >> To your point about surfacing the insights, it's like the data scientists, "Oh, you own it now!" They become the sysadmin, if you will, for their department. And it's like it's not their job. >> Well, we need to get them out of data preparation, right? >> Yeah, get out of that. >> You shouldn't be a data scientist-- >> Right now, you have two values. You've got the user interface value, which I love, but you guys do the automation. So, I think we're getting there.
I see where you're coming from, but still those data scientists have to set the tone for the generation, right? So, it's kind of like you got to get those guys productive. >> And it's not a... Please go ahead. >> I mean, it's somewhat interesting if you look at can the data scientist start to collaborate a little bit more with the common business person? You start to think about it as a little bit of scientific inquiry process. >> John: Yeah. >> Right? >> If you can have more innovators around the table in a common place to discuss what are the insights in this data, and people are bringing business perspective together with machine learning perspective, or the knowledge of the higher algorithms, then maybe you can bring those next leaps forward. >> Great insight. If you want my observations, I use the crazy analogy. Here's my crazy analogy. For years it's been about the engine, the Model T, the car, the horse and buggy, you know? Now, "We got an engine in the car!" And they got wheels, it's got a chassis. And so, it's about the apparatus of the car. And then it evolved to, "Hey, this thing actually drives. It's transportation." You can actually go from A to B faster than the other guys, and people still think there's a horse and buggy market out there. So, they got to go to that. But now people are crashing. Now, there's an art to driving the car. >> Right. >> So, whether you're a sports car or whatever, this is where the value piece I think hits home is that, people are driving the data now. They're driving the value proposition. So, I think that, to me, the big surprise here is how people aren't getting into the hype cycle. They like the hype in terms of lead gen, and A.I., but they're too busy for the hype. It's like, drive the value. This is not just B.S. either, outcomes. It's like, "I'm busy. I got security. I got app development." >> And I think they're getting smarter about how they're valuing data.
We're starting to see some economic models, and some ways of putting actual numbers on what impact is this data having today. We do a lot of usage analysis with our customers, and looking at they have a goal to distribute data across more of the organization, and really get people using it in a self-service manner. And from that, you're able to calculate what actually is the impact. We're not just storing this for insurance policy reasons. >> Yeah, yeah. >> And this cheap-- >> John: It's not some POC. Don't do a POC. All right, so we're going to end the day and the segment on you guys having the last word. I want to phrase it this way. Share an anecdotal story you've heard from a customer, or a prospective customer, that looked at your product, not the joint product but your products each, that blew you away, and that would be a good thing to leave people with. What was the coolest or nicest thing you've heard someone say about Alation and Paxata? >> For me, the coolest thing they said, "This was a social network for nerds. I finally feel like I've found my home." (laughter) >> Data nerds, okay. >> Data nerds. So, if you're a data nerd, you want to network, Alation is the place you want to be. >> So, there is like profiles? And like, you guys have a profile for everybody who comes in? >> Yeah, so the interesting thing is part of our automation, when we go and we index the data sources we also index the people that are accessing those sources. So, you kind of have a leaderboard now of data users, that can track one another in the system. >> John: Ooh. >> And at eBay the leader was this guy, Caleb, who was their data scientist. And Caleb was famous because everyone in the organization would ask Caleb to prepare data for them. And Caleb was like well known if you were around eBay for a while. >> John: Yeah, he was the master of the domain. >> And then when we turned on, you know, we were indexing tables on Teradata as well as their Hadoop implementation.
And all of a sudden, there are table structures that are Caleb underscore cussed. Caleb underscore revenue. Caleb underscore ... We're like, "Wow!" Caleb drove a lot of Teradata revenue. (laughs) >> Awesome. >> Paxata, what was the coolest thing someone said about you in terms of being the nicest or coolest most relevant thing? >> So, something that a prospect said earlier this week is that, "I've been hearing in our personal lives about self-driving cars. But seeing your product and where you're going with it I see the path towards self-driving data." And that's really what we need to aspire towards. It's not about spending hours doing prep. It's not about spending hours doing manual inventories. It's about getting to the point that you can automate the usage to get to the outcomes that people are looking for. So, I'm looking forward to self-driving information. >> Nenshad, thanks so much. Stephanie from Alation. Thanks so much. Congratulations both on your success. And great to see you guys partnering. Big, big community here. And just the beginning. We see the big waves coming, so thanks for sharing perspective. >> Thank you very much. >> And your color commentary on our wrap up segment here for Big Data NYC. This is theCUBE live from New York, wrapping up great three days of coverage here in Manhattan. I'm John Furrier. Thanks for watching. See you next time. (upbeat techno music)

Published Date : Oct 3 2017


ENTITIES

Entity	Category	Confidence
Stephanie	PERSON	0.99+
Stephanie McReynolds	PERSON	0.99+
Greg Sands	PERSON	0.99+
John	PERSON	0.99+
Caleb	PERSON	0.99+
John Furrier	PERSON	0.99+
Nenshad	PERSON	0.99+
New York	LOCATION	0.99+
Prakash	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Aaron	PERSON	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
2013	DATE	0.99+
thousands	QUANTITY	0.99+
Costanoa Ventures	ORGANIZATION	0.99+
Manhattan	LOCATION	0.99+
two companies	QUANTITY	0.99+
both companies	QUANTITY	0.99+
Excel	TITLE	0.99+
Trifacta	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Strata Data	ORGANIZATION	0.99+
Alation	ORGANIZATION	0.99+
Paxata	ORGANIZATION	0.99+
Nenshad Bardoliwalla	PERSON	0.99+
eBay	ORGANIZATION	0.99+
three days	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
two values	QUANTITY	0.99+
NYC	LOCATION	0.99+
hundreds	QUANTITY	0.99+
Big Data	ORGANIZATION	0.99+
first	QUANTITY	0.99+
one	QUANTITY	0.99+
both	QUANTITY	0.99+
Strata Hadoop	ORGANIZATION	0.99+
Hadoop World	ORGANIZATION	0.99+
earlier this week	DATE	0.98+
Paxata	PERSON	0.98+
today	DATE	0.98+
Day Three	QUANTITY	0.98+
Parquet	TITLE	0.96+
three years ago	DATE	0.96+

Rob Thomas, IBM | Big Data NYC 2017

>> Voiceover: Live from midtown Manhattan, it's theCUBE! Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, live in New York City this is theCUBE's coverage, eighth year doing Hadoop World now, evolved into Strata Hadoop, now called Strata Data, it's had many incarnations but O'Reilly Media running their event in conjunction with Cloudera, mainly an O'Reilly media show. We do our own show called Big Data NYC here with our community with theCUBE bringing you the best interviews, the best people, entrepreneurs, thought leaders, experts, to get the data and try to project the future and help users find the value in data. My next guest is Rob Thomas, who is the General Manager of IBM Analytics, theCUBE Alumni, been on multiple times successfully executing in the San Francisco Bay area. Great to see you again. >> Yeah John, great to see you, thanks for having me. >> You know IBM has really been interesting through its own transformation and a lot of people will throw IBM in that category but you guys have been transforming okay and the scoreboard has yet to show in my mind what's truly happening because if you still look at this industry, we're only eight years into what Hadoop evolved into now as a large data set but the analytics game just seems to be getting started with the cloud now coming over the top, you're starting to see a lot of cloud conversations in the air. Certainly there's a lot of AI washing, you know, AI this, but it's machine learning and deep learning at the heart of it as innovation but a lot more work on the analytics side is coming. You guys are at the center of that. What's the update? What's your view of this analytics market? >> Most enterprises struggle with complexity. That's the number one problem when it comes to analytics. It's not imagination, it's not willpower, in many cases, it's not even investment, it's just complexity.
We are trying to make data really simple to use and the way I would describe it is we're moving from a world of products to platforms. Today, if you want to go solve a data governance problem you're typically integrating 10, 15 different products. And the burden then is on the client. So, we're trying to make analytics a platform game. And my view is an enterprise has to have three platforms if they're serious about analytics. They need a data management platform for managing all types of data, public, private cloud. They need unified governance so governance of all types of data and they need a data science platform, machine learning. If a client has those three platforms, they will be successful with data. And what I see now is really mixed. We've got 10 products that do that, five products that do this, but it has to be integrated in a platform. >> You as an IBM or the customer has these tools? >> Yeah, when I go see clients that's what I see is data... >> John: Disparate data log. >> Yeah, they have disparate tools and so we are unifying what we deliver from a product perspective to this platform concept. >> You guys announced an integrated analytic system, got to see my notes here, I want to get into that in a second but interesting you bring up the word platform because you know, platforms have always been kind of reserved for the big supplier but you're talking about customers having a platform, not a supplier delivering a platform per se 'cause this is where the integration thing becomes interesting. We were joking yesterday on theCUBE here, kind of just kind of ad hoc conceptually like the world has turned into a tool shed. I mean everyone has a tool shed or knows someone that has a tool shed where you have the tools in the back and they're rusty. And so, this brings up the tool conversation, there's too many tools out there that try to be platforms. >> Rob: Yes. >> And if you have too many tools, you're not really doing the platform game right.
And complexity also turns into when you bought a hammer it turned into a lawn mower. Right so, a lot of these companies have been groping and trying to iterate what their tool was into something else it wasn't built for. So, as the industry evolves, that's natural Darwinism if you will, they will fall to the wayside. So talk about that dynamic because you still need tooling >> Rob: Yes. >> but tooling will be a function of the work as Peter Burris would say, so talk about how does a customer really get that platform out there without sacrificing the tooling that they may have bought or want to get rid of. >> Well, so think about the, in enterprise today, what the data architecture looks like is, I've got this box that has this software on it, use your terms, has these types of tools on it, and it's isolated and if you want a different set of tooling, okay, move that data to this other box where we have the other tooling. So, it's very isolated in terms of how platforms have evolved or technology platforms today. When I talk about an integrated platform, we are big contributors to Kubernetes. We're making that foundational in terms of what we're doing on Private Cloud and Public Cloud is if you move to that model, suddenly what was a bunch of disparate tools are now microservices against a common architecture. And so it totally changes the nature of the data platform in an enterprise. It's a much more fluid data layer. The term I use sometimes is you have data as a service now, available to all your employees. That's totally different than I want to do this project, so step one, make room in the data center, step two, bring in a server. It's a much more flexible approach so that's what I mean when I say platform. >> So operationalizing it is a lot easier than just going down the linear path of provisioning.
All right, so let's bring up the complexity issue because integrated and unified are two different concepts that kind of mean the same thing depending on how you look at it. When you look at the data integration problem, you've got all this complexity around governance, it's a lot of moving parts of data. How does a customer actually execute without compromising the integrity of their policies that they need to have in place? So in other words, what are the baby steps that someone can take, the customers take through with what you guys are dealing with them, how do they get into the game, how do they take steps towards the outcome? They might not have the big money to push it all at once, they might want to take more of a risk management approach. >> I think there's a clear recipe for doing this right and we have experience of doing it well and doing it not so well, so over time we've gotten some, I'd say a pretty good perspective on that. My view is very simple, data governance has to start with a catalog. And the analogy I use is, you have to do for data what libraries do for books. And think about a library, the first thing you do with books, card catalog. You know where, you basically itemize everything, you know exactly where it sits. If you've got multiple copies of the same book, you can distinguish between which one is which. As books get older they go to archives, to microfilm or something like that. That's what you have to do with your data. >> On the front end. >> On the front end. And it starts with a catalog. And the reason I say that is, I see some organizations that start with, hey, let's go start ETL, I'll create a new warehouse, create a new Hadoop environment. That might be the right thing to do but without having a basis of what you have, which is the catalog, that's where I think clients need to start.
>> Well, I would just add one more level of complexity just to kind of reinforce, first of all I agree with you but here's another example that would reinforce this step. Let's just say you write some machine learning and some algorithms and a new policy from the government comes down. Hey, you know, we're dealing with Bitcoin differently or whatever, some GDPR kind of thing happens where someone gets hacked and a new law comes out. How do you inject that policy? You got to rewrite the code, so I'm thinking that if you do this right, you don't have to do a lot of rewriting of applications, the library or the catalog will handle it. Is that right, am I getting that right? >> That's right 'cause then you have a baseline is what I would describe it as. It's codified in the form of a data model or in the form of an ontology for how you're looking at unstructured data. You have a baseline so then as changes come, you can easily adjust to those changes. Where I see clients struggle is if you don't have that baseline then you're constantly trying to change things on the fly and that makes it really hard to get to this... >> Well, really hard, expensive, they have to rewrite apps. >> Exactly. >> Rewrite algorithms and machine learning things that were built probably by people that maybe left the company, who knows, right? So the consequences are pretty grave, I mean, pretty big. >> Yes. >> Okay, so let's back to something that you said yesterday. You were on theCUBE yesterday with Hortonworks CEO, Rob Bearden and you were commenting about AI or AI washing. You said quote, "You can't have AI without IA." A play on letters there, sequence of letters which was really an interesting comment, we kind of referenced it pretty much all day yesterday. Information architecture is the IA and AI is the artificial intelligence basically saying if you don't have some sort of architecture AI really can't work.
Which really means models have to be understood, with the learning machine kind of approach. Expand more on that 'cause that was I think a fundamental thing that we're seeing at the show this week, this in New York is a model for the models. Who trains the machine learning? Machines got to learn somewhere too so there's learning for the learning machines. This is a real complex data problem and a half. If you don't set up the architecture it may not work, explain. >> So, there's two big problems enterprises have today. One is trying to operationalize data science and machine learning at scale, the other one is getting to the cloud but let's focus on the first one for a minute. The reason clients struggle to operationalize this at scale is because they start a data science project and they build a model for one discrete data set. Problem is that only applies to that data set, it doesn't, you can't pick it up and move it somewhere else so this idea of data architecture just to kind of follow through, whether it's the catalog or how you're managing your data across multiple clouds becomes fundamental because ultimately you want to be able to provide machine learning across all your data because machine learning is about predictions and it's hard to do really good predictions on a subset. But that pre-req is the need for an information architecture that comprehends the fact that you're going to build models and you want to train those models. As new data comes in, you want to keep the training process going. And that's the biggest challenge I see clients struggling with. So they'll have success with their first ML project but then the next one becomes progressively harder because now they're trying to use more data and they haven't prepared their architecture for that. >> Great point. Now, switching to data science. You spoke many times with us on theCUBE about data science, we know you're passionate about you guys doing a lot of work on that.
We've observed and Jim Kobielus and I were talking yesterday, there's too much work still on the data science guys' plate. They're still doing a lot of what I call, sys admin like work, not the right word, but like administrative building and wrangling. They're not doing enough data science and there's enough proof points now to show that data science actually impacts business in whether it's military having data intelligence to execute something, to selling something at the right time, or even for work or play or consume, or we use, all proof is out there. So why aren't we going faster, why aren't the data scientists more effective, what is it going to take for the data science to have a seamless environment that works for them? They're still doing a lot of wrangling and they're still getting down in the weeds. Is that just the role they have or how does it get easier for them, that's the big catch? >> That's not the role. So they're a victim of their architecture to some extent and that's why they end up spending 80% of their time on data prep, data cleansing, that type of thing. Look, I think we solved that. That's why when we introduced the integrated analytic system this week, that whole idea was get rid of all the data prep that you need because land the data in one place, machine learning and data science is built into that. So everything that the data scientist struggles with today goes away. We can federate to data on cloud, on any cloud, we can federate to data that's sitting inside Hortonworks so it looks like one system but machine learning is built into it from the start. So we've eliminated the need for all of that data movement, for all that data wrangling 'cause we organized the data, we built the catalog, and we've made it really simple. And so if you go back to the point I made, so one issue is clients can't apply machine learning at scale, the other one is they're struggling to get to the cloud.
I think we've nailed those problems 'cause now with a click of a button, you can scale this to part of the cloud. >> All right, so how does the customer get their hands on this? Sounds like it's a great tool, you're saying it's leading edge. We'll take a look at it, certainly I'll do a review on it with the team but how do I get it, how do I get a hold of this? What do I do, download it, you guys supply it to me, is it some open source, how do your customers and potential customers engage with this product? >> However they want to but I'll give you some examples. So, we have an analytic system built on Spark, you can bring the whole box into your data center and right away you're ready for data science. That's one way. Somebody like you, you're going to want to go get the containerized version, you go download it on the web and you'll be up and running instantly with a highly performing warehouse integrated with machine learning and data science built on Spark using Jupyter. Any developer can go use that and get value out of it. You can also say I want to run it on my desktop. >> And that's free? >> Yes. >> Okay. >> There's a trial version out there. >> That's the open source, yeah, that's the free version. >> There's also a version on public cloud so if you don't want to download it, you want to run it outside your firewall, you can go run it on IBM cloud on the public cloud so... >> Just your cloud, Amazon? >> No, not today. >> John: Just IBM cloud, okay, I got it. >> So there's a variety of ways that you can go use this and I think what you'll find... >> But you have a premium model that people can get started out so they'll download it to your data center, is that also free too? >> Yeah, absolutely. >> Okay, so all the base stuff is free. >> We also have a desktop version too so you can download... >> What URL can people look at this? >> Go to datascience.ibm.com, that's the best place to start a data science journey.
>> Okay, multi-cloud, Common Cloud is what people are calling it, you guys have Common SQL engine. What is this product, how does it relate to the whole multi-cloud trend? Customers are looking for multiple clouds. >> Yeah, so Common SQL is the idea of integrating data wherever it is, whatever form it's in, ANSI SQL compliant so what you would expect for a SQL query and the type of response you get back, you get that back with Common SQL no matter where the data is. Now when you start thinking multi-cloud you introduce a whole other bunch of factors. Network, latency, all those types of things so what we talked about yesterday with the announcement of Hortonworks Dataplane which is kind of extending the YARN environment across multi-clouds, that's something we can plug in to. So, I think let's be honest, the multi-cloud world is still pretty early. >> John: Oh, really early. >> Our focus is delivery... >> I don't think it really exists actually. >> I think... >> It's multiple clouds but no one's actually moving workloads across all the clouds, I haven't found any. >> Yeah, I think it's hard for latency reasons today. We're trying to deliver an outstanding... >> But people are saying, I mean this is head room I got but people are saying, I'd love to have a preferred future of multi-cloud even though they're kind of getting their own shops in order, retrenching, and re-platforming it but that's not a bad ask. I mean, I'm a user, I want to move from if I don't like IBM's cloud or I got a better service, I can move around here. If Amazon is too expensive I want to move to IBM, you got product differentiation, I might want to be in your cloud. So again, this is the customer's mindset, right. If you have something really compelling on your cloud, do I have to go all in on IBM cloud to run my data? You shouldn't have to, right? >> I agree, yeah I don't think any enterprise will go all in on one cloud.
I think it's delusional for people to think that so you're going to have this world. So the reason when we built IBM Cloud Private we did it on Kubernetes was we said, that can be a substrate if you will, that provides a level of standards across multiple cloud type environments. >> John: And it's got some traction too so it's a good bet there. >> Absolutely. >> Rob, final word, just talk about the personas who you now engage with from IBM's standpoint. I know you have a lot of great developer stuff going on, you've done some great work, you've got a free product out there but you still got to make money, you got to provide value to IBM, who are you selling to, what's the main thing, you've got multiple stakeholders, could you just clarify the stakeholders that you're serving in the marketplace? >> Yeah, I mean, the emerging stakeholder that we speak with more and more than we used to is chief marketing officers who have real budgets for data and data science and trying to change how they're performing their job. That's a major stakeholder, CTOs, CIOs, any C level, >> Chief data officer. >> Chief data officer. You know chief data officers, honestly, it's a mixed bag. Some organizations they're incredibly empowered and they're driving the strategy. Others, they're figureheads and so you got to know how the organizations do it. >> A puppet for the CFO or something. >> Yeah, exactly. >> Or ops. >> A puppet? (chuckles) So, you got to you know. >> Well, they're not really driving it, they're not changing it. It's not like we're mandated to go do something they're maybe governance police or something. >> Yeah, and in some cases that's true. In other cases, they drive the data architecture, the data strategy, and that's somebody that we can engage with right away and help them out so... >> Any events you got going up? Things happening in the marketplace that people might want to participate in?
I know you guys do a lot of stuff out in the open, events they can connect with IBM, things going on? >> So we do, so we're doing a big event here in New York on November first and second where we're rolling out a lot of our new data products and cloud products so that's one coming up pretty soon. The biggest thing we've changed this year is there's such a craving from clients for education that we've started doing what we're calling Analytics University where we actually go to clients and we'll spend a day or two days, go really deep and open languages, open source. That's become kind of a new focus for us. >> A lot of re-skilling going on too with the transformation, right? >> Rob: Yes, absolutely. >> All right, Rob Thomas here, General Manager IBM Analytics inside theCUBE. CUBE alumni, breaking it down, giving his perspective. He's got two books out there, The Data Revolution was the first one. >> Big Data Revolution. >> Big Data Revolution and the new one is Every Company is a Tech Company. Love that title which is true, check it out on Amazon. Rob Thomas, Big Data Revolution, first book and then second book is Every Company is a Tech Company. It's theCUBE live from New York. More coverage after the short break. (theCUBE jingle) (calm soothing music)

Published Date : Oct 2 2017


ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Peter Burris	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
John	PERSON	0.99+
Rob Bearden	PERSON	0.99+
Rob Thomas	PERSON	0.99+
O'Reilly Media	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
10	QUANTITY	0.99+
New York	LOCATION	0.99+
10 products	QUANTITY	0.99+
O'Reilly	ORGANIZATION	0.99+
two days	QUANTITY	0.99+
first book	QUANTITY	0.99+
two books	QUANTITY	0.99+
a day	QUANTITY	0.99+
Rob	PERSON	0.99+
Today	DATE	0.99+
yesterday	DATE	0.99+
New York City	LOCATION	0.99+
Hortonworks	ORGANIZATION	0.99+
San Francisco Bay	LOCATION	0.99+
five products	QUANTITY	0.99+
second book	QUANTITY	0.99+
IBM Analytics	ORGANIZATION	0.99+
this week	DATE	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
first	QUANTITY	0.99+
first one	QUANTITY	0.99+
theCUBE	ORGANIZATION	0.99+
eight years	QUANTITY	0.99+
Spark	TITLE	0.99+
SQL	TITLE	0.99+
Common SQL	TITLE	0.98+
datascience.ibm.com	OTHER	0.98+
eighth year	QUANTITY	0.98+
One	QUANTITY	0.98+
one issue	QUANTITY	0.97+
Hortonworks Dataplane	ORGANIZATION	0.97+
three platforms	QUANTITY	0.97+
Strata Hadoop	TITLE	0.97+
today	DATE	0.97+
The Data Revolution	TITLE	0.97+
Cloudera	ORGANIZATION	0.97+
second	QUANTITY	0.96+
NYC	LOCATION	0.96+
two big problems	QUANTITY	0.96+
Analytics University	ORGANIZATION	0.96+
step two	QUANTITY	0.96+
one way	QUANTITY	0.96+
November first	DATE	0.96+
Big Data Revolution	TITLE	0.95+
one	QUANTITY	0.94+
Every Company is a Tech Company	TITLE	0.94+
CUBE	ORGANIZATION	0.93+
this year	DATE	0.93+
two different concepts	QUANTITY	0.92+
one system	QUANTITY	0.92+
step one	QUANTITY	0.92+

Greg Sands, Costanoa | Big Data NYC 2017

(electronic music) >> Host: Live from Midtown Manhattan it's theCUBE! Covering Big Data New York City 2017, brought to you by SiliconANGLE Media, and its Ecosystem sponsors. >> Okay, welcome back everyone. We are here live, theCUBE in New York City for Big Data NYC, this is our fifth year, doing our own event, not with O'Reilly or Cloudera at Strata Data, which was Hadoop World, Strata Conference, Strata Hadoop, now called Strata Data, probably called Strata AI next year, we're theCUBE every year, bringing you all the great data, and what's going on. Entrepreneurs, VCs, thought leaders, we interview them and bring that to you. I'm John Furrier with our next guest, Greg Sands, who's the managing director and founder of Costanoa Ventures in Palo Alto, started out as an entrepreneur himself, then a single shingle out there, now he's got a big VC firm on a third fund. >> On the third fund. >> Third fund. How much in that fund? >> 175 million dollar fund. >> So now you're a big firm now, congratulations, and really great to see your success. >> Thanks very much. I mean, we're still very much an early stage boutique focused on companies that change the way the world does business, but it is the case that we have a bigger team and a bigger fund, to go do the same thing. >> Well you've been great to work with, I've been following you, we've known each other for a while, watched you leave Sutter Hill and start Costanoa, but what's interesting is that, I can kind of joke and kid you, the VC inside joke about being a big firm, because I know you want to be small, and like to be small, help entrepreneurs, that's your thing. But it's really not a big firm, it's a few partners, but a lot of people helping companies, that's your ethos, that's what you're all about at your firm. Take a minute to just share with the folks the kinds of things you do and how you get involved in companies, you're hands on, you roll up your sleeves. 
You get out of the way at the right time, you help when you can, share your ethos. >> Yeah, absolutely, so the way we think of it is, combining the craft of old school venture capital with a modern operating team, and so since most founders these days are product-oriented, our job is to think like product people, not think like investors. So we think like product people, we do product level analysis, we do customer discovery, we go ride along on sales calls when we're making investment decisions. And then we do the things that great venture capitalists have done for years, and so for example, at Alation, who I know has been on the show today, we were able to incubate them in our office for a year, I had many conversations with Satyen after he'd sold the first two or three customers. Okay, who's the next person we hire? Who isn't a founder? Who's going to go out and sell? What does that person look like? Do you go straight to a VP? Or do you hire an individual contributor? Do you hire someone for domain, or do you hire someone for talent? And that's the thing that we love doing. Now we've actually built out an operating team, so Martina Lauchengco as a marketing partner, and Jim Wilson as a sales partner, to really help turn that into a program, so that we can take these founders who find product market fit, and say, how do we help you build the right sales process and marketing process, sales team and marketing team, for your company, your customer, your product? >> Well it's interesting since you mention old school venture capital, I'll get into some of the dynamics that are going on in Silicon Valley, but it's important to bring that forward, because now with cloud you can get to critical mass on the flywheel, on economics, you can see the visibility faster now. >> Greg: Absolutely. 
>> So the game of the old school venture capitalist is all the same, how do you get to cruising altitude, whatever metaphor you want to use, the key was getting there, and sometimes it took a couple of rounds, but now you can get these companies with five million, maybe $10 million funding, they can have unit economics visibility, scales insight, then the scale game comes in, so that seems to be the secret trick right now in venture is, don't overspend, keep the valuation in range, and that allows you to look for multiple exits potentially, or growth. Talk about that dynamic, because this is like, I call it the hourglass. You get through the hourglass, everyone's down here, but if you can sneak through and get the visibility on the economics, then you grow quickly. >> Absolutely. I mean, it's exactly right and I haven't heard the hourglass metaphor before but I like it. You want to basically get through the narrows of product market fit and the beginnings of scalable sales and marketing. You don't need to know all the answers, but you can do that in a capital-efficient way, building really solid foundations for future explosive growth, look, everybody loves fast growth and big markets, and being grown into. But the number of people who basically don't build those foundations and then say, go big or go home! And they take a ton of money, and they go spend all the money, doing things that just fundamentally don't work, and they blow themselves up. >> Well this is the hourglass problem. You have, once you get through that unit economics, then you have true scale, and value will increase. Everybody wins there so it's about getting through that, and you can get through it fast with good mentoring, but here's the challenge, the trap that entrepreneurs fall into. I call it the, I think I made it trap. And what happens is they think they're on the other side of the hourglass, but they still haven't even gone through the straight and narrow yet, and they don't know it. 
And what they do is they overfund and implode. That seems to be a major trap I see a lot of entrepreneurs fall into, well, I got a 50 million pre on my B round, or some monster valuation, and they get way too much cash, and they're behaving as if they're scaling, and they haven't even nailed it yet. >> Well, I think that's right. So there's certainly, there are stages of product market fit, and so I think people hit that first stage, and they say, oh I've got it. And they try to explode out of the gates. And we, in fact I know one good example of somebody saying, hey, by the way, we're doing great in field sales, and our investors want us to go really fast, so we are going to go inside and, my job was to hire 50 inside people, without ever having tried it. And so we always preach crawl, walk, run, right? Hire a couple, see how it works. Right, in a new channel. Or a new category, or an adjacent space, and I think that it's helpful to have an investor who has seen the whole picture to say, yeah, I know it looks like light at the end of the tunnel, but see how it's a relatively small dot? You still got to go a little farther, and then the other thing I say is, look, don't build your company to feed your venture capitalist's ego. Right? People do these big rounds at big valuations, and the big dog investors say, go, go, go! But, you're the CEO. Your job is to analyze the data. >> John: You can find during the day (laughs). >> And say, you know, given what we know, how fast should we go? Which investments should we make? And you've got to own that. And I think sometimes our job is just to be the pulling guard and clear space for the CEO to make good decisions. >> So you know I'm a big fan, so my bias is pretty much out there, love what you guys are doing. Tim Connors at PivotNorth is doing the same thing. 
Really adding value, getting down and dirty, but the question that entrepreneurs always ask me and talk privately, not about you, but in general, I don't want the VC to get in the way. I want them, I don't want them to preach to me, I don't want too many know-it-alls on my board, I want added value, but again, I don't want the preaching, I don't want them to get in the way, 'cause that's the fear. I'm not saying the same about VCs in general, but that's kind of the mentality of an entrepreneur. I want someone who's going to help me, be in the boat with me, but not be in my way. How do you address that concern to the founders who think, not think like that, but might have a fear. >> Well, by the way, I think it's a legitimate fear, and I think it actually is uncorrelated with added value, right? I think the idea that the board has certain responsibilities, and management has certain responsibilities, is incredibly important. And I think, I can speak for myself in saying, I'm quite conscious of not crossing that line, I think you talk. >> John: You got to build a return, that's the thing. >> But ultimately I would say to an entrepreneur, I'd just say, hey look, call references. And by the way, here are 30 names and phone numbers, and call any one of them, because I think that people who are, so a venture capital know-it-all, in the board room, telling CEOs what to do, destroys value. It's sand in the gears, and it's bad for the company. >> Absolutely, I agree 100% >> And some of my, when I talk about being a pulling guard for the CEO, that's what I'm talking about, which is blocking people who are destructive. >> And rolling the block for a touchdown, kind of use the metaphor. Adding value, that's the key, and that's why I wanted to get that out there because most guys don't get that nuance, and entrepreneurs, especially the younger ones. So it's good and important. 
Okay, let's talk about culture, obviously in Silicon Valley, I get, reading this morning about the Waymo guy, and they're writing it, that's the Silicon Valley, that's not crazy, there's a lot of great people in Silicon Valley, you're one of them. The culture's certainly an innovative culture, there's been some things in the press, inclusion and diversity, obviously is super important. This whole brogrammer thing that's been kind of kicked around. How are you dealing with all that? Because, you know, this is a cultural shift, but I think it's being made out more than it really is, but there are still core issues, your thoughts on the whole inclusion and diversity, and this whole brogrammer blowback thing. >> Yeah, well so I think, so first of all, really important issues, glad we're talking about them, and we all need to get better. And to me the question for us has been, what role do we play? And because I would say it is a relatively small subset of the tech industry, and the venture capital industry. At the same time the behavior that has become public is appalling. It's appalling and totally unacceptable, and so the question is, okay, how can we be a part of the stand-up part of the ecosystem, and some of which is calling things out when we see them. Though frankly we work with and hang out with people and we don't see them that often, and then part of which is, how do we find a couple of ways to contribute meaningfully? So for example this summer we ran what we called the Costanoa Access Fellowship, intentionally, trying to provide first opportunity in venture capital for people who traditionally haven't had as much access. We created an event in the spring called Seat at the Table, really, particularly around women in the tech industry, and it went so well that we're running it in New York on October 19th, so if you're a woman in tech in New York, we'd love to see you then. 
And we're just trying to figure-- >> You're doing it in an authentic way though, you're not really doing it from a promotional standpoint. It's legit. >> Yeah, we're just trying to do, you know, pick off a couple of things that we can do, so that we can be on the side of the good guys. >> So I guess what you're saying is just have high integrity, and be part of the solution not part of the problem. >> That's right, and by the way, both of these initiatives were ones that were kicked off in late 2016, so it's not a reaction to things like Binary Capital, and the problems at Uber, both of which are appalling. >> Self-awareness is critical. Let's get back to the nuts and bolts of the real reason why I wanted you to come on, one was to find out how much money you have to spend for the entrepreneurs that are watching. Give us the update on the last fund, so you got a new fund that you just closed, the new fund, fund three. You have your other funds that are still out there, and some funds reserved, which, what's the number amount, how much are you writing checks for? Give the whole thesis. >> Absolutely. So we're an early stage investor, so we lead series A and seed financings in companies that change the way the world does business, so up and down the stack, business-facing software, data-driven applications. Machine-learning and AI driven applications. >> John: But the filter is changing the way the world works? >> The way, yes, but in particular the way the world does business. You can think of it as a business-facing software stack. We're not social media investors, it's not what we know, it's not what we're good at. And it includes security and management, and the data stack and-- >> Joe: Enterprise and emerging tech. >> That's right. And the-- >> And every crazy idea in between. >> That's right. 
(laughs) Absolutely, and so we'll participate in or lead seed financings, which most typically are half a million to maybe one and a quarter, and we'll lead series A financings, small ones might be two or two and a half million dollars, and the outer edge is probably a six million dollar check. We're just opening up in the next couple of days, a thousand square feet of incubation space at world headquarters in Palo Alto. >> John: Nice. >> So Alation, Acme Ticketing and Zen IQ are companies that we invested in. >> Joe: What location is this going to be at? >> That's, near the Philz in downtown Palo Alto, 164 staff, and those three companies are ones where we effectively invested at formation and incubated it for a year, we love doing that. >> Hang out at Philz more and get the data. And so you got some funds, what else do you have going on? 175 million? >> So one was a $100 million fund, and then fund two was a $135 million fund, and the last investment of fund two, which we announced about three weeks ago, was called Roadster, so it's ecommerce enablement for the modern dealerships. So omnichannel and mobile-first infrastructure for auto dealers. We have already closed, and had the first board meeting for, the first new investment of fund three, which isn't yet announced, but in the land of computer vision and deep learning, so a couple of the subjects that we care deeply about, and spend a lot of time thinking about. >> And the average check size for the A round again, seed and A, what do you know about the? The lowest and highest? >> The average for the seed is half a million to one and a quarter, and probably average for a series A is four or five. >> And you'll lead As. >> And we will lead As. >> Okay great. What's the coolest thing you're working on right now that gets you excited? It doesn't have to be a portfolio company, but the research you're doing, thing, tires you're kicking, in subjects, or domains? 
>> You know, so honestly, one of the great benefits of the venture capital business is that I get up and my neurons are firing right away every day. And I do think that for example, one of the things that we love is all of the DevOps infrastructure, and so we've got our friends at VictorOps that are in the middle of that space, and the thinking about how the modern programmer works, how everybody-- >> Joe: Is security on your radar? >> Security is very much on our radar, in fact, someone who you should have on your show is Ashish Gupta, and Casey Ellis, so Ashish just joined Bugcrowd as the CEO and Casey moves over to CTO, and the word bug bounty was just entered into the Oxford Dictionary for the first time last week, so that to me is the ultimate in category creation. So security and DevOps tools are among the things that we really like. >> And bounties will become the norm as more and more decentralized apps hit the scene. Are you doing anything on decentralized applications? I'm not saying Blockchain in particular, but Blockchain-like apps, distributed computing you're well versed on. >> That's right, well we-- >> Blockchain will have an impact in your area. >> Blockchain will have an impact, we just spent an hour talking about it in the context of our off-site at the Costanoa Lodge in Pescadero, it felt like it was important that we go there. And digging into it. I think actually edge computing is more actionable for us right now, given the things that we're interested in, and we're doing, and it is just fascinating how compute centralizes and then decentralizes, centralizes and then decentralizes again, and I do think that there are a set of things that are fascinating about what you process at the edge, and what you send back to the core. 
>> As Pat Gelsinger said here on theCUBE, if you're not out in front of that next wave, you're driftwood, a lot of big waves coming in, you've seen a lot of waves, you were part of one that changed the world, Netscape browser, or the business plan for that, first product manager, congratulations. Now you're at a whole nother generation. You ready? (laughs) >> Absolutely, I'm totally ready, I'm ready to go. >> Greg Sands here in theCUBE in New York City, part of Big Data NYC, more live coverage with theCUBE after this short break, thanks for watching. (electronic jingle) (inspiring electronic music)

Published Date : Sep 29 2017


ENTITIES

Entity	Category	Confidence
Greg Sands	PERSON	0.99+
Asheesh Guptar	PERSON	0.99+
John	PERSON	0.99+
two	QUANTITY	0.99+
Tim Carr	PERSON	0.99+
John Furrier	PERSON	0.99+
Costa Nova	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
Joe	PERSON	0.99+
October 19th	DATE	0.99+
Costanova	ORGANIZATION	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
$10 million	QUANTITY	0.99+
New York	LOCATION	0.99+
$100 million	QUANTITY	0.99+
five million	QUANTITY	0.99+
Casey Ella	PERSON	0.99+
$135 million	QUANTITY	0.99+
Zen IQ	ORGANIZATION	0.99+
Omnichannel	ORGANIZATION	0.99+
50 million	QUANTITY	0.99+
three companies	QUANTITY	0.99+
Pascadero	LOCATION	0.99+
Greg	PERSON	0.99+
New York City	LOCATION	0.99+
100%	QUANTITY	0.99+
50	QUANTITY	0.99+
Silicon valley	LOCATION	0.99+
Jim Wilson	PERSON	0.99+
O'Reilly	ORGANIZATION	0.99+
Casey	PERSON	0.99+
Alation	ORGANIZATION	0.99+
half a million	QUANTITY	0.99+
30 names	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
175 million	QUANTITY	0.99+
first	QUANTITY	0.99+
Victor Ops	ORGANIZATION	0.99+
Pet Gelson	PERSON	0.99+
both	QUANTITY	0.99+
last week	DATE	0.99+
four	QUANTITY	0.99+
three customers	QUANTITY	0.99+
late 2016	DATE	0.99+
fifth year	QUANTITY	0.99+
Cloud Era	ORGANIZATION	0.99+
Acme Ticketing	ORGANIZATION	0.98+
164 staff	QUANTITY	0.98+
NYC	LOCATION	0.98+
five	QUANTITY	0.98+
Oxford Dictionary	TITLE	0.98+
Midtown Manhattan	LOCATION	0.98+
Alatian	ORGANIZATION	0.98+
175 million dollar	QUANTITY	0.98+
next year	DATE	0.98+
today	DATE	0.97+
first time	QUANTITY	0.97+
third fund	QUANTITY	0.97+
first board	QUANTITY	0.97+
Costanoa	PERSON	0.97+
a year	QUANTITY	0.97+
six	QUANTITY	0.97+
one	QUANTITY	0.97+
one and a quarter	QUANTITY	0.96+
Strata Conference	EVENT	0.96+
The Cube	TITLE	0.96+
Strata AI	EVENT	0.96+
million dollar	QUANTITY	0.96+
2017	EVENT	0.95+
first project	QUANTITY	0.95+
two and a half million dollars	QUANTITY	0.95+
Hadoop World	EVENT	0.94+
Sathien	PERSON	0.93+
single shingle	QUANTITY	0.93+
first two	QUANTITY	0.93+
an hour	QUANTITY	0.92+
this summer	DATE	0.92+
first stage	QUANTITY	0.92+
Bug Crowd	ORGANIZATION	0.91+

Matt Maccaux, Dell EMC | Big Data NYC 2017

>> Announcer: Live from Midtown Manhattan. It's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (electronic music) >> Hey, welcome back everyone, live here in New York. This is theCUBE here in Manhattan for Big Data NYC's three days of coverage. We're on day three, things are starting to settle in, starting to see the patterns out there. I'll say it's Big Data week here, in conjunction with Hadoop World, formerly known as Strata Conference, Strata-Hadoop, Strata-Data, soon to be Strata-AI, soon to be Strata-IOT. Big Data, Matt Maccaux, who's the Global Big Data Practice Lead at Dell EMC. We've been in this world now for multiple years and, well, what a riot it's been. >> Yeah, it has. It's been really interesting as the organizations have gone from their legacy systems, they have been modernizing. And we've sort of seen Big Data 1.0 a couple years ago. Big Data 2.0 and now we're moving on to sort of the what's next? >> Yeah. >> And it's interesting because the Big Data space has really lagged the application space. You talk about microservices-based applications, and deploying in the cloud and stateless things. The data technologies and the data space has not quite caught up. The technology's there, but the thinking around it, and the deployment of those, it seems to be a slower, more methodical process. And so what we're seeing in a lot of enterprises is that the ones that got in early, have built out capabilities, are now looking for that, how do we get to the next level? How do we provide self-service? How do we enable our data scientists to be more productive within the enterprise, right? If you're a startup, it's easy, right? You're somewhere in the public cloud, you're using cloud based APIs, it's all fine. But if you're an enterprise, with the inertia of those legacy systems and governance and controls, it's a different problem to solve for. >> Let's just face it. We'll just call a spade a spade. 
Total cost of ownership was out of control. Hadoop was great, but it was built for something that tried to be something else as it evolved. And that's good also, because we need to decentralize and democratize the incumbent big data warehouse stuff. But let's face it, Hadoop is not the game anymore, it's everything else. >> Right, yep. >> Around it so, we've seen that, that's a couple years old. It's about business value right now. That seems to be the big thing. The separation between the players that can deliver value for the customers. >> Matt: Yep. >> And show a little bit of headroom for future AI things, they've seen that. And have the cloud on premise play. >> Yep. >> Right now, to me, that's the call here. What do you, do you agree? >> I absolutely see it. It's funny, you talk to organizations and they say, "We're going cloud, we're doing cloud." Well what does that mean? Can you even put your data in the cloud? Are you allowed to? How are you going to manage that? How are you going to govern that? How are you going to secure that? So many organizations, once they've asked those questions, they've realized, maybe we should start with the model of cloud on premise. And figure out what works and what doesn't. How do users actually want to self serve? What do we templatize for them? And what do we give them the freedom to do themselves? >> Yeah. >> And they sort of get their sea legs with that, and then we look at sort of a hybrid cloud model. How do we be able to span on premise, off premise, whatever your public cloud is, in a seamless way? Because we don't want to end up with the same thing that we had with mainframes decades ago, where it was, IBM had the best, it was the fastest, it was the most efficient, it was the new paradigm. And then 10 years later, organizations realized they were locked in, there was different technology. The same thing's true if you go cloud native. You're sort of locked in. So how do you be cloud agnostic? 
>> How do you get locked in a cloud native? You mean with Amazon? >> Or any of them, right? >> Okay. >> So they all have their own APIs that are really good for doing certain things. So Google's TensorFlow happens to be very good. >> Yeah. Amazon EMR. >> But you build applications that are using those native APIs, you're sort of locked. And maybe you want to switch to something else. How do you do that? So the idea is to-- >> That's why Kubernetes is so important, right now. That's a very key workload and orchestration container-based system. >> That's right, so we believe that containerization of workloads that you can define in one place, and deploy anywhere is the path forward, right? Deploy 'em on prem, deploy 'em in a private cloud, public cloud, it doesn't matter the infrastructure. Infrastructure's irrelevant. Just like Hadoop is sort of not that important anymore. >> So let me get your reaction on this. >> Yeah. >> So Dell EMC, so you guys have actually been a supplier. They've been the leading supplier, and now with Dell EMC across the portfolio of everything. From Dell computers, servers and what not, to storage, EMC's run the table on that for many generations. Yeah, there's people nippin' at your heels like Pure, okay that's fine. >> Sure. >> It's still, storage is storage. You got to store the data somewhere, so storage will always be around. Here's what I heard from a CXO. This is the pattern I hear, but I'll just summarize it in one conversation. And then you can give a reaction to it. John, my life is hell. I have application development investment plan, it's just boot up all these new developers. New dev ops guys. We're going to do open source, I got to build that out. I got that, trying to get dev ops going on. >> Yep. >> That's a huge initiative. I got the security team. I'm unbundling from my IT department, into a new, difference in a reporting to the board. 
And then I got all this data governance crap underneath here, and then I got IOT over the top, and I still don't know where my security holes are. >> Yep. >> And you want to sell me what? (Matt laughs) So that's the fear. >> That's right. >> Their plates are full. How do you guys help that scenario? You walk in, actually security's pretty much, important obviously you can see that. But how do you walk into that conversation? >> Yeah, it's sort of stop the madness, right? >> (laughs) That's right. >> And all of that matters-- >> No, but this is all critical. Every room in the house is on fire. >> It is. >> And I got to get my house in order, so your comment to me better not be hype. TensorFlow, don't give me this TensorFlow stuff. >> That's right. >> I want real deal. >> Right, I need, my guys are-- >> I love TensorFlow but, doesn't put the fire out. >> They just want Spark, right? I need to speed up my-- >> John: All right, so how do you help me? >> So, what we'd do is, we want to complement and augment their existing capabilities with better ways of scaling their architecture. So let's help them containerize their big data workloads so that they can deploy them anywhere. Let's help them define centralized security policies that can be defined once and enforced everywhere, so that now we have a way to automate the deployment of environments. And users can bring their own tools. They can bring their data from outside, but because we have intelligent centralized policies, we can enforce that. And so with our Elastic Data Platform, we are doing that with partners in the industry, BlueTalon and BlueData, they provide that capability on top of whatever the customer's infrastructure is. >> How important is it to you guys that Dell EMC are partnering. I know Michael Dell talks about it all the time, so I know it's important. But I want to hear your reaction. Down in the trenches, you're in the front lines, providing the value, pulling things together. 
Partnerships seem to be really important. Explain how you look at that, how you guys do your partners. You mentioned BlueTalon and BlueData. >> That's right, well I'm in the consulting organization. So we are on the front lines. We are dealing with customers day in and day out. And they want us to help them solve their problems, not put more of our kit in their data centers, on their desktops. And so partnering is really key, and our job is to find where the problems are with our customers, and find the best tool for the best job. The right thing for the right workload. And you know what? If the customer says, "We're moving to Amazon," then Dell EMC might not sell any more compute infrastructure to that customer. They might, we might not, right? But it's our job to help them get there, and by partnering with organizations, we can help make that seamless. And that strengthens the relationship, and they're going to purchase-- >> So you're saying that you will put the customer over Dell EMC? >> Well, the customer is number one to Dell EMC. Net promoter score is one of the most important metrics that we have-- >> Just want to make sure to get that on the record, and that's important, 'cause Amazon, and you know, we saw it with NetApp. I've got to say, give NetApp credit. They heard from customers early on that Amazon was important. They started building in Amazon support. So people were saying, "Are you crazy?" VMware, everyone's saying, "Hey you capitulated "by going to Amazon." Turns out that that was a damn good move. >> That's right. >> For Gelsinger. >> Yep. >> Look at VMworld. They're going to own the cloud service provider market as an arms dealer. >> Yep. >> I mean, you would have thought that a year ago, no way. And then when they did the deal, they said, >> We have really smart leadership in the organization. Obviously Michael is a brilliant man. And it sort of trickles on down. 
It's customer first, solve the customer's problems, build the relationship with them, and there will be other things that come, right? There will be other needs, other workloads. We do happen to have a private cloud solution with Virtustream. Some of these customers need that intermediary step, before they go full public, with a hosted private solution using Virtustream. >> All right, so what's the, final question, so what's the number one thing you're working on right now with customers? What's the pattern? You got the stack rank, your requests, your deliverables, where you spend your time. What are the top things you're working on? >> The top thing right now is scaling architectures. So getting organizations past, they've already got their first 20 use cases. They've already got lakes, they got petabytes in there. How do we enable self service so that we can actually bring that business value back, as you mentioned. Bring that business value back by making those data scientists productive. That's number one. Number two is aligning that to overall strategy. So organizations want to monetize their data, but they don't really know what that means. And so, within a consulting practice, we help our customers define, and put a road map in place, to align that strategy to their goals, the policies, the security, the GDPR, or the regulations. You have to marry the business and the technology together. You can't do either one in isolation. Or ultimately, you're not going to be efficient. >> All right, and just your take on Big Data NYC this year. What's going on in Manhattan this year? What's the big trend from your standpoint? That you could take away from this show besides it becoming a sprawl of you know, everyone just promoting their wares. I mean it's a big, hyped show that O'Reilly does, >> It is. >> But in general, what's the takeaway from the signal? >> It was good hearing from customers this year. Customer segments, I hope to see more of that in the future. 
Not all just vendors showing their wares. Hearing customers actually talk about the pain and the success that they've had. So the Barclays session where they went up and they talked about their entire journey. It was a packed room, standing room only. They described their journey. And I saw other banks walk up to them and say, "We're feeling the same thing." And this is a highly competitive financial services space. >> Yeah, we had Paxata's customer on, Standard Bank. They came on and talked about their journey, and how they're wrangling, automating. Automating's the big thing. Machine learning, automation, no doubt. If people aren't looking at that, they're dead in my mind. I mean, that's what I'm seeing. >> That's right. And you have to get your house in order before you can start doing the fancy gardening. >> John: Yeah. >> And organizations aspire to do the gardening, right? >> I couldn't agree more. You got to be able to drive the car, you got to know how to drive the car if you want to actually play in this game. But it's a good example, the house. Got to get the house in order. Rooms are on fire (laughs) right? Put the fires out, retrench. That's why private cloud's kicking ass right now. I'm telling you right now. Wikibon nailed it in their true private cloud survey. No other firm nailed this. They nailed it, and it went viral. And that is, private cloud is working and growing faster than some areas because the fact of the matter is, there's some bursting through the clouds, and great use cases in the cloud. But, >> Yep. >> People have to get the ops right on premise. >> Matt: That's right, yep. >> I'm not saying on premise is going to be the future. >> Not forever. >> I'm just saying that the stack and rack operational model is going cloud model. >> Yes. >> John: That's absolutely happening, that's growing. You agree? >> Absolutely, we completely, we see that pattern over and over and over again. And it's the Goldilocks problem. 
There's the organizations that say, "We're never going to go cloud." There's the organizations that say, "We're going to go full cloud." For big data workloads, I think there's an intermediary for the next couple years, while we figure out operating pulse. >> This evolution, what's fun about the market right now, and it's clear to me that, people who try to get a spot too early, there's too many diseconomies of scale. >> Yep. >> Let the evolution, Kubernetes looking good off the tee right now. Docker containers and containerization in general's happened. >> Yep. >> Happening, DevOps is going mainstream. >> Yep. >> So that's going to develop. While that's developing, you get your house in order, and certainly go to the cloud for bursting, and other greenfield opportunities. >> Sure. >> No doubt. >> But wait until everything's teed up. >> That's right, the right workload in the right place. >> I mean Amazon's got thousands of enterprises using the cloud. >> Yeah, absolutely. >> It's not like people aren't using the cloud. >> No, they're, yeah. >> It's not 100% yet. (laughs) >> And what's the workload, right? What data can you put there? Do you know what data you're putting there? How do you secure that? And how do you do that in a repeatable way? >> Yeah, and you think cloud's driving the big data market right now? That's what I was saying earlier. I was saying, I think that the cloud is the subtext of this show. >> It's enabling. I don't know if it's driving, but it's the enabling factor. It allows for that scale and speed. >> It accelerates. >> Yeah. >> It accelerates... >> That's a better word, accelerates. >> Accelerates that horizontally scalable. Mike, thanks for coming on theCUBE. Really appreciate it. More live action, we're going to have some partners on with you guys. Next, stay with us. Live in Manhattan, this is theCUBE. (electronic music)

Published Date : Sep 29 2017


ENTITIES

Entity	Category	Confidence
John	PERSON	0.99+
Michael	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Mike Maccaux	PERSON	0.99+
Matt Maccaux	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Matt	PERSON	0.99+
Manhattan	LOCATION	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
Dell	ORGANIZATION	0.99+
EMC	ORGANIZATION	0.99+
New York	LOCATION	0.99+
100%	QUANTITY	0.99+
BlueData	ORGANIZATION	0.99+
Mike	PERSON	0.99+
BlueTalon	ORGANIZATION	0.99+
Dell EMC	ORGANIZATION	0.99+
Standard Bank	ORGANIZATION	0.99+
Big Data	ORGANIZATION	0.99+
this year	DATE	0.99+
one	QUANTITY	0.99+
VM World	ORGANIZATION	0.99+
Michael Dell	PERSON	0.99+
thousands	QUANTITY	0.99+
Barclay	ORGANIZATION	0.99+
Hadoop	TITLE	0.98+
three days	QUANTITY	0.98+
decades ago	DATE	0.98+
NYC	LOCATION	0.98+
one day	QUANTITY	0.98+
one conversation	QUANTITY	0.98+
Goldilocks	PERSON	0.98+
O'Reilly	ORGANIZATION	0.98+
a year ago	DATE	0.98+
Wikibon	ORGANIZATION	0.98+
Midtown Manhattan	LOCATION	0.98+
10 years later	DATE	0.97+
TensorFlow	ORGANIZATION	0.97+
first 20 use cases	QUANTITY	0.97+
Google	ORGANIZATION	0.97+
Gelsinger	PERSON	0.97+
New York City	LOCATION	0.96+
first	QUANTITY	0.95+
VMware	ORGANIZATION	0.93+
Strata Conference	EVENT	0.93+
Big Data	EVENT	0.92+
Strata-Hadoop	EVENT	0.9+
Strata-Data	EVENT	0.9+
Number two	QUANTITY	0.9+
next couple years	DATE	0.86+
couple years ago	DATE	0.84+
2017	DATE	0.84+
Global Big Data	ORGANIZATION	0.83+
Paxata	ORGANIZATION	0.83+
Hadoop World	ORGANIZATION	0.83+
Big Data 2.0	TITLE	0.81+
three	QUANTITY	0.79+
couple years	QUANTITY	0.76+
Big Data 1.0	TITLE	0.73+
NetApp	TITLE	0.72+
2017	EVENT	0.71+
one place	QUANTITY	0.69+
number one	QUANTITY	0.67+
Kubernetes	ORGANIZATION	0.67+
enterprises	QUANTITY	0.66+

Arun Murthy, Hortonworks | BigData NYC 2017

>> Host: Live from mid-town Manhattan, it's theCUBE covering the BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat electronic music) >> Welcome back, everyone. We're here, live, on day two of our three days of coverage of BigData NYC. This is our event that we put on every year. It's our fifth year doing BigData NYC in conjunction with Hadoop World which evolved into Strata Conference, which evolved into Strata Hadoop, now called Strata Data. Probably next year will be called Strata AI, but we're still theCUBE, we'll always be theCUBE and this is our BigData NYC, our eighth year covering the BigData world since Hadoop World. And then as Hortonworks came on we started covering Hortonworks' data summit. >> Arun: DataWorks Summit. >> DataWorks Summit. Arun Murthy, my next guest, Co-Founder and Chief Product Officer of Hortonworks. Great to see you, looking good. >> Likewise, thank you. Thanks for having me. >> Boy, what a journey. Hadoop, years ago, >> 12 years now. >> I still remember, you guys came out of Yahoo, you guys put Hortonworks together and then since, gone public, first to go public, then Cloudera just went public. So, the Hadoop World is pretty much out there, everyone knows where it's at, it's got a nice use case, but the whole world's moved around it. You guys have been, really the first of the Hadoop players, even before Cloudera, on this notion of data in flight, or, as I call it, real-time data but I think, you guys call it data-in-motion. Batch, we all know what Batch does, a lot of things to do with Batch, you can optimize it, it's not going anywhere, it's going to grow. Real-time data-in-motion's a huge deal. Give us the update. 
>> Absolutely, you know, we've obviously been in this space, personally, I've been in this for about 12 years now. So, we've had a lot of time to think about it. >> Host: Since you were 12? >> Yeah. (laughs) Almost. Probably look like it. So, back in 2014 and '15 when we, sort of, went public and we started looking around, the thesis always was, yes, Hadoop is important, we're going to allow you to manage lots and lots of data, but a lot of the stuff we've done since the beginning, starting with YARN and so on, was really to enable the use cases beyond the whole traditional transactions and analytics. And Rob, our CEO, calls it, his vision's always been we've got to get into a pre-transactional world, if you will, rather than the post-transactional analytics and BI and so on. So that's where it started. And increasingly, the obvious next step was to say, look enterprises want to be able to get insights from data, but they also want, increasingly, they want to get insights and they want to deal with it in real-time. You know, while you're in your shopping cart. They want to make sure you don't abandon your shopping cart. If you were sitting at a retailer and you're in an aisle and you're about to walk away from a dress, you want to be able to do something about it. So, this notion of real-time is really important because it helps the enterprise connect with the customer at the point of action, if you will, and provide value right away rather than having to try to do this post-transaction. So, it's been a really important journey. We went and bought this company called Onyara, which is a bunch of geeks like us who started off with the government, built this Apache NiFi thing, huge community. It's just, like, taking off at this point. It's been a fantastic thing to join hands and join the team and keep pushing in the whole streaming data style. >> There's a real, I don't mean to tangent but I do since you brought up community I wanted to bring this up. 
It's been the theme here this week. It's more and more obvious that the community role is becoming central, beyond open-source. We all know open-source, standing on the shoulders before us, you know. And Linux Foundation showing code numbers hitting up from $64 million to billions in the next five, ten years, exponential growth of new code coming in. So open-source certainly blew me. But now community is translating to things, you start to see blockchain, very community based. That's a whole new currency market that's changing the financial landscape, ICOs and what-not, that's just one data point. Businesses, marketing communities, you're starting to see data as a fundamental thing around communities. And certainly it's going to change the vendor landscape. So you guys, compared to Cloudera and others, have always been community driven. >> Yeah, our philosophy has been simple. You know, more eyes and more hands are better than fewer. And it's been one of the cornerstones of our founding thesis, if you will. And you saw how that's gone on over the course of the six years we've been around. Super-excited to have someone like IBM join hands, it happened at DataWorks Summit in San Jose. That announcement, again, is a reflection of the fact that we've been very, very community driven and very, very ecosystem driven. >> Communities are fundamentally built on trust and partnering. >> Arun: Exactly. >> Coding is pretty obvious, you code with your friends. You code with people who are good, they become your friends. There's an honor system among you. You're starting to see that in the corporate deals. So explain the dynamic there and some of the successes that you guys have had on the product side where one plus one equals more than two. One plus one equals five or three. >> You know IBM has been a great example. 
They've decided to focus on their strengths which is around Watson and machine learning and for us to focus on our strengths around data management, infrastructure, cloud and so on. So this combination of DSX, which is their Data Science Experience, along with Hortonworks is really powerful. We are seeing that over and over again. Just yesterday we announced the whole Dataplane thing, we were super excited about it. And now to get IBM to say, we'll get in our technologies and our IP, whether it's BigQuality or BigInsights or Big SQL, and the word has been phenomenal. >> Well the Dataplane announcement, finally people who know me know that I hate the term data lake. I always said it's always been a data ocean. So I get redemption because now the data lakes, now it's admitting it's a horrible name but just saying stitching together the data lakes, which is essentially a data ocean. Data lakes are out there and you can form these data lakes, or data sets, batch, whatever, but connecting them and integrating them is a huge issue, especially with security. >> And a lot of it is, it's also just pragmatism. We start off with this notion of data lake and say, hey, you got too many silos inside the enterprise in one data center, you want to put them together. But then increasingly, as Hadoop has become more and more mainstream, I can't remember the last time I had to explain what Hadoop is to somebody. As it has become mainstream, couple things have happened. One is, we talked about streaming data. We see all the time, especially with HDF. We have customers streaming data from autonomous cars. You have customers streaming from security cameras. You can put a small MiNiFi agent in a security camera or smart phone and can stream it all the way back. Then you get into physics. You're up against the laws of physics. If you have a security camera in Japan, why would you want to move it all the way to California and process it? 
You'd rather do it right there, right? So this notion of a regional data center becomes really important. >> And that talks to the Edge as well. >> Exactly, right. So you want to have something in Japan that collects all of the security cameras in Tokyo, and you do analysis and push what you want back here, right. So that's physics. The other thing we are increasingly seeing is with data sovereignty rules, especially things like GDPR, there are now regulation reasons where data has to naturally stay in different regions. Customer data from Germany cannot move to France or vice versa, right. >> Data governance is a huge issue and this is the problem I have with data governance. I am really looking for a solution so if you can illuminate this it would be great. So there is going to be an Equifax out there again. >> Arun: Oh, for sure. >> And the problem is, is that going to force some regulation change? So what we see is, certainly on the Wikibon side, I see it personally is that, you can almost see that something else will happen that'll force some policy regulation or governance. You don't want to screw up your data. You also don't want to rewrite your applications or rewrite your machine learning algorithms. So there's a lot of waste potential by not structuring the data properly. Can you comment on what's the preferred path? >> Absolutely, and that's why we've been working on things like Dataplane for almost a couple of years now. Which is to say, you have to have data and policies which make sense, given a context. And the context is going to change by application, by usage, by compliance, by law. So, now to manage 20, 30, 50, a 100 data lakes, would it be better, not saying lakes, data ponds, >> Host: Any data. >> Any data >> Any data pool, stream, river, ocean, whatever. (laughs) >> Jacuzzis. Data jacuzzis, right. So what you want to do is want a holistic fabric, I like the term, you know Forrester uses, they call it the fabric. >> Host: Data fabric. 
>> Data fabric, right? You want a fabric over these so you can actually control and maintain governance and security centrally, but apply it with context. Last not least, is you want to do this whether it's on-prem or on the cloud, or multi-cloud. So we've been working with a bank. They were probably based in Germany but for GDPR they had to stand up something in France now. They had French customers, but for a bunch of new reasons, regulation reasons, they had to stand up something in France. So they bring their own data center, then they had only the cloud provider, right, who I won't name. And they were great, things are working well. Now they want to expand the similar offering to customers in Asia. It turns out their favorite cloud vendor was not available in Asia or they were not available in a time frame which made sense for the offering. So they had to go with cloud vendor two. So now although each of the vendors will do their job in terms of giving you all the security and governance and so on, the fact that you are to manage it three ways, one for on-prem, one for cloud vendor A and B, was really hard, too hard for them. So this notion of a fabric across these things, which is Dataplane. And that, by the way, is based on all the open source technologies we love like Atlas and Ranger. By the way, that is also what IBM is betting on and what the entire ecosystem, but it seems like a no-brainer at this point. That was the kind of reason why we foresaw the need for something like a Dataplane and obviously couldn't be more excited to have something like that in the market today as a net new service that people can use. >> You get the catalogs, security controls, data integration. >> Arun: Exactly. >> Then you get the cloud, whatever, pick your cloud scenario, you can do that. Killer architecture, I liked it a lot. I guess the question I have for you personally is what's driving the product decisions at Hortonworks? 
And the second part of that question is, how does that change your ecosystem engagement? Because you guys have been very friendly in a partnering sense and also very good with the ecosystem. How are you guys deciding the product strategies? Does it bubble up from the community? Is there an ivory tower, let's go take that hill? >> It's both, because what typically happens is obviously we've been in the community now for a long time. Working publicly now with well over 1,000 customers not only puts a lot of responsibility on our shoulders but it's also very nice because it gives us a vantage point which is unique. That's number one. The second one we see is being in the community, also we see the fact that people are starting to solve the problems. So it's another telemetry for us. So you have one, the enterprise side, we see what the enterprises are facing which is kind of where Dataplane came in, but we also saw in the community where people are starting to ask us about hey, can you do multi-cluster Atlas? Or multi-cluster Ranger? Put two and two together and say there is a real need. >> So you get some consensus. >> You get some consensus, and you also see that on the enterprise side. Last not least is when we went to friends like IBM and say hey we're doing this. This is where we can position this, right. So we can actually bring in IGC, you can bring BigQuality and bring all these type, >> Host: So things had clicked with IBM? >> Exactly. >> Rob Thomas was thinking the same thing. Bring in the power system and the horsepower. >> Exactly, yep. We announced something, for example, we have been working with the Power guys and NVIDIA, for deep learning, right. That sort of stuff is what clicks if you're in the community long enough, if you have the vantage point of the enterprise long enough, it feels like the two of them click. And that's frankly, my job. >> Great, and you've got obviously the landscape. The waves are coming in. 
So I've got to ask you, the big waves are coming in and you're seeing people starting to get hip with the couple of key things that they got to get their hands on. They need to have the big surfboards, metaphorically speaking. They got to have some good products, big emphasis on real value. Don't give me any hype, don't give me a head fake. You know, I buy, okay, AI Wash, and people can see right through that. Alright, that's clear. But AI's great. We all cheer for AI but the reality is, everyone knows that's pretty much b.s. except for core machine learning is on the front edge of innovation. So that's cool, but value. (laughs) Hey, I've got to integrate and operationalize my data so that's the big wave that's coming. Comment on the community piece because enterprises now are realizing as open source becomes the dominant source of value for them, they are now really going to the next level. It used to be like the emerging enterprises that knew open source. The guys will volunteer and they may not go deeper in the community. But now more people in the enterprises are in open source communities, they are recruiting from open source communities, and that's impacting their business. What's your advice for someone who's been in the community of open source? Lessons you've learned, what is the best practice, from your standpoint on philosophy, how to build into the community, how to build a community model. >> Yeah, I mean, the end of the day, my best advice is to say look, the community is defined by the people who contribute. So, you get advice if you contribute. Which means, if that's the fundamental truth. Which means you have to get your legal policies and so on to a point that you can actually start to let your employees contribute. That kicks off a flywheel, where you can actually go then recruit the best talent, because the best talent wants to stand out. GitHub is a resume now. It is not a Word doc. 
If you don't allow them to build that resume they're not going to come by and it's just a fundamental truth. >> It's self governing, it's reality. >> It's reality, exactly. Right and we see that over and over again. It's taken time but as with things, the flywheel has changed enough. >> A whole new generation's coming online. If you look at the young kids coming in now, it is an amazing environment. You've got TensorFlow, all this cool stuff happening. It's just amazing. >> You know, 20 years ago that wouldn't happen because the Googles of the world wouldn't open source it. Now increasingly, >> The secret's out, open source works. >> Yeah, (laughs) shh. >> Tell everybody. You know they know already but, this is changing some of how HR works and how people collaborate, >> And the policies around it. The legal policies around contribution so, >> Arun, great to see you. Congratulations. It's been fun to watch the Hortonworks journey. I want to appreciate you and Rob Bearden for supporting theCUBE here in BigData NYC. If it wasn't for Hortonworks and Rob Bearden and your support, theCUBE would not be part of the Strata Data, which we are not allowed to broadcast into, for the record. O'Reilly Media does not allow theCUBE or our analysts inside their venue. They've excluded us and that's a bummer for them. They're a closed organization. But I want to thank Hortonworks and you guys for supporting us. >> Arun: Likewise. >> We really appreciate it. >> Arun: Thanks for having me back. >> Thanks and shout out to Rob Bearden. Good luck and CPO, it's a fun job, you know, not the pressure. I got a lot of pressure. A whole lot. >> Arun: Alright, thanks. >> More CUBE coverage after this short break. (upbeat electronic music)

Published Date : Sep 28 2017


ENTITIES

Entity	Category	Confidence
Asia	LOCATION	0.99+
France	LOCATION	0.99+
Arun	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Rob Bearden	PERSON	0.99+
Germany	LOCATION	0.99+
Arun Murthy	PERSON	0.99+
Japan	LOCATION	0.99+
NVIDIA	ORGANIZATION	0.99+
Tokyo	LOCATION	0.99+
2014	DATE	0.99+
California	LOCATION	0.99+
12	QUANTITY	0.99+
five	QUANTITY	0.99+
Frank Quattrone	PERSON	0.99+
three	QUANTITY	0.99+
two	QUANTITY	0.99+
Onyara	ORGANIZATION	0.99+
$64 million	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
San Jose	LOCATION	0.99+
O'Reilly Media	ORGANIZATION	0.99+
each	QUANTITY	0.99+
Morgan Stanley	ORGANIZATION	0.99+
Linux Foundation	ORGANIZATION	0.99+
One	QUANTITY	0.99+
fifth year	QUANTITY	0.99+
Atlas	ORGANIZATION	0.99+
20	QUANTITY	0.99+
one	QUANTITY	0.99+
Rob Thomas	PERSON	0.99+
three days	QUANTITY	0.99+
eighth year	QUANTITY	0.99+
yesterday	DATE	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
six years	QUANTITY	0.99+
Equifax	ORGANIZATION	0.99+
next year	DATE	0.99+
NYC	LOCATION	0.99+
Hortonworks	ORGANIZATION	0.99+
second part	QUANTITY	0.99+
both	QUANTITY	0.99+
Ranger	ORGANIZATION	0.99+
50	QUANTITY	0.98+
30	QUANTITY	0.98+
Yahoo	ORGANIZATION	0.98+
Strata Conference	EVENT	0.98+
DataWorks Summit	EVENT	0.98+
Hadoop	TITLE	0.98+
'15	DATE	0.97+
20 years ago	DATE	0.97+
Forrester	ORGANIZATION	0.97+
GDPR	TITLE	0.97+
second one	QUANTITY	0.97+
one data center	QUANTITY	0.97+
Github	ORGANIZATION	0.96+
about 12 years	QUANTITY	0.96+
three ways	QUANTITY	0.96+
Manhattan	LOCATION	0.95+
day two	QUANTITY	0.95+
this week	DATE	0.95+
NiFi	ORGANIZATION	0.94+
Dataplane	ORGANIZATION	0.94+
BigData	ORGANIZATION	0.94+
Hadoop World	EVENT	0.93+
billions	QUANTITY	0.93+

Day One Wrap | BigData NYC 2017

>> Announcer: Live from midtown Manhattan, it's theCUBE covering BigData New York City 2017. Brought to you by SiliconANGLE Media, and its ecosystem sponsors. >> Hello everyone, welcome back to our day one, at BigData NYC, of three days of wall-to-wall coverage. This is theCUBE. I'm John Furrier, with my co-hosts Jim Kobielus and Peter Burris. We do this event every year, this is theCUBE's BigData NYC. It's our event that we run in New York City. We have a lot of great content, we have theCUBE going live, we don't go to Strata anymore. We do our own event in conjunction, they have their own event. You can go pay over there and get the booth space, but we do our media event and attract all the influencers, the VIPs, the executives, the entrepreneurs, we've been doing it for five years, we're super excited, and thank our sponsors for allowing us to get here and really appreciate the community for continuing to support theCUBE. We're here to wrap up day one, what's going on in New York, certainly we've had a chance to check out the Strata situations, Strata Data, which is Cloudera, and O'Reilly, mainly O'Reilly Media, they run that, kind of old school event, guys. Let's kind of discuss the impact of the event in context to the massive growth that's going on outside of their event. And their event is a walled garden, you got to pay to get in, they're very strict. They don't really let a lot of people in, but, okay. Outside of that event, it's going global, the activity around big data is going global. It's more than Hadoop, we've certainly talked about how that's old news, but what's the big trend this year? As the horizontally scalable cloud enters the equation. >> I think the big trend, John, is the, and we've talked about it in our research, is that we have finally moved away from big data being associated with a new type of infrastructure. 
The emergence of AI, deep learning, machine learning, cognitive, all these different names for relatively common things, are an indication that we're starting to move up into people thinking about applications, people thinking about services they can use to get access, or they can get access to build their applications. There's not enough skills. So I think that's probably the biggest thing is that the days of failure being measured by whether or not you can scale your cluster up, are finally behind us. We're using the cloud, other resources, we have enough expertise, the technologies are becoming simpler and more straightforward to do that. And now we're thinking about how we're going to create value out of all of this, which is how we're going to use the data to learn something new about what we're doing in the organization, combine it with advanced software technologies that actually dramatically reduce the amount of work that's necessary to make a decision. >> And the other trend I would say, on top of that, just to kind of put a little cherry on top of that, kind of the business focus which is again, not the speeds and feeds, although under the hood, lot of great innovation going on from deep learning, and there's a ton of stuff. However, the conversation is the business value, how it's transforming work and, but the one thing that nobody's talking about is, this is why I'm not bullish on these one shows, one show meets all kind of thing like O'Reilly Media does, because there's multiple personas in a company now in the ecosystem. There are now a variety of buyers of some products. At least in the old days, you'd go talk to the IT CIO and you're in. Not anymore. You have an analytics person, a Chief Data Officer, you might have an IT person, you might have a cloud person. So you're seeing a completely broader set of potential buyers that are driving the change. We heard Paxata talk about that. And this is a dynamic. >> Yeah, definitely. 
We see a fair amount of, what I'm sensing about Strata, how it's evolving these big top shows around data, it's evolving around addressing a broader, what we call maker culture. It's more than software developers. It's business analysts, it's the people who build the hardware for the internet of things into which AI and machine learning models are being containerized and embedded. I've, you know, one of the takeaways from today so far, and the keynotes are tomorrow at Strata, but I've been walking the atrium at the Javits Center having some interesting conversations, in addition, of course, to the ones we've been having here at theCUBE. And what I'm notic-- >> John: What are those hallway conversations that you're having? >> Yeah. >> What's going on over there? >> Yeah, what I've, the conversations I've had today have been focused on, the chief trend that I'm starting to sense here is that the productionization of the machine learning development process or pipeline, is super hot. It spans multiple data platforms, of course. You've got a bit of Hadoop in the refinery layer, you've got a bit of in-memory columnar databases, like Actian discussed at their own, but the more important, not more important, but just as important is that what users are looking at is how can we build these DevOps pipelines for continuous management of releases of machine learning models for productionization, but also for ongoing evaluation and scoring and iteration and redeployment into business applications. You know there's, I had conversations with MapR, I had conversations with IBM, I mean, these were atrium conversations about things that they are doing. IBM had an announcement today on the wires and so forth with some relevance to that. And so I'm seeing a fair, I'm hearing, I'm sensing a fair amount of it's the apps, it's more than just Hadoop. 
But it's very much the flow of these, these are the core pieces, like AI, core pieces of intellectual property in the most disruptive applications that are being developed these days in all manner, in business and industry in the consumer space. >> So I did not go over to the show floor yet, I've not been over to the Atrium. But, I'll bet you dollars to donuts this is indicative of something that always happens in a complex technology environment. And again, this is something we've thought about particularly talked about here on theCUBE, in fact we talked to Paxata about it a little bit as well. And that is, as an organization gains experience, it starts to specialize. But there's always moments, there's always inflection points in the process of gaining that experience. And by that, or one of the indications of that is that you end up with some people starting to specialize, but not quite sure what they're specializing in yet. And I think that's one of the things that's happening right now is that the skills gap is significant. At the same time that the skills gap is being significant, we're seeing people start to declare their specializations that they don't have skills, necessarily, to perform yet. And the tools aren't catching up. So there's still this tension model, open source, not necessarily focusing on the core problem. Skills looking for tools, and explosion in the number of tools out there, not focused on how you simplify, streamline, and put into operation. How all these things work together. It's going to be an interesting couple of years, but the good news, ultimately, is that we are starting to see for the first time, even on theCUBE interviews today, the emergence of a common language about how we think about the characteristics of the problem. And I think that that heralds a new round of experience and a new round of thinking about what is all the business analysts, the data scientists, the developer, the infrastructure person, business person. 
>> You know, you bring up that comment, those comments, about the specialists and the skills. We talked, Jim and I talked on the segment this morning about tool shed. We're talking about there are so many tools out there, and everyone loves a good tool, a hammer. But the old expression is if you're a hammer, everything looks like a nail, that's cliche. But what's happened is there are a plethora of tools, right, and tools are good. Platforms are better. As people start to replatformize everything they could have too many tools. So we asked the Chief Data Officer, he goes yeah, I try to manage the tool tsunami, but his biggest issue was he buys a hammer, and it turns into a lawnmower. That's a vendor mentality of-- >> What a truck. Well, but that's a classic example of what I'm talking about. >> Or someone's trying to use a hammer to mow the lawn right? Again, so this is what you're getting at. >> Yeah! >> The companies out there are groping for relevance, and that's how you can see the pretenders from the winners. >> Well, a tool, fundamentally, is pedagogical. A tool describes the way work is going to be performed, and that's been a lot of what's been happening over the course of the past few years. Now, businesses that get more experience, they're describing their own way of thinking throughout a problem. And they're still not clear on how to bring the tools together because the tools are being generated, put into the marketplace by an expanding array of folks and companies, and they're now starting to shuffle for position. But I think ultimately, what we're going to see happen over the next year and I think this is an inflection point, going back to this big tent notion, is the idea that ultimately we are going to see greater specialization over the next few years. 
My guess is that this year will probably, should get better, or should get bigger, I'm not certain it will because it's focused on the problems that we already solved and not moving into the problems that we need to focus on. >> Yeah, I mean, a lot of the problems I have with the O'Reilly show is that they try to throw default leadership out there, and there's some smart people that go to that, but the problem is is that it's too monetization, they try to make too much money from the event when this action's happening. And this is where the tool becomes, the hammer becomes a lawnmower, because what's happening is that the vendor's trying to stay alive. And you mentioned this earlier, to your point, the customers that are buyers of the technology don't want to have something that's not going to be a fit, that's going to be agile from us. They don't want the hammer that they bought to turn into something that they didn't buy it for. And sometimes, teams can't make that leap, skillset-wise, to literally pivot overnight. Especially as a startup. So this is where the selection of the companies makes a big difference. And a lot of the clients, a lot of customers that we're serving on the end user side are reaching the conclusion that the tools themselves, while important, are clearly not where the value is. The value is in how they put them together for their business. And that's something that's going to have to, again, that's a maturation process, roles, responsibilities, the chief data officer, they're going to have a role in that or not, but ultimately, they're going to have to start finding their pipelines, their process for ingestion out to analysis. >> Let me get your reaction, you guys, your reactions to this tape. Because one of the things that I heard today, and I think this validates a bigger trend as we talk about the landscape of the markup from the event to how people are behaving and promoting and building products and companies. 
The pattern that I'm hearing, we said it multiple times on theCUBE today and one from the guy who's basically reading the script, is, in his interview, explaining 'cause it's so factual, I asked him the straight-up question, how do you deal with suppliers? What's happening is the trend is don't show me sizzle. I want to see the steak. Don't sell me hype, I got too many business things to work on right now, I need to nail down some core things. I got application development, I got security to build out big time, and then I got all those data channels that I need, I don't have time for you to sell me a hammer that might not be a hammer in the future! So I need real results, I need real performance that's going to have a business impact. That is the theme, and that trumps the hype. I see that becoming a huge thing right now. Your thoughts, reactions, guys-- >> Well I'll start-- >> What's your reaction then? True or false on the trend? Be-- >> Peter: True! >> Get down to business. >> I'll say that much, true, but go ahead. >> I'll say true as well, but let me just add some context. I think a show like O'Reilly Strata is good up to a point, especially to catalyze an industry, a growing industry like big data's own understanding of it, of the value that all these piece parts, Hadoop and Spark and so forth, can add, can provide when deployed in a unit according to some emerging patterns, whatever. But at a certain point where a space like this becomes well-established, it just becomes a pure marketing event. And customers, at a certain point say, you know, I come here for ideas about things that I can do in my environ, my business, that could actually many ways help me to do new things. You know, you can't get that at a marketing-oriented, you can get that, as a user, more at a research-oriented show. 
When it's an emerging market, like let's say Spark has been, like the Spark Summit was in the beginning, those are kind of like, when industries go through the phase those are sort of in the beginning, sort of research-focused shows where industry, the people who are doing the development of this new architecture, they talk ideas. Now I think in 2017, where we're at now, is what the idea is everybody's trying to get their heads around, they're all around AI, what the heck that is. For a show like an O'Reilly Ready show to have relevance in a market that's in this much ferment of really innovation around AI and deep learning, there needs to be a core research focus that you don't get at this point in the lifecycle of Strata, for example. So that's my take on what's going on. >> So, my take is this. And first of all, I agree with everything you said, so it's not in opposition to anything. Many years ago I had this thought that I think still is very true. And that is the value of industry, the value of infrastructure is inversely correlated with the degree to which anybody knows anything about it. So if I know a lot about my infrastructure, it's not creating a lot of business value. In fact, more often than not, it's not working, which is why people end up knowing more about it. But the problem is, the way that technology has always been sold is as a differentiated, some sort of value-add thing. So you end up with this tension. And this is an application domain, a very, very complex application domain like big data. The tension is, my tool is so great that, and it's differentiating all those other stuff, yeah but it becomes valuable to me if and only if nobody knows it exists. So I think, and one of the reasons why I bring this up, John, is many of the companies that are in the big data space today that are most successful are companies that are positioning themselves as a service. 
There's a lot of interesting SaaS applications for big data analysis, pipeline management, all the other things you can talk about, that are actually being rendered as a service, and not as a product. So that all you need to know is what the tool does. You don't need to know the tool. And I don't know that that's necessarily going to last, but I think it's very, very interesting that a lot of the more successful companies that we're talking to are themselves mere infrastructure SaaS companies. >> Because-- >> AtScale is interesting, though. They came in as a service. But their service has an interesting value proposition. They can allow you to essentially virtualize the data to play with it, so people can actually sandbox data. And if it gets traction, they can then double-down on it. So to me that's a freebie. To me, I'm a customer, I got to love that kind of environment because you're essentially giving almost a developer-like environment-- >> Peter: Value without necessarily-- >> Yeah, the cost, and the guy gets the signal from the marketplace, his customer, of what data resolves. To me that's a very cool scene. I don't, you saying that's bad, or? >> No, no, I think it's interesting. I think it's-- >> So you're saying service is-- >> So what I'm saying is, what I'm saying is, that the value of infrastructure is inversely proportional to the degree to which anybody knows anything about it. But you've got a bunch of companies who are selling, effectively, infrastructure software, so it's a value-add thing, and that creates a problem. And a lot of other companies not only have the ability to sell something as a service as opposed to a product, they can put the service forward, and people are using the service and getting what they need out of it without knowing anything about the tool. >> I like that. Let me just maybe possibly restate what you just said. 
When a market goes toward a SaaS go-to-market delivery model for solutions, the user, the buyer's focus is shifted away from what the solution can do, I mean, how it works under the cover. >> Peter: Quote, value-add-- >> To what it can do potentially for you. >> The business, that's right. >> But you're not going to, don't get distracted by the implementation details. You have then as a user become laser-focused on, wow, there's a bunch of things that this can do for me. I don't care how it works, really. You SaaS provider, you worry about that stuff. I can worry now about somehow extracting the value. I'm not distracted. >> This show, or this domain, is one of the domains where SaaS has moved, just as we're thinking about moving up the stack, the SaaS business model is moving down the stack in the big data world. >> All right, so, in summary, the stack is changing. Predictions for the next few days. What are we going to see come out of Strata Data, and our BigData NYC? 'Cause remember, this show was always a big hit, but it's very clear from the data on our dashboards, we're seeing all the social data. Microsoft Ignite is going on, and Microsoft Azure, just in the past few years, has burst on the scene. Cloud is sucking the oxygen out of the big data event. Or is it? >> I doubt it was sucking it out of the event, but you know, theCUBE is in, theCUBE is not at Ignite. Where's theCUBE right now? >> John: BigData NYC. >> No, it's here, but it's also at the Splunk show. >> John: That's true. >> And isn't it interesting-- >> John: We're sucking the data out of two events. >> Did a lot of people coming in, exactly. A lot of people coming-- >> We're live streaming in a streaming data kind of-- >> John just said we suck, there's that record saying that. >> We're sucking all the data. >> So we are-- >> We're sharing data. These videos are data-driven. 
>> Yeah, absolutely, but the point is, ultimately, is that, is that Splunk is an example of a company that's putting forward a service about how you do this and not necessarily a product focus. And a lot of the folks that are coming on theCUBE here are also going on to theCUBE down in Washington D.C., which is where the Splunk show's at. And so I think one of the things, one of the predictions I'll make, is that we're going to hear over the next couple of days more companies talk about their SaaS trash. >> Yeah, I mean I just think, I agree with you, but I also agree with the comments about the technology coming together. And here's one thing I want to throw on the table. I've gotten the sense a few times about connecting the dots on it, we'll put it out publicly for comment right now. The role that communities will play outside of developer, is going to be astronomical. I think we're seeing signals, certainly open-source communities have been around for a long time. They continue to grow shoulders of giants before them. Even these events like O'Reilly, which are a small community that they rely on is now not the only game in town. We're seeing the notion of a community strategy in things like Blockchain, you're seeing it in business, you're seeing people rolling out their recruitment to say, data scientists. You're seeing a community model developing in business, yes or no? >> Yes, but I would say, I would put it this way, John. That it's always been there. The difference is that we're now getting enough experience with things that have occurred, for example, collaboration, communal, communal collaboration in open-source software that people are now saying, and they've developed a bunch of social networking techniques where they can actually analyze how those communities work together, but now they're saying, hmm, I've figured out how to do an assessment analysis understanding that community. 
I'm going to see if I can take that same concept and apply it over here to how sales works, or how B-to-B engagement works, or how marketing gets conducted, or how sales and marketing work together. And they're discovering that the same way of thinking is actually very fruitful over there. So I totally agree, 100%. >> So they don't rely on other people's version of a community, they can essentially construct their own. >> They are, they are-- >> John: Or enabling their own. >> That's right, they are bringing that approach to thinking about a community-driven business and they're applying it to a lot of new ways, and that's very exciting. >> As the world gets connected with mobile and internet of things as we're seeing, it's one big online community. We're seeing things, I'm writing a post right now, what you could, what B-to-B markets should learn from the fake news problem. And that is content and infrastructure are now contextually tied together. >> Peter: Totally. >> And related. The payload of the fake news is also related to the gamification of the network effect, hence the targeting, hence the weaponization. >> Hey, we wrote the three Cs, we wrote a piece on the three Cs of strategy a year and a half ago. Content, community, context. And at the end of the day, the most important thing to what you're saying about, is that there is, you know, right now people talk about social networking. Social media, you think Facebook. Facebook is a community with a single context, stay in touch with your friends. >> Connections. >> Connections. But what you're really saying is that for the first time we're now going to see an enormous amount of technology being applied to the fullness of all the communities. We're going to see a lot more communities being created with the software, each driven by what content does, creates value, against the context of how it works, where the community's defined in terms of what do we do? 
>> Let me focus on the fact that bringing, using community as a framework for understanding how the software world is evolving. The software world is evolving towards, I've said this many times in my work about a resurge, the data scientists or data people, data science skills are the core developers in this new era. Now, what is data science all about at its heart? Machine learning, building, and training machine learning models. And so training machine learning models is everything towards making sure that they are fit for their predicted purpose of classification. Training data, where you get all the training data from to feed all, to train all these models? Where do you get all the human resources to label, to do the labeling of the data sets, and so forth, that you need communities, crowdsourcing and whatnot, and you need sustainable communities that can supply the data and the labeling services, and so forth, to be able to sustain the AI and machine learning revolution. So content, creating data and so forth, really rules in this new era, like-- >> The interest in machine learning is at an all-time high, I guess. >> Jim: Yeah, oh yeah, very much so. >> Got it, I agree. I think the social grab, interest grab, value grab is emerging. I think communities, content, context, communities are relevant. I think a lot of things are going to change, and that the scuttlebutt that I'm hearing in this area now is it's not about the big event anymore. It's about the digital component. I think you're seeing people recognize that, but they still want to do the face-to-face. >> You know what, that's right. That's right, they still want, let's put it this way. That there are, that the whole point of community is we do things together. And there are some things that are still easier to do together if we get together. >> But B-to-B marketing, you just can't say, we're not going to do events when there's a whole machinery behind events. Legion batch marketing, we call it. 
There's a lot of stuff that goes on in that funnel. You can't just say hey, we're going to do a blog post. >> People still need to connect. >> So it's good, but there's some online tools that are happening, so of course. You wanted to say something? >> Yeah, I just want to say one thing. Face to face validates the source of expertise. I don't really fully trust an expert, I can't in my heart engage with them, 'til I actually meet them and figure out in person whether they really do have the goods, or whether they're repurposing some thinking that they got from elsewhere and they gussy it up. So face, there's no substitute for face-to-face to validate the expertise. The expertise that you value enough to want to engage in your solution, or whatever it might be. >> Awesome, I agree. Online activities, the content, we're streaming the data, theCUBE, this is our annual event in New York City. We've got three days of coverage, Tuesday, Wednesday, Thursday, here, theCUBE in Manhattan, right around the corner from Strata Hadoop, the Javits Center of influencers. We're here with the VIPs, with the entrepreneurs, with the CEOs and all the top analysts from WikiBon and around the community. Be there tomorrow all day, day one wrap up is done. Thanks for watching, see you tomorrow. (rippling music)

Published Date : Sep 27 2017



Murthy Mathiprakasam, - Informatica - Big Data SV 17 - #BigDataSV - #theCUBE1

(electronic music) >> Announcer: Live from San Jose, California, it's The Cube, covering Big Data Silicon Valley 2017. >> Okay, welcome back everyone. We are live in Silicon Valley for Big Data Silicon Valley. Our companion show to Big Data NYC in conjunction with Strata Hadoop, Big Data Week. Our next guest is Murthy Mathiprakasam, the director of product marketing at Informatica. Did I get it right? >> Murthy: Absolutely (laughing)! >> Okay (laughing), welcome back. Good to see you again. >> Good to see you! >> Informatica, you guys had Amit on earlier yesterday, kicking off our event. It is a data lake world out there, and the show theme has been, obviously besides a ton of machine learning-- >> Murthy: Yep. >> Which has been fantastic. We love that because that's a real trend. And IOT has been a subtext to the conversation and almost a forcing function. Every year the big data world is getting more and more pokes and levers off of Hadoop to a variety of different data sources, so a lot of people are taking a step back, and a protracted view of their landscape inside their own companies and, saying, Okay, where are we? So kind of a checkpoint in the industry. You guys do a lot of work with customers, your history with Informatica, and certainly over the past few years, the change in focus, certainly on the product side, has been kind of interesting. You guys have what looks like to be a solid approach, an abstraction layer for data and metadata, to be the keys to the kingdom, but yet not locking it down, making it freely available, yet provide the governance and all that stuff. >> Murthy: Exactly. >> And my interview with Amit laid it all out there. But the question is what are the customers doing? I'd like to dig in, if you could share just some of the best practices. What are you seeing? What are the trends? Are they taking a step back? How is IOT affecting it? What's generally happening? >> Yeah, I know, great question. 
So it has been really, really exciting. It's been kind of a whirlwind over the last couple years, so many new technologies, and we do get the benefit of working with a lot of very, very, innovative organizations. IOT is really interesting because up until now, IOT's always been sort of theoretical, you're like, what's the thing? >> John: Yeah. (laughing) What's this Internet of things? >> But-- >> And IT was always poo-pooing someone else's department (laughing). >> Yeah, exactly. But we actually have customers doing this now, so we've been working with automotive manufacturers on connected vehicle initiatives, pulling sensor data, been working with oil and gas companies, connected meters and connected energy, manufacturing, logistics companies, looking at putting meters on trucks, so they can actually track where all the trucks are going. Huge cost savings and service delivery kind of benefits from all this stuff, so you're absolutely right IOT, I think is finally becoming real. And we have a streaming solution that kind of works on top of all the open source streaming platforms, so we try to simplify everything, just like we have always done. We did that with MapReduce, with Spark, now with all the streaming technologies. You get a graphical approach where you can go in and say, Well, here's the kind of processing we want. You'd lay it out visually and it executes in the Hadoop cluster. >> I know you guys have done a great job with the product, it's been very complimentary you guys, and it's almost as if there's been a transformation within Informatica. And I know you went private and everything, but a lot of good product shops there. You guys got a lot of good product guys, so I got to ask you the question, I don't see IOT sometimes as an operational technology component, usually running their own stacks, not even plugged into IT, so that's a whole other story. I'll get to that in a second. 
But the trend here is you have the batch world, companies that have been in this ecosystem here that are on the show floor, at O'Reilly Media, or talking to us on The Cube. Some have been just pure play batch-related! Then the fashionable streaming technologies have come out, but what's happened with Spark, you're starting to see the collision between batch and realtime-- >> Umm-hmm. >> Called streaming or what not. And at the center of that's the deep learning, it's the IOT, and it's the AI, that's going to be at the intersection of these two colliding forces, so you can't have a one-trick pony here and there. You got to kind of have a blended, more of a holistic, horizontal, scalable approach. >> Murthy: Yes. >> So I want to get your reaction to that. And two, what product gaps and organizational gaps and process gaps emerge from this trend? And what do you guys do? So, three-part question. >> Murthy: Yeah (laughing). >> Go ahead. Go ahead. >> I'll try to cover all three. >> So, first, the collision and your reaction to that trend. >> Murthy: Yeah, yeah. >> And then the gaps. >> Absolutely. So basically, you know Informatica, we've supported every type of kind of variation of these type of environments, and so we're not really a believer in it's this or that. It's not on premise or cloud, it's not realtime or batch. We want to make it simple and no matter how you want to process the data, or where you want to process it. So customers who use our platform for their realtime or streaming solutions, are using the same interface, as if they were doing it batch. We just run it differently under the hood. And so, that simplifies and makes a lot of these initiatives more practical because you might start with a certain latency, and you think maybe it's okay to do it at one speed. Maybe you decide to change. It could be faster or slower, and you don't have to go through code rewrites and just starting completely from scratch. 
That's the benefit of the abstraction layer, like you were saying. And so, I think that's one way that organizations can shield themselves from the question because why even pose that question in the first... Why is it either this or that? Why not have a system that you can actually tune and maybe today you want to start batch, and tomorrow you evolve it to be more streaming and more realtime. Help me on the-- >> John: On the gaps-- >> Yes. >> Always product gaps because, again, you mentioned that you're solving it, and that might be an integration challenge for you guys. >> Yep. >> Or an integration solution for you guys, challenge, opportunity, whatever you guys want to call it. >> Absolutely! >> Organizational gaps maybe not set up for and then processed. >> Right. I think it was interesting that we actually went out to dinner with a couple of customers last night. And they were talking a lot about the organizational stuff because the technology they're using is Informatica, so that part's easy. So, they're like, Okay, it's always the stuff around budgeting, it's around resourcing, skills gap, and we've been talking about this stuff for a long time, right. >> John: Yeah. >> But it's fascinating, even in 2017, it's still a persistent issue, and part of what their challenge was is that even the way IT projects have been funded in the past. You have this kind of waterfall-ish type of governance mechanism where you're supposed to say, Oh, what are you going to do over the next 12 months? We're going to allocate money for that. We'll allocate people for that. Like, what big data project takes 12 months? Twelve months you're going to have a completely (laughing) different stack that you're going to be working with. And so, their challenge is evolving into a more agile kind of model where they can go justify quick-hit projects that may have very unknown kind of business value, but it's just getting buy-in that... Hey, something might be discovered here? 
This is kind of an exploration-use case, discovery, a lot of this IoT stuff, too. People are bringing back the sensor data, you don't know what's going to come out of that or (laughing)-- >> John: Yeah. >> What insights you're going to get. >> So there's-- >> Frequency, velocity, could be completely dynamic. >> Umm-hmm. Absolutely! >> So I think part of the best practice is being able to set aside this kind of notion of innovation where you have funding available for... Get a small cross-functional team together, so this is part of the other aspect of your question, which is organizationally, this isn't just IT. You got to have the data architects from IT, you got to have the data engineers from IT. You got to have data stewards from the line of business. You got business analysts from the line of business. Whenever you get these guys together-- >> Yeah. >> Small core team, and people have been talking about this, right. >> John: Yeah. >> Agile development and all that. It totally applies to the data world. >> John: And the cloud's right there, too, so they have to go there. >> Murthy: That's right! Exactly. So you-- >> So is the 12-month project model, the waterfall model, however you want... maybe 24 months more like it. But the problem on the fail side there is that when they wake up and ship the world's changed, so there's kind of a diminishing return. Is that kind of what you're getting out there on that fail side? >> Exactly. It's all about failing fast forward and succeeding very quickly as well. And so, when you look at most of the successful organizations, they have radically faster project lifecycles, and this is all the more reason to be using something like Informatica, which abstracts all the technology away, so you're not mired in code rewrites and long development cycles. You just want to ship as quickly as possible, get the organization's buy-in that, Hey, we can make this work! Here's some new insights that we never had before. 
That gets you the political capital-- >> John: Yeah. >> For the next project, the next project, and you just got to keep doing that over and over again. >> Yeah, yeah. I always call that agile more of a blank check in a safe harbor because, in case you fail forward, (laughing) I'm failing forward. (laughing) You keep your job, but there's some merit to that. But here's the trick question for you: Now let's talk about hybrid. >> Umm-hmm. >> On prem and cloud. Now, that's the real challenge. What are you guys doing there because now I don't want to have a job on prem. I don't want to have a job on the cloud. That's not redundancy, that's inefficient, that's duplicates. >> Yes. >> So that's an issue. So how do you guys tee it up there for the customer? And what's the playbook for them, and people who are scratching their heads saying, I want on prem. And Oracle got this right. Their earnings came out pretty good, same code on prem, off prem, same code base. So workloads can move depending upon the use cases. >> Yep. >> How do you guys compare? >> Actually that's the exact same approach that we're taking because, again, it's all about that customer shouldn't have to make the either or-- >> So for you guys, interfacing code same on prem and cloud. >> That's right. So you can run our big data solutions on Amazon, Microsoft, any kind of cloud Hadoop environment. We can connect to data sources that are in the cloud, so different SaaS apps. >> John: Umm-hmm. >> If you want to suck data out of there. We got all the out-of-the-box connectivity to all the major SaaS applications. And we can also actually leverage a lot of these new cloud processing engines, too. So we're trying to be the abstraction layer, so now it's not just about Spark and Spark streaming, there's all these new platforms that are coming out in the cloud. So we're integrating with that, so you can use our interface and then push down the processing to a cloud data processing system. 
So there's a lot of opportunity here to use cloud, but, again, we don't want to be... We want to make things more flexible. It's all about enabling flexibility for the organization. So if they want to go cloud, great. >> John: Yep. >> There's plenty of organizations that if they don't want to go cloud, that's fine, too. >> So if I get this right, standard interface on prem and cloud for the usability, under the hood it's integration points in clouds, so that data sources, whatever they are and through whatever could be Kinesis coming off Amazon-- >> Exactly! >> Into you guys, or Azure's got some stuff-- >> Exactly! >> Over there, That all works under the hood. >> Exactly! >> Abstracts from the user. >> That's right! >> Okay, so the next question is, okay, to go that way, that means it's a multicloud world. You probably agree with that. Multicloud meaning, I'm a customer. I might have multiple workloads on multiple clouds. >> That's where it is today. I don't know if that's the endgame? And obviously all this is changing very, very quickly. >> Okay (laughing). >> So I mean, Informatica we're neutral across multiple vendors and everything. So-- >> You guys are Switzerland. >> We're the Switzerland (laughing), so we work with all the major cloud providers, and there are new ones that we're constantly signing up also, but it's unclear how the market will shake out. >> Umm-hmm. >> There's just so much information out there. I think it's unlikely that you're going to see mass consolidation. We all know who the top players are, and I think that's where a lot of large enterprises are investing, but we'll see how things go in the future, too. >> Where should customers spend their focus because this you're seeing the clouds. I was just commenting about Google yesterday, with AMIT, AI, and others. That they're to be enterprise-ready. 
You guys are very savvy in the enterprise, there's a lot of table stakes, SLAs to integration points, and so, there's some clouds that aren't ready for prime time, like Google for the enterprise. Some are getting there fast like Amazon, Azure, super enterprise-friendly. They have their own problems and opportunities. But they are very strong on the enterprise. What do you guys advise customers? What are they looking at right now? Where should they be spending their time, writing more code, scripts, or tackling the data? How do you guys help them shift their focus? >> Yeah, yeah! >> And where-- >> And definitely not scripts (laughing). >> It's about the worst thing you can do because... And it's for all the reasons we understand. >> Why is that? >> Well, again, we're talking about being agile. There's nothing agile about manually sitting there, writing Java code. Think about all the developers that were writing MapReduce code three or four years ago (laughing). Those guys, well, they're probably looking for new jobs right now. And with the companies who built that code, they're rewriting all of it. So that approach of doing things at the lowest possible level doesn't make engineering sense. That's why the kind of abstraction layer approach makes so much better sense. So where should people be spending their time? It's really... The one thing technology cannot do is it can't substitute for context. So that's business context, understanding if you're in healthcare there's things about the healthcare industry that only that healthcare company could possibly know, and know about their data, and why certain data is structured the way it is. >> John: Yeah. >> Or financial services or retail. 
So business context is something that only that organization can possibly bring to the table, and organizational context, as you were alluding to before, roles and responsibilities, who should have access to data, who shouldn't have access to data, That's also something that can't be prescribed from the outside. It's something that organizations have to figure out. Everything else under the hood, there's no reason whatsoever to be mired in these long code cycles. >> John: Yeah. >> And then you got to rewrite it-- >> John: Yeah. >> And you got to maintain it. >> So automation is one level. >> Yep. >> Machine learning is a nice bridge between the taking advantage of either vertical data, or especially, data for that context. >> Yep. >> But then the human has to actually synthesize it. >> Right! >> And apply it. That's the interface. Did I get that right, that progression? >> Yeah, yeah. Absolutely! And the reason machine learning is so cool... And I'm glad you segue into that. Is that, so it's all about having the machine learning assist the human, right. So the humans don't go away. We still have to have people who understand-- >> John: Okay. >> The business context and the organizational context. But what machine learning can do is in the world of big data... Inherently, the whole idea of big data is that there's too much data for any human to mentally comprehend. >> John: Yeah. >> Well, you don't have to mentally comprehend it. Let the machine learning go through, so we've got this unique machine learning technology that will actually scan all the data inside of Hadoop and outside of Hadoop, and it'll identify what the data is-- >> John: Yeah. >> Because it's all just pattern matching and correlations. And most organizations have common patterns to their data. So we figured out all this stuff, and we can say, Oh, you got credit card information here. Maybe you should go look at that, if that's not supposed to be there (laughing). 
Maybe there's a potential violation there? So we can focus the manual effort onto the places where it matters, so now you're looking at issues, problems, instead of doing the day-to-day stuff. The day-to-day stuff is fully automated and that's not what organizations-- >> So the guys that are losing their jobs, those Java developers writing scripts, to do the queries, where should they be focusing? Where should they look for jobs? Because I would agree with you that their jobs would be because the MapReduce guys and all the script guys and the Java guys... Java has always been the bulldozer of the programming languages, very functional. >> Murthy: Yep. >> But where do those guys go? What's your advice for... We have a lot of friends, I'm sure you do, too. I know a lot of friends who are Java developers who are awesome programmers. >> Yeah. >> Where should they go? >> Well, so first, I'm not saying that Java's going to go away, obviously (laughing). But I think Java-- >> Well, I mean, Java guys who are doing some of the payload stuff around some of the deep-- >> Exactly! >> In the bowels of big data. >> That's right! Well, there's always things that are unique to the organization-- >> Yeah. >> Custom applications, so all that stuff is fine. What we're talking about is like MapReduce coding-- >> Yeah, what should they do? What should those guys be focusing on? >> So it's just like every other industry you see. You go up the value stack, right. >> John: Right. >> So if you can become more of the data governor, the data stewards, look at policy, look at how you should be thinking about organizational context-- >> John: And governance is also a good area. >> And governance, right. Governance jobs are just going to explode here because somebody has to define it, and technology can't do this. Somebody has to tell the technology what data is good, what data is bad, when do you want to get flagged if something is going wrong, when is it okay to send data through. 
Whoever decides and builds those rules, that's going to be a place where I think there's a lot of opportunities. >> Murthy, final question. We got to break, we're getting the hook sign here, but we got Informatica World coming up soon in May. What's going to be on the agenda? What should we expect to hear? What's some of the themes that you could tease a little bit, get people excited. >> Yeah, yeah. Well, one thing we want to really provide a lot of content around the journey to the cloud. And we've been talking today, too, there's so many organizations who are exploring the cloud, but it's not easy, for all the reasons we just talked about. Some organizations want to just kind of break away, take out, rip out everything in IT, move all their data and their applications to the cloud. Some of them are taking more of a progressive journey. So we got customers who've been on the leading front of that, so we'll be having a lot of sessions around how they've done this, best practices that they've learned. So hopefully, it's a great opportunity for both our current audience who's always looked to us for interesting insights, but also all these kind of emerging folks-- >> Right. >> Who are really trying to figure out this new world of data. >> Murthy, thanks so much for coming on The Cube. Appreciate it. Informatica World coming up. You guys have a great solution, and again, making it easier (laughing) for people to get the data and put those new processes in place. This is The Cube breaking it down for Big Data SV here in conjunction with Strata Hadoop. I'm John Furrier. More live coverage after this short break. (electronic music)

Published Date : Mar 15 2017


Scott Gnau, Hortonworks Big Data SV 17 #BigDataSV #theCUBE

>> Narrator: Live from San Jose, California it's theCUBE covering Big Data Silicon Valley 2017. >> Welcome back everyone. We're here live in Silicon Valley. This is theCUBE's coverage of Big Data Silicon Valley. Our event in conjunction with O'Reilly Strata Hadoop, of course we have our Big Data NYC event and we have our special popup event in New York and Silicon Valley. This is our Silicon Valley version. I'm John Furrier, with my co-host Jeff Frick and our next guest is Scott Gnau, CTO of Hortonworks. Great to have you on, good to see you again. >> Scott: Thanks for having me. >> You guys have an event coming up in Munich, so I know that there's a slew of new announcements coming up with Hortonworks in April, next month in Munich for your EU event and you're going to be holding a little bit of that back, but some interesting news this morning. We had Wei Wang yesterday from the Microsoft Azure HDInsight team. That's flowering nicely, a good bet there, but the question has always been at least from people in the industry and we've been questioning you guys on, hey, where's your cloud strategy? Because as a distro you guys have been very successful with your always open approach. The Microsoft Azure guy was basically like, that's why we go with Hortonworks because of pure open source, committed to that from day one, never wavered. The question is cloud first, AI, machine learning this is a sweet spot for IoT. You're starting to see the collision between cloud and data, and in the intersection of that is deep learning, IoT, a lot of amazing new stuff going to be really popping out of this. Your thoughts and your cloud strategy. >> Obviously we see cloud as an enabler for these use cases. In many instances the use cases can be ephemeral. They might not be tied immediately to an ROI, so you're going to go to the capital committee and all this kind of stuff, versus let me go prove some value very quickly. 
It's one of the key enablers, core ingredients, and when we say cloud first, we really mean it. It's something where the solutions work together. At the same time, cloud becomes important. Our cloud strategy and I think we've talked about this in many different venues is really twofold. One is we want to give a common experience to our customers across whatever footprint they choose, whether it be they roll their own, they do it on prem, they do it in public cloud and they have choice of different public cloud vendors. We want to give them a similar experience, a good experience that is enterprise-grade, platform level experience, so not point solution kind of one function and then get rid of it, but really being able to extend the platform. What I mean by that of course, is being able to have common security, common governance, common operational management. Being able to have a blueprint of the footprint so that there's compatibility of applications that get written. And those applications can move as they decide to change their mind about where their platform hosting the data, so our goal really is to give them a great and common experience across all of those footprints number one. Then number two, to offer a lot of choices across all of those domains as well, whether it be, hey I want to do infrastructure as a service and I know what I want on one end of the spectrum to I'm not sure exactly what I want, but I want to spin up a data science cluster really quickly. Boom, here's a platform as a service offer that runs and is available very easy to consume, comes preconfigured and kind of everywhere in between. >> By the way yesterday Wei was pointing out 99.99 SLAs on some of the stuff coming out. >> Are amazing and obviously in the platform as a service space, you also get the benefit of other cloud services that can plug in that wouldn't necessarily be something you'd expect to be typical of a core Hadoop platform. 
Getting the SLAs, getting the disaster recovery, getting all of the things that cloud providers can provide behind the scenes is some additional upside obviously as well in those deployment options. Having that common look and feel, making it easy, making it frictionless, are all of the core components of our strategy and we saw a lot of success with that in coming out of year end last year. We see rapid customer adoption. We see rapid customer success and frankly I see that I would say that 99.9% of customers that I talk to are hybrid where they have a foot in on-prem and they have a foot in cloud and they may have a foot in multiple clouds. I think that's indicative of what's going on in the world. Think about the gravity of data. Data movement is expensive. Analytics and multi-core chipsets give us the ability to process and crunch numbers at unprecedented rates, but movement of data is actually kind of hard. There's latency, it can be expensive. A lot of data in the future, IoT data, machine data is going to be created and live its entire lifecycle in the cloud, so the notion of being able to support hybrid with a common look and feel, I think very strategically positions us to help our customers be successful when they start actually dealing with data that lives its entire lifecycle outside the four walls of the data center. >> You guys really did a good job I thought on having that clean positioning of data at rest, but also you had the data in motion, which I think was ahead of its time, you guys really nailed that and you also had the IoT edge in mind, we've talked I think two years ago and this was really not on everyone's radar, but you guys saw that, so you've made some good bets on the HDInsight and we talked about that yesterday with Wei on here and Microsoft. So edge analytics and data in motion are very key right now, because that batch streaming world's coming together and IoT's flooding it with all this kind of data. 
We've seen the success in the clouds where analytics have been super successful, powered by the clouds. I got to ask you with Microsoft as your preferred cloud provider, what's the current status for customers who have data in motion, specifically IoT too. It's the common question we're getting, not necessarily the Microsoft question, but okay I've got edge coming in strong-- >> Scott: Mm-hmm >> and I'm going to run certainly hybrid in a multi cloud world, but I want to put the cloud stuff for most of the analytics and how do I deal with the edge? >> Wow, there's a lot there (laughs) >> John: You got 10 seconds, go! (laughs) You have Microsoft as your premier cloud and you have an Amazon relationship with a marketplace and what not. You've got a great relationship with Microsoft. >> Yeah. I think it boils down to a bigger macro thing and hopefully I'll peel into some specifics. I think number one, we as an industry kind of short change ourselves talking about Hadoop, Hadoop, Hadoop, Hadoop, Hadoop. I think it's bigger than Hadoop, not different than, but certainly bigger than, right, and this is where we started with the whole connected platforms, indicating that traditional Hadoop comes from traditional thinking of data at rest. So I've got some data, I've stored it and I want to run some analytics and I want to be able to scale it and all that kind of stuff. Really good stuff, but only part of the issue. The other part of the issue is data that's moving, data that's being created outside of the four walls of the data center. Data that's coming from devices. How do I manage and move and handle all of that? Of course there have been different hype cycles on streaming and streaming analytics and data flow and all those things. What we wanted to do is take a very protracted look at the problem set of the future. 
We said look it's really about the entire lifecycle of data from inception to demise of the data or data being deleted, which very infrequently happens these days. >> Or cold storage-- >> Cold storage, whatever. You know it's created at the edge, it moves through, it moves in different places, it's landed, it's analyzed, there are models built. But as models get deployed back out to the edge, that entire problem set is a problem set that I think we, certainly we at Hortonworks are looking to address with the solutions. That actually is accelerated by the notion of multiple cloud footprints because when you think about a customer that may have multiple cloud footprints and trying to tie the data together, it creates a unique opportunity, I think there's a reversal in the way people need to think about the future of compute. Where having been around for a little bit of time, it's always been let me bring all the data together to the applications and have the applications run and then I'll send answers back. That is impossible in this new world order, whether it be the cloud or the fog or any of the things in between or the data center, data are going to be distributed and data movement will become the expensive thing, so it will be very important to be able to have applications that are deployable across a grid, and applications move to the data instead of data moving to the application. Or at least to have a choice and be able to be selective so that I believe that ultimately scalability five years from now, ten years from now, it's not going to be about how many exabytes I have in my cloud instance, that will be part of it, it will be about how many edge devices can I have computing and analyzing simultaneously and coordinating with each other this information to optimize customer experience, to optimize the way an autonomous car drives or anywhere in between. >> It's totally radical, but it's also innovative. 
You mentioned the cost of moving data will be the issue. >> Scott: Yeah. >> So that's going to change the architecture of the edge. What are you seeing with customers, because we're seeing a lot of people taking a protracted view like you were talking about and looking at the architectures, specifically around okay. There's some pressure, but there's no real gun to the head yet, but there's certainly pressure to do architectural thinking around edge and some of the things you mentioned. Patterns, things you can share, anecdotal stories, customer references. >> You know the common thing is that customers go, "Yep, that's going to be interesting. "It's not hitting me right now, "but I know it's going to be important. "How can I ease into it and kind of without the suspenders "how can I prove this is going to work and all that." We've seen a lot of certainly interest in that. What's interesting is we're able to apply some of that futuristic IoT technology in Hortonworks DataFlow that includes NiFi and MiNiFi out to the edge to traditional problems like, let me get the data from the branches into the central office and have that roundtrip communication to a banker who's talking to a customer and has the benefit of all the analytics at home, but I can guarantee that roundtrip of data and analytics. Things that we thought were solid before, can be solved very easily and efficiently with this technology, which is then also extensible even out further to the edge. In many instances, I've been surprised by customer adoption with them saying, "Yeah, I get that, but gee this helps me "solve a problem that I've had for the last 20 years "and it's very easy and it sets me up "on the right architectural course, "for when I start to add in those edge devices, "I know exactly how I'm going to go do it." It's been actually a really good conversation that's very pragmatic with immediate ROI, but again positioning people for the future that they know is coming. 
Doing that, by the way, we're also able to prove the security. Think about security is a big issue that everyone's talking about, cyber security and everything. That's typically security about my data center where I've got this huge fence around it and it's very controlled. Think about edge devices are now outside that fence, so security and privacy and provenance become really, really interesting in that world. It's been gratifying to be able to go prove that technology today and again put people on that architectural course that positions them to be able to go out further to the edge as their business demands it. >> That's such great validation when they come back to you with a different solution based on what you just proposed. >> Scott: Yep. >> That means they really start to understand, they really start to see-- >> Scott: Yep. >> How it can provide value to them. >> Absolutely, absolutely. That is all happening and again like I said this I think the notion of the bigger problem set, where it's not just storing data and analyzing data, but how do I have portable applications and portable applications that move further and further out to the edge is going to be the differentiation. The future successful deployments out there because those deployments and folks are able to adopt that kind of technology will have a time to market advantage, they'll have a latency advantage in terms of interaction with a customer, not waiting for that roundtrip of really being able to push out customized, tailored interactions, whether it be again if it's driving your car and stopping on time, which is kind of important, to getting a coupon when you're walking past a store and anywhere in between. >> It's good you guys have certainly been well positioned for being flexible, being an open source has been a great advantage. I got to ask you the final question for the folks watching, I'm sure you guys answer this either to investors or whatnot and customers. 
A lot's changed in the past five years and a lot's happening right now. You just illustrated it out, the scenario with the edge is very robust, dynamic, changing, but yet value opportunity for businesses. What's the biggest thing that's changing right now in the Hortonworks view of the world that's notable that you think is worth highlighting to people watching that are your customers, investors, or people in the industry. >> I think you brought up a good point, the whole notion of open and the whole groundswell around open source, open community development as a new paradigm for delivering software. I talked a little bit about a new paradigm of the gravity of data and sensors and this new problem set that we've got to go solve, that's kind of one piece of this storm. The other piece of the storm is the adoption and the wave of open, open community collaboration of developers versus integrated silo stacks of software. That's manifesting itself in two places and obviously I think we're an example of helping to create that. Open collaboration means quicker time to market and more innovation and accelerated innovation in an increasingly complex world. That's one requirement slash advantage of being in the open world. I think the other thing that's happening is the generation of workforce. When I think about when I got my first job, I typed a resume with a typewriter. I'm dating myself. >> White out. >> Scott: Yeah, with white out. (laughter) >> I wasn't a good typer. >> Resumes today is basically name and GitHub address. Here's my body of work and it's out there for everybody to see, and that's the mentality-- >> And they have their cute videos up there as well, of course. >> Scott: Well yeah, I'm sure. (laughter) >> So it's kind of like that shift to this is now the new paradigm for software delivery. >> This is important. You've got theCUBE interview, but I mean you're seeing it-- >> Is that the open source? >> In the entertainment. 
No, we're seeing people put huge interviews on their LinkedIn, so this notion of collaboration in the software engineering mindset. You go back to when we grew up in software engineering, now it went to open source, now it's GitHub is essentially a social network for your body of work. You're starting to see the software development open source concepts, they apply to data engineering, data science is still early days. Media creation, whatnot, so I think that's a really key point, and the data science tools are still in their infancy. >> I think open, and by the way I'm not here to suggest that everything will be open, but I think a majority and-- >> Collaborative the majority of the problem that we're solving will be collaborative, it will be ecosystem driven and where there's an extremely large market open will be the most efficient way to address it. And certainly no one's arguing that data and big data is not a large market. >> Yep. You guys are all on the cloud now, you got the Microsoft, any other updates that you think are worth sharing with folks? >> You've got to come back and see us in Munich then. >> Alright. We'll be there, theCUBE will be there in Munich in April. We have the Hortonworks coverage going on in DataWorks, the conference is now called DataWorks in Munich. This is theCUBE here with Scott Gnau, the CTO of Hortonworks. Breaking it down I'm John Furrier with Jeff Frick. More coverage from Big Data SV in conjunction with Strata Hadoop after the short break. (upbeat music)

Published Date : Mar 15 2017

SUMMARY :

it's theCUBE covering Big good to see you again. and in the intersection of blueprint of the footprint on some of the stuff coming out. of customers that I talk to are hybrid I got to ask you with Microsoft and you have an Amazon relationship of the data center. and be able to be selective You mentioned the cost of and looking at the architectures, and has the benefit on what you just proposed. and further out to the edge I got to ask you the final and the whole groundswell Scott: Yeah, with white out. and that's the mentality-- And they have their cute videos Scott: Well yeah, I'm sure. So it's kind of like that shift to but I mean you're seeing it-- in the data science tools the majority of the you got the Microsoft, You've got to come back We have the Hortonworks

ENTITIES

Entity	Category	Confidence
Scott	PERSON	0.99+
Jeff Frick	PERSON	0.99+
John	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Scott Gnau	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Scott Gnau	PERSON	0.99+
New York	LOCATION	0.99+
Munich	LOCATION	0.99+
John Furrier	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
April	DATE	0.99+
yesterday	DATE	0.99+
10 seconds	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
San Jose, California	LOCATION	0.99+
99.99	QUANTITY	0.99+
two places	QUANTITY	0.99+
LinkedIn	ORGANIZATION	0.99+
first job	QUANTITY	0.99+
GitHub	ORGANIZATION	0.99+
next month	DATE	0.99+
two years ago	DATE	0.98+
today	DATE	0.98+
99.9%	QUANTITY	0.98+
ten years	QUANTITY	0.97+
Big Data	EVENT	0.97+
five years	QUANTITY	0.96+
Big Data Silicon Valley 2017	EVENT	0.96+
this morning	DATE	0.95+
O'Reilly Strata Hadoop	ORGANIZATION	0.95+
One	QUANTITY	0.95+
Data Works	EVENT	0.94+
year end last year	DATE	0.94+
one	QUANTITY	0.93+
Hadoop	TITLE	0.93+
theCUBE	ORGANIZATION	0.93+
one piece	QUANTITY	0.93+
Wei Wang	PERSON	0.91+
NYC	LOCATION	0.9+
Wei	PERSON	0.88+
past five years	DATE	0.87+
first	QUANTITY	0.86+
CTO	PERSON	0.83+
four walls	QUANTITY	0.83+
Big Data SV	ORGANIZATION	0.83+
#BigDataSV	EVENT	0.82+
one function	QUANTITY	0.81+
Big Data SV 17	EVENT	0.78+
EU	LOCATION	0.73+
HDInsight	ORGANIZATION	0.69+
Strata Hadoop	PERSON	0.69+
one requirement	QUANTITY	0.68+
number two	QUANTITY	0.65+

Brad Tewksbury, Oracle - On the Ground - #theCUBE

>> Announcer: theCUBE presents On the Ground. (light electronic music) >> Hello everyone, welcome to this special exclusive On the Ground Cube coverage here at Oracle's Headquarters. I'm John Furrier, the host of theCUBE, I'm here with my guest, Brad Tewksbury, who's the Senior Director of Business Development for the big data team at Oracle, welcome to On the Ground. >> Thank you, John, good to be here. >> So big day, Brad, you've been in this industry for a long time, you've seen the waves come and go. Certainly at Oracle you've been here for many, many years. >> Yeah. >> Oracle's transforming as a company and you've been watching it play out. >> Brad: Yeah. >> What is the big thing that's most notable to you that you could illustrate that kind of highlights the Oracle transformation in terms of where it's come from? Obviously the database is the crown jewel, but this big data stuff that you're involved in is really transformative and getting tons of traction. With the Cloud Machine kind of tying in, is this kind of a similar moment for Oracle? Share some thoughts there. >> Yeah I think there's many, if you look at the data management path from going back to client server to where we are today, data has always played a pivotal role, but I would say now every customer is going through this decision making process where they're saying, "Ah-ha, data, I'm being disrupted by all different companies." Before it was, you know, okay I got my data in a database and I do some reporting on it and I can run my business, but it wasn't like I was going to be disrupted by some digital company tomorrow. >> 'Cause the apps and the databases were kind of tied together. >> They were tied together and things just didn't move as fast as they do today. Now it's in these digital-only companies, they realize that data is their business, right?
I think one of the pivotal things that we've been doing some studies with MIT is that 84% of the S&P value of some of these companies comes from companies that have no assets, right? Just data, so like Uber doesn't own any taxis. Airbnb doesn't own any hotels, yet they've got massive valuation, so companies are starting to freak out a little bit and they're starting to say, "Oh my god, I got to leverage my data." So the seminal moment here is saying, "How do I monetize my data?" Before there wasn't this urgency, now there's a sense of like I got to do something with this data, but the predicament they're in, especially these legacy companies, is they've got silos of stuff that's not talking to each other, it's all on different versions and different vendors. >> Well, Oracle's always been in the database business, so you made money by creating software to store data. >> Brad: Right. >> Now it sounds like there's a business model for moving the data around, is that kind of what I'm getting here? So it's not just storing the data software, store the data, it's software to make the data. >> Brad: Yeah. >> Accessible. >> Yeah, it's three things, I think it's three things. It's ingesting the data, right, from new sources outside of the company, so sensors and social media, right, that's one thing. Secondly, it's then managing the data, which we've always done, and then the third thing is analyzing it, so it's that whole continuum and then what's happened here is the management platform has expanded. It's gone from just a relational base to this whole NoSQL world and this Hadoop world, which we completely support.
By no means is this relational a zero-sum game, where it's relational or nothing at all, it's we've expanded the whole data management platform to meet the criteria of whatever the application is and so these are the three data management platforms today, who knows what's going to come tomorrow, we'll support that as well, but the idea is choose the right platform for the application and what it's really becoming about is applications, right? And this data management stuff is obviously table stakes, but how do I make my applications dynamic and real-time based on what I have here? >> Four years ago, and CUBE audience will remember, we did theCUBE in Hadoop World, as it was called back then before it became Strata Hadoop, the O'Reilly and Cloudera show, but Mike Olson and Ping Li said, "Oh we have a big data fund," so they thought there was going to be a tsunami of apps, never really happened. Certainly Hadoop didn't become as big as people had thought, but yet Analytics rose up, Analytics became the killer app. >> Brad: Yeah. >> But now we're beyond Analytics. >> Brad: Yeah. >> The use of data for insights, where are the apps coming from now? You had Rocana, here we had WANdisco providing some solutions, where do you guys see the apps coming from? Obviously Oracle has their own set of apps, but outside of Oracle, where are the apps? >> So yeah, it's an interesting phenomenon, right? Everyone thought Hadoop is the next great wave and the reality is if you go talk to customers, they're like, "Yeah, I've heard of it, but what do I do with it?" So it's like apps are what's going to drive this whole stack forward and to that end, the number one thing that people are looking for is a 360 view of customers, they all want to know more about the customer.
I was talking with a customer who represents the equivalent of the Tax Bureau of their county and instead of putting the customer, it's the taxpayer or the customer's at the center and all the different places that you pay taxes, so they want to have one view of you as the taxpayer, so whether you're public entity, private, the number one thing that the apps that people are looking for is show me more about customer. If I'm a bank, a retail, they want to cross-sell that's the number one app. In telcos, they want to know about networking. How do I get this network? I want to understand what's going on here so I can better support my Support Center, but secondary to that we're in this kind of holding pattern. Now what are the next set of apps and so there's a bunch of start-ups here in Silicon Valley that are thinking they have the answer for that and we're partnering with them and opening up a Cloud Marketplace to bring them in and we'll let customers decide who's going to win this. >> Talk about Rocana and their value proposition, they're here talking to us today, what's the deal with Rocana? >> So Rocana is an interesting play, what they have found is that customers, one of the ways they talk about themselves, is they offer a data warehouse to IT. So if I'm the IT guy, I want to go in and have basically a pool of all kinds of log analysis. How's my apps running, do I need to tune the apps? How's the network running, they want a one bucket of how can my operation perform better? So what we've seen from customers is they've come to us and they've said, "okay, what have you got in this new space "of Hadoop that can do that?" Look at log analysis and all kinds of app performances from a Hadoop perspective. 
They were one of the people, the first persons to answer that, so they're having great success finding out where security breaches are, finding out where network latencies are, better, like I said, looking at logs and how things could run better, so that's what they're answering for customers is basically improving IT functions, right, because what's happening is a lot of business people are in charge, right, and they're saying, "I no longer want "to go to IT for everything, I want to be able to just go to basically a data model and do my own analysis of this, "I don't want to have to call IT for everything." So these guys in some way are trying to help that mantra. >> Talk about WANdisco, what are they talking about here and how is their relationship with Oracle? They're speaking with us today as well. >> Yeah, so you know, in this big data world what we're seeing a lot of is customers doing a lot of what we call a lab experiment. So they got all this data and they want to do lab experiments, okay great. So then they find this nugget of okay, here's a great data model, we want to do some analysis on this, so let's turn it into a production app. Okay, then what do you do, how do you take it to production? These are the guys that you would call. So they take it into an HA, high-availability, environment for you and they give you zero data loss, zero down time to do that. One of the things that Oracle, that we're touting as the differentiator in our Cloud is this hybrid approach where you have, you know, you could start out doing test-dev in the Cloud, bring it back on-prem, vice versa, they allow you to do that sync, that link between the Cloud and on-prem. We work today with Cloudera, we OEM them in our big data appliance, if the customer has Hortonworks, but they also want to work with our stuff, they're the go-between with that as well. So it's basically they're giving you that production-ready environment that you need in an HA world.
>> Brad, thanks for spending some time with us here On the Ground, really appreciate it. >> Yeah. >> I'm John Furrier, we're here exclusively On the Ground here at Oracle Headquarters, thanks for watching. (light electronic music)

Published Date : Sep 6 2016

SUMMARY :

(light electronic music) for the big data team at Oracle, welcome to On the Ground. So big day, Brad, you've been in this industry and you've been watching it play out. What is the big thing that's most notable to you from going back to client server to where we are today, So the seminal moment here is saying, Well, Oracle's always been in the database business, So it's not just storing the data software, store the data, is the management platform is expanded. and Cloudera Show, but Mike Olson and Ping Lee said, and the reality is if you go So if I'm the IT guy, I want to go in and have basically about here and how is their relationship with Oracle? These are the guys that you would call. here On the Ground, really appreciate it. here at Oracle Headquarters, thanks for watching.

ENTITIES

Entity	Category	Confidence
Brad Tewksbury	PERSON	0.99+
Brad	PERSON	0.99+
UBER	ORGANIZATION	0.99+
John	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
John Furier	PERSON	0.99+
Airbnb	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
Rocana	ORGANIZATION	0.99+
MIT	ORGANIZATION	0.99+
84%	QUANTITY	0.99+
Mike Olson	PERSON	0.99+
today	DATE	0.99+
Hadoop	TITLE	0.99+
three things	QUANTITY	0.99+
Ping Lee	PERSON	0.98+
tomorrow	DATE	0.98+
third thing	QUANTITY	0.98+
Four years ago	DATE	0.98+
one	QUANTITY	0.98+
zero	QUANTITY	0.96+
Secondly	QUANTITY	0.96+
one bucket	QUANTITY	0.95+
Primm	TITLE	0.95+
One	QUANTITY	0.94+
one thing	QUANTITY	0.94+
Oracle Headquarters	LOCATION	0.94+
three data management platforms	QUANTITY	0.94+
telcos	ORGANIZATION	0.92+
On the Ground Cube	TITLE	0.91+
Hadoop World	TITLE	0.88+
first persons	QUANTITY	0.84+
Tax Bureau	ORGANIZATION	0.83+
Win Disk Scope	TITLE	0.82+
one view	QUANTITY	0.81+
zero data	QUANTITY	0.81+
CUBE	ORGANIZATION	0.81+
Cloud Air	TITLE	0.77+
SEQUEL	ORGANIZATION	0.75+
Cloud	TITLE	0.73+
360 view	QUANTITY	0.72+
Analytics	TITLE	0.7+
Rocana	TITLE	0.65+
On the Ground	TITLE	0.63+
Strata Hadoop and	ORGANIZATION	0.61+
Show	ORGANIZATION	0.57+
the Ground	TITLE	0.56+
theCUBE	ORGANIZATION	0.55+
Hadoop	ORGANIZATION	0.54+
Cloudera	TITLE	0.54+
O'Reilly	ORGANIZATION	0.4+
Ground	TITLE	0.31+

Search Results for O'Reilly Strata Hadoop: