Kickoff | theCUBE NYC 2018

>> Live from New York, it's theCUBE covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. (techy music) >> Hello, everyone, welcome to this CUBE special presentation here in New York City for CUBENYC. I'm John Furrier with Dave Vellante. This is our ninth year covering the big data industry, starting with Hadoop World and evolved over the years. This is our ninth year, Dave. We've been covering Hadoop World, Hadoop Summit, Strata Conference, Strata Hadoop. Now it's called Strata Data, I don't know what Strata O'Reilly's going to call it next. As you all know, theCUBE has been present for the creation at the Hadoop big data ecosystem. We're here for our ninth year, certainly a lot's changed. AI's the center of the conversation, and certainly we've seen some horses come in, some haven't come in, and trends have emerged, some gone away, your thoughts. Nine years covering big data. >> Well, John, I remember fondly, vividly, the call that I got. I was in Dallas at a storage networking world show and you called and said, "Hey, we're doing "Hadoop World, get over there," and of course, Hadoop, big data, was the new, hot thing. I told everybody, "I'm leaving." Most of the people said, "What's Hadoop?" Right, so we came, we started covering, it was people like Jeff Hammerbacher, Amr Awadallah, Doug Cutting, who invented Hadoop, Mike Olson, you know, head of Cloudera at the time, and people like Abi Mehda, who at the time was at B of A, and some of the things we learned then that were profound-- >> Yeah. >> As much as Hadoop is sort of on the back burner now and people really aren't talking about it, some of the things that are profound about Hadoop, really, were the idea, the notion of bringing five megabytes of code to a petabyte of data, for example, or the notion of no schema on write. You know, put it into the database and then figure it out. >> Unstructured data. >> Right. >> Object storage. 
>> And so, that created a state of innovation, of funding. We were talking last night about, you know, many, many years ago at this event this time of the year, concurrent with Strata you would have VCs all over the place. There really aren't a lot of VCs here this year, not a lot of VC parties-- >> Mm-hm. >> As there used to be, so that somewhat waned, but some of the things that we talked about back then, we said that big money in big data is going to be made by the practitioners, not by the vendors, and that's proved true. I mean... >> Yeah. >> The big three Hadoop distro vendors, Cloudera, Hortonworks, and MapR, you know, Cloudera's $2.5 billion valuation, you know, not bad, but it's not a $30, $40 billion value company. The other thing we said is there will be no Red Hat of big data. You said, "Well, the only Red Hat of big data might be Red Hat," and so, (chuckles) that's basically proved true. >> Yeah. >> And so, I think if we look back we always talked about Hadoop and big data being a reduction, the ROI was a reduction on investment. >> Yeah. >> It was a way to have a cheaper data warehouse, and that's essentially-- >> Well, what did we get right and wrong? I mean, let's look at some of the trends. I mean, first of all, I think we got pretty much everything right, as you know. We tend to make the calls pretty accurately with theCUBE. Got a lot of data, we look, we have the analytics in our own system, plus we have the research team digging in, so you know, we pretty much get, do a good job. I think one thing that we predicted was that Hadoop certainly would change the game, and it did. We also predicted that there wouldn't be a Red Hat for Hadoop, that was a prediction. The other prediction was is that we said Hadoop won't kill data warehouses, it didn't, and then data lakes came along. You know my position on data lakes. >> Yeah. >> I've always hated the term.
I always liked data ocean because I think it was much more fluidity of the data, so I think we got that one right and data lakes still doesn't look like it's going to be panning out well. I mean, most people that deploy data lakes, it's really either not a core thing or as part of something else and it's turning into a data swamp, so I think the data lake piece is not panning out the way it, people thought it would be. I think one thing we did get right, also, is that data would be the center of the value proposition, and it continues and remains to be, and I think we're seeing that now, and we said data's the development kit back in 2010 when we said data's going to be part of programming. >> Some of the other things, our early data, and we went out and we talked to a lot of practitioners who are the, it was hard to find in the early days. They were just a select few, I mean, other than inside of Google and Yahoo! But what they told us is that things like SQL and the enterprise data warehouse were key components on their big data strategy, so to your point, you know, it wasn't going to kill the EDW, but it was going to surround it. The other thing we called was cloud. Four years ago our data showed clearly that much of this work, the modeling, the big data wrangling, et cetera, was being done in the cloud, and Cloudera, Hortonworks, and MapR, none of them at the time really had a cloud strategy. Today that's all they're talking about is cloud and hybrid cloud. >> Well, it's interesting, I think it was like four years ago, I think, Dave, when we actually were riffing on the notion of, you know, Cloudera's name. It's called Cloudera, you know. If you spell it out, in Cloudera we're in a cloud era, and I think we were very aggressive at that point. I think Amr Awadallah even made a comment on Twitter. He was like, "I don't understand "where you guys are coming from." 
We were actually saying at the time that Cloudera should actually leverage more cloud at that time, and they didn't. They stayed on their IPO track and they had to because they had everything betted on Impala and this data model that they had and being the business model, and then they went public, but I think clearly cloud is now part of Cloudera's story, and I think that's a good call, and it's not too late for them. It never was too late, but you know, Cloudera has executed. I mean, if you look at what's happened with Cloudera, they were the only game in town. When we started theCUBE we were in their office, as most people know in this industry, that we were there with Cloudera when they had like 17 employees. I thought Cloudera was going to run the table, but then what happened was Hortonworks came out of Yahoo! That, I think, changed the game, and I think that competitive battle between Hortonworks and Cloudera, in my opinion, changed the industry, because if Hortonworks did not come out of Yahoo!, Cloudera would've had an uncontested run. I think the landscape of the ecosystem would look completely different had Hortonworks not competed, because you think about, Dave, they had that competitive battle for years. The Hortonworks-Cloudera battle, and I think it changed the industry. I think it could've been a different outcome. If Hortonworks wasn't there, I think Cloudera probably would've taken Hadoop and made it so much more, and I think they would've gotten more done. >> Yeah, and I think the other point we have to make here is complexity really hurt the Hadoop ecosystem, and it was just bespoke, new projects coming out all the time, and you had Cloudera, Hortonworks, and maybe to a lesser extent MapR, doing a lot of the heavy lifting, particularly, you know, Hortonworks and Cloudera.
They had to invest a lot of their R&D in making these systems work and integrating them, and you know, complexity just really broke the back of the Hadoop ecosystem, and so then Spark came in, everybody said, "Oh, Spark's going to basically replace Hadoop." You know, yes and no, the people who got Hadoop right, you know, embraced it and they still use it. Spark definitely simplified things, but now the conversation has turned to AI, John. So, I got to ask you, I'm going to use your line on you in kind of the ask-me-anything segment here. AI, is it same wine, new bottle, or is it really substantively different in your opinion? >> I think it's substantively different. I don't think it's the same wine in a new bottle. I'll tell you... Well, it's kind of, it's like the bad wine... (laughs) Is going to be kind of blended in with the good wine, which is now AI. If you look at this industry, the big data industry, if you look at what O'Reilly did with this conference. I think O'Reilly really has not done a good job with the conference of big data. I think they blew it, I think that they made it a, you know, monetization, closed system when the big data business could've been all about AI in a much deeper way. I think AI is subordinate to cloud, and you mentioned cloud earlier. If you look at all the action within the AI segment, Diane Greene talking about it at Google Next, Amazon, AI is a software layer substrate that will be underpinned by the cloud. 
Cloud will drive more action, you need more compute, that drives more data, more data drives the machine learning, machine learning drives the AI, so I think AI is always going to be dependent upon cloud ends or some sort of high compute resource base, and all the cloud analytics are feeding into these AI models, so I think cloud takes over AI, no doubt, and I think this whole ecosystem of big data gets subsumed under either an AWS, VMworld, Google, and Microsoft Cloud show, and then also I think specialization around data science is going to go off on its own. So, I think you're going to see the breakup of the big data industry as we know it today. Strata Hadoop, Strata Data Conference, that thing's going to crumble into multiple, fractured ecosystems. >> It's already starting to be forked. I think the other thing I want to say about Hadoop is that it actually brought such great awareness to the notion of data, putting data at the core of your company, data and data value, the ability to understand how data at least contributes to the monetization of your company. AI would not be possible without the data. Right, and we've talked about this before. You call it the innovation sandwich. The innovation sandwich, last decade, last three decades, has been Moore's law. The innovation sandwich going forward is data, machine intelligence applied to that data, and cloud for scale, and that's the sandwich of innovation over the next 10 to 20 years. >> Yeah, and I think data is everywhere, so this idea of being a categorical industry segment is a little bit off, I mean, although I know data warehouse is kind of its own category and you're seeing that, but I don't think it's like a Magic Quadrant anymore. Every quadrant has data. >> Mm-hm. >> So, I think data's fundamental, and I think that's why it's going to become a layer within a control plane of either cloud or some other system, I think. I think that's pretty clear, there's no, like, one. 
You can't buy big data, you can't buy AI. I think you can have AI, you know, things like TensorFlow, but it's going to be a completely... Every layer of the stack is going to be impacted by AI and data. >> And I think the big players are going to infuse their applications and their databases with machine intelligence. You're going to see this, you're certainly, you know, seeing it with IBM, the sort of Watson heavy lift. Clearly Google, Amazon, you know, Facebook, Alibaba, and Microsoft, they're infusing AI throughout their entire set of cloud services and applications and infrastructure, and I think that's good news for the practitioners. People aren't... Most companies aren't going to build their own AI, they're going to buy AI, and that's how they close the gap between the sort of data haves and the data have-nots, and again, I want to emphasize that the fundamental difference, to me anyway, is having data at the core. If you look at the top five companies in terms of market value, US companies, Facebook maybe not so much anymore because of the fake news, though Facebook will be back with its two billion users, but Apple, Google, Facebook, Amazon, who am I... And Microsoft, those five have put data at the core and they're the most valuable companies in the stock market from a market cap standpoint, why? Because it's a recognition that that intangible value of the data is actually quite valuable, and even though banks and financial institutions are data companies, their data lives in silos. So, these five have put data at the center, surrounded it with human expertise, as opposed to having humans at the center and having data all over the place. So, how do they, how do these companies close the gap? How do the companies in the flyover states close the gap?
The way they close the gap, in my view, is they buy technologies that have AI infused in it, and I think the last thing I'll say is I see cloud as the substrate, and AI, and blockchain and other services, as the automation layer on top of it. I think that's going to be the big tailwind for innovation over the next decade. >> Yeah, and obviously the theme of machine learning drives a lot of the conversations here, and that's essentially never going to go away. Machine learning is the core of AI, and I would argue that AI truly doesn't even exist yet. It's machine learning really driving the value, but to put a validation on the fact that cloud is going to be driving AI business is some of the terms in popular conversations we're hearing here in New York around this event and topic, CUBENYC and Strata Conference, is you're hearing Kubernetes and blockchain, and you know, these automation, AI operation kind of conversations. That's an IT conversation, (chuckles) so you know, that's interesting. You've got IT, really, with storage. You've got to store the data, so you can't not talk about workloads and how the data moves with workloads, so you're starting to see data and workloads kind of be tossed in the same conversation, that's a cloud conversation. That is all about multi-cloud. That's why you're seeing Kubernetes, a term I never thought I would be saying at a big data show, but Kubernetes is going to be key for moving workloads around, of which there's data involved. (chuckles) Instrumenting the workloads, data inside the workloads, data driving data. This is where AI and machine learning's going to play, so again, cloud subsumes AI, that's the story, and I think that's going to be the big trend. >> Well, and I think you're right, now. 
I mean, that's why you're hearing the messaging of hybrid cloud from the big distro vendors, and the other thing is you're hearing from a lot of the NoSQL database guys, they're bringing ACID compliance, they're bringing enterprise-grade capability, so you're seeing the world is hybrid. You're seeing those two worlds come together, so... >> Their worlds, it's getting leveled in the playing field out there. It's all about enterprise, B2B, AI, cloud, and data. That's theCUBE bringing you the data here. New York City, CUBENYC, that's the hashtag. Stay with us for more coverage live in New York after this short break. (techy music)

Published Date : Sep 12 2018

ENTITIES

Entity	Category	Confidence
Apple	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Diane Greene	PERSON	0.99+
Google	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
John	PERSON	0.99+
Alibaba	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Jeff Hammerbacher	PERSON	0.99+
$30	QUANTITY	0.99+
New York	LOCATION	0.99+
2010	DATE	0.99+
IBM	ORGANIZATION	0.99+
Doug Cutting	PERSON	0.99+
Mike Olson	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Dallas	LOCATION	0.99+
O'Reilly	ORGANIZATION	0.99+
Yahoo	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
five	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Abi Mehda	PERSON	0.99+
John Furrier	PERSON	0.99+
New York City	LOCATION	0.99+
$2.5 billion	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
MapR	ORGANIZATION	0.99+
Amr Awadallah	PERSON	0.99+
$40 billion	QUANTITY	0.99+
17 employees	QUANTITY	0.99+
VMworld	ORGANIZATION	0.99+
Today	DATE	0.99+
Impala	ORGANIZATION	0.99+
Nine years	QUANTITY	0.99+
four years ago	DATE	0.98+
last night	DATE	0.98+
last decade	DATE	0.98+
Strata Data Conference	EVENT	0.98+
Strata Conference	EVENT	0.98+
Hadoop Summit	EVENT	0.98+
ninth year	QUANTITY	0.98+
Four years ago	DATE	0.98+
two worlds	QUANTITY	0.97+
five companies	QUANTITY	0.97+
today	DATE	0.97+
Strata Hadoop	EVENT	0.97+
Hadoop World	EVENT	0.96+
CUBE	ORGANIZATION	0.96+
Google Next	ORGANIZATION	0.95+
Twitter	ORGANIZATION	0.95+
this year	DATE	0.95+
Spark	ORGANIZATION	0.95+
US	LOCATION	0.94+
CUBENYC	EVENT	0.94+
Strata O'Reilly	ORGANIZATION	0.93+
next decade	DATE	0.93+

Nenshad Bardoliwalla & Stephanie McReynolds | BigData NYC 2017

>> Live from midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat techno music) >> Welcome back, everyone. Live here in New York, Day Three coverage, winding down for three days of wall to wall coverage theCUBE covering Big Data NYC in conjunction with Strata Data, formerly Strata Hadoop and Hadoop World, all part of the Big Data ecosystem. Our next guest is Nenshad Bardoliwalla, Co-Founder and Chief Product Officer of Paxata, hot startup in the space. A lot of kudos. Of course, they launched on theCUBE in 2013 three years ago when we started theCUBE as a separate event from O'Reilly. So, great to see the success. And Stephanie McReynolds, you've been on multiple times, VP of Marketing at Alation. Welcome back, good to see you guys. >> Thank you. >> Happy to be here. >> So, winding down, so great kind of wrap-up segment here in addition to the partnership that you guys have. So, let's first talk about before we get to the wrap-up of the show and kind of bring together the week here and kind of summarize everything. Tell us about the partnership you guys have. Paxata, you guys have been doing extremely well. Congratulations. Prakash was talking on theCUBE. Great success. You guys worked hard for it. I'm happy for you. But partnering is everything. Ecosystem is everything. Alation, their collaboration with data. That's their ethos. They're very user-centric. >> Nenshad: Yes. >> From the founders. Seemed like a good fit. What's the deal? >> It's a very natural fit between the two companies. When we started down the path of building new information management capabilities it became very clear that the market had strong need for both finding data, right? What do I actually have? I need an inventory, especially if my data's in Amazon S3, my data is in Azure Blob storage, my data is on-premise in HDFS, my data is in databases, it's all over the place. And I need to be able to find it.
And then once I find it, I want to be able to prepare it. And so, one of the things that really drove this partnership was the very common interests that both companies have. And number one, pushing user experience. I love the Alation product. It's very easy to use, it's very intuitive, really it's a delightful thing to work with. And at the same time they also share our interests in working in these hybrid multicloud environments. So, what we've done and what we announced here at Strata is actually this bi-directional integration between the products. You can start in Alation and find a data set that you want to work with, see what collaboration or notes or business metadata people have created and then say, I want to go see this in Paxata. And in a single click you can then actually open it up in Paxata and profile that data. Vice versa you can also be in Paxata and prepare data, and then with a single click push it back, and then everybody who works with Alation actually now has knowledge of where that data is. So, it's a really nice synergy. >> So, you pushed the user data back to Alation, cause that's what they care a lot about, the cataloging and making the user-centric view work. So, you provide, it's almost a flow back and forth. It's a handshake if you will to data. Am I getting that right? >> Yeah, I mean, the idea's to keep the analyst or the user of that data, data scientist, even in some cases a business user, keep them in the flow of their work as much as possible. But give them the advantage of understanding what others in the organization have done with that data prior and allow them to transform it, and then share that knowledge back with the rest of the community that might be working with that data. >> John: So, give me an example. I like your Excel spreadsheet concept cause that's obvious. People know what Excel spreadsheet is so. So, it's Excel-like. That's an easy TAM to go after. All Microsoft users might not get that Azure thing. 
But this one, just take me through a use case. >> So, I've got a good example. >> Okay, take me through. >> It's very common in a data lake for your data to be compressed. And when data's compressed, to a user it looks like a black box. So, if the data is compressed in Avro or Parquet or it's even like JSON format, a business user has no idea what's in that file. >> John: Yeah. >> So, what we do is we find the file for them. It may have some comments on that file of how that data's been used in past projects that we infer from looking at how others have used that data in Alation. >> John: So, you put metadata around it. >> We put a whole bunch of metadata around it. It might be comments that people have made. It might be >> Annotations, yeah. >> actual observations, annotations. And the great thing that we can do with Paxata is open that Avro file or Parquet file, open it up so that you can actually see the data elements themselves. So, all of a sudden, the business user has access without having to use a command line utility or understand anything about compression, and how you open that file up-- >> John: So, as Paxata's spitting out their nuggets of value back to you, you're kind of understanding it, translating it to the user. And they get to do their thing, you get to do your thing, right? >> It's making an Avro or a Parquet file as easy to use as Excel, basically. Which is great, right? >> It's awesome. >> Now, you've enabled a whole new class of people who can use that. >> Well, and people just get turned off when it's anything like jargon, or like, "What is that? I'm afraid it's phishing. Click on that and oh!" >> Well, the scary thing is that in a data lake environment, in a lot of cases people don't even label the files with extensions. They're just files. (Stephanie laughs) So, what started-- >> It's like getting your pictures like DS, JPEG. It's like what? >> Exactly. >> Right.
>> So, you're talking about unlabeled-- >> If you looked on your laptop, and if you didn't have JPEG or DOC or PPT. Okay, I don't know what this file is. Well, what you have in the data lake environment is that you have thousands of these files that people don't really know what they are. And so, with Alation we have the ability to get all the value around the curation of the metadata, and how people are using that data. But then somebody says, "Okay, but I understand that this file exists. What's in it?" And then with Click to Profile from Alation you're immediately taken into Paxata. And now you're actually looking at what's in that file. So, you can very quickly go from this looks interesting to let me understand what's inside of it. And that's very powerful. >> Talk about Alation. Cause I had the CEO on, also their lead investor Greg Sands from Costanoa Ventures. They're a pretty amazing team but it's kind of out there. No offense, it's kind of a compliment actually. (Stephanie laughs) >> Stephanie: Keep going. >> They got a symbolic system Stanford guy, who's like super-smart. >> Nenshad: Yeah. >> They're on something that's really unique but it's almost too simple to be. Like, wait a minute! Google for the data, it's an awesome opportunity. How do you describe Alation to people who say, "Hey, what's this Alation thing?" >> Yeah, so I think that the best way to describe it is it's the browser for all of the distributed data in the enterprise. Sorry, so it's both the catalog, and the browser that sits on top of it. It sounds very simple. Conceptually it's very simple but they have a lot of richness in what they're able to do behind the scenes in terms of introspecting what type of work people are doing with data, and then taking that knowledge and actually surfacing it to the end user.
So, for example, they have very powerful scenarios where they can watch what people are doing in different data sources, and then based on that information actually bubble up how queries are being used or the different patterns that people are doing to consume data with. So, what we find really exciting is that this is something that is very complex under the covers. Which Paxata is as well being built upon Spark. But they have put in the hard engineering work so that it looks simple to the end user. And that's the exact same thing that we've tried to do. >> And that's the hard problem. Okay, Stephanie back ... That was a great example by the way. Can't wait to have our little analyst breakdown of the event. But back to Alation for you. So, how do you talk about, you've been VP of Marketing of Alation. But you've been around the block. You know B2B, tech, big data. So, you've seen a bunch of different, you've worked at Trifacta, you worked at other companies, and you've seen a lot of waves of innovation come. What's different about Alation that people might not know about? How do you describe the difference? Because it sounds easy, "Oh, it's a browser! It's a catalog!" But it's really hard. Is it the tech that's the secret? Is it the approach? How do you describe the value of Alation? >> I think what's interesting about Alation is that we're solving a problem that since the dawn of the data warehouse has not been solved. And that is how to help end users really find and understand the data that they need to do their jobs. A lot of our customers talk about this-- >> John: Hold on. Repeat that. Cause that's like a key thing. What problem hasn't been solved since the data warehouse? >> To be able to actually find and fully understand, understand to the point of trust the data that you want to use for your analysis. And so, in the world of-- >> John: That sounds so simple. >> Stephanie: In the world of data warehousing-- >> John: Why is it so hard?
>> Well, because in the world of data warehousing business people were told what data they should use. Someone in IT decided how to model the data, came up with a KPI calculation, and told you as a business person, you as a CEO, this is how you're going to monitor your business. >> John: Yeah. >> What business person wants to be told that by an IT guy, right? >> Well, it was bounded by IT. >> Right. >> Expression and discovery should be unbounded. Machine learning can take care of a lot of bounded stuff. I get that. But like, when you start to get into the discovery side of it, it should be free. >> Well, no offense to the IT team, but they were doing their best to try to figure out how to make this technology work. >> Well, just look at the cost of goods sold for storage. I mean, how many EMC drives? Expensive! IT was not cheap. >> Right. >> Not even 10, 15, 20 years ago. >> So, now when we have more self-service access to data, and we can have more exploratory analysis. What data science really introduced and Hadoop introduced was this ability on-demand to be able to create these structures, you have this more iterative world of how you can discover and explore datasets to come to an insight. The only challenge is, without simplifying that process, a business person is still lost, right? >> John: Yeah. >> Still lost in the data. >> So, we simply call that a catalog. But a catalog is much more-- >> Index, catalog, anthology, there's other words for it, right? >> Yeah, but I think it's interesting because like the concept of a catalog as an inventory has been around forever in this space. But the concept of a catalog that learns from others' behavior with that data, this concept of Behavior I/O that Aaron talked about earlier today. The fact that behavior of how people query data is an input and that input then informs a recommendation as an output is very powerful. And that's where all the machine learning and A.I. comes to work.
It's hidden underneath that concept of Behavior I/O but that's the real innovation that drives this rich catalog is how can we make active recommendations to a business person who doesn't have to understand the technology but they know how to apply that data to making a decision. >> Yeah, that's key. Behavior and textual information has always been the two flywheels in analysis whether you're talking search engine or data in general. And I think what I like about the trends here at Big Data NYC this weekend. We've certainly been seeing it at the hundreds of CUBE events we've gone to over the past 12 months and more is that people are using data differently. Not only say differently, there's baselining, foundational things you got to do. But the real innovators have a twist on it that give them an advantage. They see how they can use data. And the trend is collective intelligence of the customer seems to be big. You guys are doing it. You're seeing patterns. You're automating the data. So, it seems to be this flywheel of some data, get some collective data. What's your thoughts and reactions? Are people getting it? Is this by people doing it by accident on purpose kind of thing? Did people just fall on their head? Or you see, "Oh, I just backed into this?" >> I think that the companies that have emerged as the leaders in the last 15 or 20 years, Google being a great example, Amazon being a great example. These are companies whose entire business models were based on data. They've generated out-sized returns. They are the leaders on the stock market. And I think that many companies have awoken to the fact that data is a monetizable asset to be turned into information either for analysis, to be turned into information for generating new products that can then be resold on the market.
The leading edge companies have figured that out, and are adopting technologies like Alation, like Paxata, to get a competitive advantage in the business processes where they know they can make a difference inside of the enterprise. So, I don't think it's a fluke at all. I think that most of these companies are being forced to go down that path because they have been shown the way in terms of the digital giants that are currently ruling the enterprise tech world. >> All right, what's your thoughts on the week this week so far on the big trends? What are obvious, obviously A.I., don't need to talk about A.I., but what were the big things that came out of it? And what surprised you that didn't come out from a trends standpoint buzz here at Strata Data and Big Data NYC? What were the big themes that you saw emerge and didn't emerge what was the surprise? Any surprises? >> Basically, we're seeing in general the maturation of the market finally. People are finally realizing that, hey, it's not just about cool technology. It's not about what distribution or package. It's about can you actually drive return on investment? Can you actually drive insights and results from the stack? And so, even the technologists that we were talking with today throughout the course of the show are starting to talk about it's that last mile of making the humans more intelligent about navigating this data, where all the breakthroughs are going to happen. Even in places like IOT, where you think about a lot of automation, and you think about a lot of capability to use deep learning to maybe make some decisions. There's still a lot of human training that goes into that decision-making process and having agency at the edge. And so I think this acknowledgement that there should be balance between human input and what the technology can do is a nice breakthrough that's going to help us get to the next level. >> What's missing?
What do you see that people missed that is super-important, that wasn't talked much about? Is there anything that jumps out at you? I'll let you think about it. Nenshad, you have something now. >> Yeah, I would say I completely agree with what Stephanie said, which is we are seeing the market mature. >> John: Yeah. >> And there is a compelling force to now justify business value for all the investments people have made. The science experiment phase of the big data world is over. People now have to show a return on that investment. I think that being said though, this is my sort of way of being a little more provocative. I still think there's way too much emphasis on data science and not enough emphasis on the average business analyst who's doing work in the Fortune 500. >> It should be kind of the same thing. I mean, with data science you're just more of an advanced analyst maybe. >> Right. But the idea that every person who works with data is suddenly going to understand different types of machine learning models, and what's the right way to do hyperparameter tuning, and other words that I could throw at you to show that I'm smart. (laughter) >> You guys have a vision with the Excel thing. I could see how you see that perspective because you see a future. I just think we're not there yet because I think the data scientists are still handcuffed and hamstrung by the fact that they're doing too much provisioning work, right? >> Yeah. >> To your point about surfacing the insights, it's like the data scientists, "Oh, you own it now!" They become the sysadmin, if you will, for their department. And it's like it's not their job. >> Well, we need to get them out of data preparation, right? >> Yeah, get out of that. >> You shouldn't be a data scientist-- >> Right now, you have two values. You've got the user interface value, which I love, but you guys do the automation. So, I think we're getting there.
I see where you're coming from, but still those data scientists have to set the tone for the generation, right? So, it's kind of like you got to get those guys productive. >> And it's not a... Please go ahead. >> I mean, it's somewhat interesting if you look at can the data scientist start to collaborate a little bit more with the common business person? You start to think about it as a little bit of scientific inquiry process. >> John: Yeah. >> Right? >> If you can have more innovators around the table in a common place to discuss what are the insights in this data, and people are bringing business perspective together with machine learning perspective, or the knowledge of the higher algorithms, then maybe you can bring those next leaps forward. >> Great insight. If you want my observations, I use the crazy analogy. Here's my crazy analogy. For years it's been about the engine, the Model T, the car, the horse and buggy, you know? Now, "We got an engine in the car!" And they got wheels, it's got a chassis. And so, it's about the apparatus of the car. And then it evolved to, "Hey, this thing actually drives. It's transportation." You can actually go from A to B faster than the other guys, and people still think there's a horse and buggy market out there. So, they got to go to that. But now people are crashing. Now, there's an art to driving the car. >> Right. >> So, whether you're a sports car or whatever, this is where the value piece I think hits home is that, people are driving the data now. They're driving the value proposition. So, I think that, to me, the big surprise here is how people aren't getting into the hype cycle. They like the hype in terms of lead gen, and A.I., but they're too busy for the hype. It's like, drive the value. This is not just B.S. either, outcomes. It's like, "I'm busy. I got security. I got app development." >> And I think they're getting smarter about how they're valuing data.
We're starting to see some economic models, and some ways of putting actual numbers on what impact is this data having today. We do a lot of usage analysis with our customers, and looking at they have a goal to distribute data across more of the organization, and really get people using it in a self-service manner. And from that, you're able to calculate what actually is the impact. We're not just storing this for insurance policy reasons. >> Yeah, yeah. >> And this cheap-- >> John: It's not some POC. Don't do a POC. All right, so we're going to end the day and the segment on you guys having the last word. I want to phrase it this way. Share an anecdotal story you've heard from a customer, or a prospective customer, that looked at your product, not the joint product but your products each, that blew you away, and that would be a good thing to leave people with. What was the coolest or nicest thing you've heard someone say about Alation and Paxata? >> For me, the coolest thing they said was, "This was a social network for nerds. I finally feel like I've found my home." (laughter) >> Data nerds, okay. >> Data nerds. So, if you're a data nerd, you want to network, Alation is the place you want to be. >> So, there is like profiles? And like, you guys have a profile for everybody who comes in? >> Yeah, so the interesting thing is part of our automation, when we go and we index the data sources we also index the people that are accessing those sources. So, you kind of have a leaderboard now of data users, that can track one another in the system. >> John: Ooh. >> And at eBay the leader was this guy, Caleb, who was their data scientist. And Caleb was famous because everyone in the organization would ask Caleb to prepare data for them. And Caleb was like well known if you were around eBay for a while. >> John: Yeah, he was the master of the domain. >> And then when we turned on, you know, we were indexing tables on Teradata as well as their Hadoop implementation.
And all of a sudden, there are table structures that are Caleb underscore cussed. Caleb underscore revenue. Caleb underscore ... We're like, "Wow!" Caleb drove a lot of Teradata revenue. (laughs) >> Awesome. >> Paxata, what was the coolest thing someone said about you in terms of being the nicest or coolest most relevant thing? >> So, something that a prospect said earlier this week is that, "I've been hearing in our personal lives about self-driving cars. But seeing your product and where you're going with it I see the path towards self-driving data." And that's really what we need to aspire towards. It's not about spending hours doing prep. It's not about spending hours doing manual inventories. It's about getting to the point that you can automate the usage to get to the outcomes that people are looking for. >> So, I'm looking forward to self-driving information. Nenshad, thanks so much. Stephanie from Alation. Thanks so much. Congratulations both on your success. And great to see you guys partnering. Big, big community here. And just the beginning. We see the big waves coming, so thanks for sharing perspective. >> Thank you very much. >> And your color commentary on our wrap up segment here for Big Data NYC. This is theCUBE live from New York, wrapping up great three days of coverage here in Manhattan. I'm John Furrier. Thanks for watching. See you next time. (upbeat techno music)

Published Date : Oct 3 2017


ENTITIES

Entity	Category	Confidence
Stephanie	PERSON	0.99+
Stephanie McReynolds	PERSON	0.99+
Greg Sands	PERSON	0.99+
John	PERSON	0.99+
Caleb	PERSON	0.99+
John Furrier	PERSON	0.99+
Nenshad	PERSON	0.99+
New York	LOCATION	0.99+
Prakash	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Aaron	PERSON	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
2013	DATE	0.99+
thousands	QUANTITY	0.99+
Costanoa Ventures	ORGANIZATION	0.99+
Manhattan	LOCATION	0.99+
two companies	QUANTITY	0.99+
both companies	QUANTITY	0.99+
Excel	TITLE	0.99+
Trifacta	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Strata Data	ORGANIZATION	0.99+
Alation	ORGANIZATION	0.99+
Paxata	ORGANIZATION	0.99+
Nenshad Bardoliwalla	PERSON	0.99+
eBay	ORGANIZATION	0.99+
three days	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
two values	QUANTITY	0.99+
NYC	LOCATION	0.99+
hundreds	QUANTITY	0.99+
Big Data	ORGANIZATION	0.99+
first	QUANTITY	0.99+
one	QUANTITY	0.99+
both	QUANTITY	0.99+
Strata Hadoop	ORGANIZATION	0.99+
Hadoop World	ORGANIZATION	0.99+
earlier this week	DATE	0.98+
Paxata	PERSON	0.98+
today	DATE	0.98+
Day Three	QUANTITY	0.98+
Parquet	TITLE	0.96+
three years ago	DATE	0.96+

Rob Thomas, IBM | Big Data NYC 2017

>> Voiceover: Live from midtown Manhattan, it's theCUBE! Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, live in New York City this is theCUBE's coverage of, eighth year doing Hadoop World now, evolved into Strata Hadoop, now called Strata Data, it's had many incarnations but O'Reilly Media running their event in conjunction with Cloudera, mainly an O'Reilly Media show. We do our own show called Big Data NYC here with our community with theCUBE bringing you the best interviews, the best people, entrepreneurs, thought leaders, experts, to get the data and try to project the future and help users find the value in data. My next guest is Rob Thomas, who is the General Manager of IBM Analytics, theCUBE Alumni, been on multiple times successfully executing in the San Francisco Bay area. Great to see you again. >> Yeah John, great to see you, thanks for having me. >> You know IBM has really been interesting through its own transformation and a lot of people will throw IBM in that category but you guys have been transforming okay and the scoreboard has yet to show in my mind what's truly happening because if you still look at this industry, we're only eight years into what Hadoop evolved into now as a large data set but the analytics game just seems to be getting started with the cloud now coming over the top, you're starting to see a lot of cloud conversations in the air. Certainly there's a lot of AI washing, you know, AI this, but it's machine learning and deep learning at the heart of it as innovation but a lot more work on the analytics side is coming. You guys are at the center of that. What's the update? What's your view of this analytics market? >> Most enterprises struggle with complexity. That's the number one problem when it comes to analytics. It's not imagination, it's not willpower, in many cases, it's not even investment, it's just complexity.
We are trying to make data really simple to use and the way I would describe it is we're moving from a world of products to platforms. Today, if you want to go solve a data governance problem you're typically integrating 10, 15 different products. And the burden then is on the client. So, we're trying to make analytics a platform game. And my view is an enterprise has to have three platforms if they're serious about analytics. They need a data manager platform for managing all types of data, public, private cloud. They need unified governance so governance of all types of data and they need a data science platform machine learning. If a client has those three platforms, they will be successful with data. And what I see now is really mixed. We've got 10 products that do that, five products that do this, but it has to be integrated in a platform. >> You as an IBM or the customer has these tools? >> Yeah, when I go see clients that's what I see is data... >> John: Disparate data log. >> Yeah, they have disparate tools and so we are unifying what we deliver from a product perspective to this platform concept. >> You guys announce an integrated analytic system, got to see my notes here, I want to get into that in a second but interesting you bring up the word platform because you know, platforms have always been kind of reserved for the big supplier but you're talking about customers having a platform, not a supplier delivering a platform per se 'cause this is where the integration thing becomes interesting. We were joking yesterday on theCUBE here, kind of just kind of ad hoc conceptually like the world has turned into a tool shed. I mean everyone has a tool shed or knows someone that has a tool shed where you have the tools in the back and they're rusty. And so, this brings up the tool conversation, there's too many tools out there that try to be platforms. >> Rob: Yes. >> And if you have too many tools, you're not really doing the platform game right. 
And complexity also turns into when you bought a hammer it turned into a lawn mower. Right so, a lot of these companies have been groping and trying to iterate what their tool was into something else it wasn't built for. So, as the industry evolves, that's natural Darwinism if you will, they will fall to the wayside. So talk about that dynamic because you still need tooling >> Rob: Yes. but tool will be a function of the work as Peter Burris would say, so talk about how does a customer really get that platform out there without sacrificing the tooling that they may have bought or want to get rid of. >> Well, so think about the, in enterprise today, what the data architecture looks like is, I've got this box that has this software on it, use your terms, has these types of tools on it, and it's isolated and if you want a different set of tooling, okay, move that data to this other box where we have the other tooling. So, it's very isolated in terms of how platforms have evolved or technology platforms today. When I talk about an integrated platform, we are big contributors to Kubernetes. We're making that foundational in terms of what we're doing on Private Cloud and Public Cloud is if you move to that model, suddenly what was a bunch of disparate tools are now microservices against a common architecture. And so it totally changes the nature of the data platform in an enterprise. It's a much more fluid data layer. The term I use sometimes is you have data as a service now, available to all your employees. That's totally different than I want to do this project, so step one, make room in the data center, step two, bring in a server. It's a much more flexible approach so that's what I mean when I say platform. >> So operationalizing it is a lot easier than just going down the linear path of provisioning. 
All right, so let's bring up the complexity issue because integrated and unified are two different concepts that kind of mean the same thing depending on how you look at it. When you look at the data integration problem, you've got all this complexity around governance, it's a lot of moving parts of data. How does a customer actually execute without compromising the integrity of their policies that they need to have in place? So in other words, what are the baby steps that someone can take, the customers take through with what you guys are dealing with them, how do they get into the game, how do they take steps towards the outcome? They might not have the big money to push it all at once, they might want to take a risk of risk management approach. >> I think there's a clear recipe for doing this right and we have experience of doing it well and doing it not so well, so over time we've gotten some, I'd say a pretty good perspective on that. My view is very simple, data governance has to start with a catalog. And the analogy I use is, you have to do for data what libraries do for books. And think about a library, the first thing you do with books, card catalog. You know where, you basically itemize everything, you know exactly where it sits. If you've got multiple copies of the same book, you can distinguish between which one is which. As books get older they go to archives, to microfilm or something like that. That's what you have to do with your data. >> On the front end. >> On the front end. And it starts with a catalog. And that reason I say that is, I see some organizations that start with, hey, let's go start ETL, I'll create a new warehouse, create a new Hadoop environment. That might be the right thing to do but without having a basis of what you have, which is the catalog, that's where I think clients need to start. 
>> Well, I would just add one more level of complexity just to kind of reinforce, first of all I agree with you but here's another example that would reinforce this step. Let's just say you write some machine learning and some algorithms and a new policy from the government comes down. Hey, you know, we're dealing with Bitcoin differently or whatever, some GDPR kind of thing happens where someone gets hacked and a new law comes out. How do you inject that policy? You got to rewrite the code, so I'm thinking that if you do this right, you don't have to do a lot of rewriting of applications; the library or the catalog will handle it. Is that right, am I getting that right? >> That's right 'cause then you have a baseline is what I would describe it as. It's codified in the form of a data model or in the form of an ontology for how you're looking at unstructured data. You have a baseline so then as changes come, you can easily adjust to those changes. Where I see clients struggle is if you don't have that baseline then you're constantly trying to change things on the fly and that makes it really hard to get to this... >> Well, really hard, expensive, they have to rewrite apps. >> Exactly. >> Rewrite algorithms and machine learning things that were built probably by people that maybe left the company, who knows, right? So the consequences are pretty grave, I mean, pretty big. >> Yes. >> Okay, so let's back to something that you said yesterday. You were on theCUBE yesterday with Hortonworks CEO, Rob Bearden and you were commenting about AI or AI washing. You said quote, "You can't have AI without IA." A play on letters there, sequence of letters which was really an interesting comment, we kind of referenced it pretty much all day yesterday. Information architecture is the IA and AI is the artificial intelligence basically saying if you don't have some sort of architecture AI really can't work.
Which really means models have to be understood, with the learning machine kind of approach. Expand more on that 'cause that was I think a fundamental thing that we're seeing at the show this week, this in New York is a model for the models. Who trains the machine learning? Machines got to learn somewhere too so there's learning for the learning machines. This is a real complex data problem and a half. If you don't set up the architecture it may not work, explain. >> So, there's two big problems enterprises have today. One is trying to operationalize data science and machine learning at scale, the other one is getting to the cloud but let's focus on the first one for a minute. The reason clients struggle to operationalize this at scale is because they start a data science project and they build a model for one discrete data set. Problem is that only applies to that data set, it doesn't, you can't pick it up and move it somewhere else so this idea of data architecture just to kind of follow through, whether it's the catalog or how you're managing your data across multiple clouds becomes fundamental because ultimately you want to be able to provide machine learning across all your data because machine learning is about predictions and it's hard to do really good predictions on a subset. But that pre-req is the need for an information architecture that comprehends for the fact that you're going to build models and you want to train those models. As new data comes in, you want to keep the training process going. And that's the biggest challenge I see clients struggling with. So they'll have success with their first ML project but then the next one becomes progressively harder because now they're trying to use more data and they haven't prepared their architecture for that. >> Great point. Now, switching to data science. You spoke many times with us on theCUBE about data science, we know you're passionate about it, you guys are doing a lot of work on that.
We've observed, and Jim Kobielus and I were talking yesterday, there's too much work still on the data science guys' plate. They're still doing a lot of what I call sysadmin-like work, not the right word, but like administrative building and wrangling. They're not doing enough data science and there's enough proof points now to show that data science actually impacts business in whether it's military having data intelligence to execute something, to selling something at the right time, or even for work or play or consume, or we use, all proof is out there. So why aren't we going faster, why aren't the data scientists more effective, what is it going to take for the data science to have a seamless environment that works for them? They're still doing a lot of wrangling and they're still getting down in the weeds. Is that just the role they have or how does it get easier for them, that's the big catch? >> That's not the role. So they're a victim of their architecture to some extent and that's why they end up spending 80% of their time on data prep, data cleansing, that type of thing. Look, I think we solved that. That's why when we introduced the integrated analytic system this week, that whole idea was get rid of all the data prep that you need because land the data in one place, machine learning and data science is built into that. So everything that the data scientist struggles with today goes away. We can federate to data on cloud, on any cloud, we can federate to data that's sitting inside Hortonworks so it looks like one system but machine learning is built into it from the start. So we've eliminated the need for all of that data movement, for all that data wrangling 'cause we organized the data, we built the catalog, and we've made it really simple. And so if you go back to the point I made, so one issue is clients can't apply machine learning at scale, the other one is they're struggling to get to the cloud.
I think we've nailed those problems 'cause now with a click of a button, you can scale this to part of the cloud. >> All right, so how does the customer get their hands on this? Sounds like it's a great tool, you're saying it's leading edge. We'll take a look at it, certainly I'll do a review on it with the team but how do I get it, how do I get a hold of this? What do I do, download it, you guys supply it to me, is it some open source, how do your customers and potential customers engage with this product? >> However they want to but I'll give you some examples. So, we have an analytic system built on Spark, you can bring the whole box into your data center and right away you're ready for data science. That's one way. Somebody like you, you're going to want to go get the containerized version, you go download it on the web and you'll be up and running instantly with a highly performing warehouse integrated with machine learning and data science built on Spark using Apache Jupyter. Any developer can go use that and get value out of it. You can also say I want to run it on my desktop. >> And that's free? >> Yes. >> Okay. >> There's a trial version out there. >> That's the open source, yeah, that's the free version. >> There's also a version on public cloud so if you don't want to download it, you want to run it outside your firewall, you can go run it on IBM cloud on the public cloud so... >> Just your cloud, Amazon? >> No, not today. >> John: Just IBM cloud, okay, I got it. >> So there's variety of ways that you can go use this and I think what you'll find... >> But you have a premium model that people can get started out so they'll download it to your data center, is that also free too? >> Yeah, absolutely. >> Okay, so all the base stuff is free. >> We also have a desktop version too so you can download... >> What URL can people look at this? >> Go to datascience.ibm.com, that's the best place to start a data science journey. 
>> Okay, multi-cloud, Common Cloud is what people are calling it, you guys have Common SQL engine. What is this product, how does it relate to the whole multi-cloud trend? Customers are looking for multiple clouds. >> Yeah, so Common SQL is the idea of integrating data wherever it is, whatever form it's in, ANSI SQL compliant so what you would expect for a SQL query and the type of response you get back, you get that back with Common SQL no matter where the data is. Now when you start thinking multi-cloud you introduce a whole other bunch of factors. Network, latency, all those types of things so what we talked about yesterday with the announcement of Hortonworks Dataplane which is kind of extending the YARN environment across multi-clouds, that's something we can plug in to. So, I think let's be honest, the multi-cloud world is still pretty early. >> John: Oh, really early. >> Our focus is delivery... >> I don't think it really exists actually. >> I think... >> It's multiple clouds but no one's actually moving workloads across all the clouds, I haven't found any. >> Yeah, I think it's hard for latency reasons today. We're trying to deliver an outstanding... >> But people are saying, I mean this is head room I got but people are saying, I'd love to have a preferred future of multi-cloud even though they're kind of getting their own shops in order, retrenching, and re-platforming it but that's not a bad ask. I mean, I'm a user, I want to move from if I don't like IBM's cloud or I got a better service, I can move around here. If Amazon is too expensive I want to move to IBM, you got product differentiation, I might want to to be in your cloud. So again, this is the customers mindset, right. If you have something really compelling on your cloud, do I have to go all in on IBM cloud to run my data? You shouldn't have to, right? >> I agree, yeah I don't think any enterprise will go all in on one cloud. 
I think it's delusional for people to think that so you're going to have this world. So the reason when we built IBM Cloud Private we did it on Kubernetes was we said, that can be a substrate if you will, that provides a level of standards across multiple cloud type environments. >> John: And it's got some traction too so it's a good bet there. >> Absolutely. >> Rob, final word, just talk about the personas who you now engage with from IBM's standpoint. I know you have a lot of great developers stuff going on, you've done some great work, you've got a free product out there but you still got to make money, you got to provide value to IBM, who are you selling to, what's the main thing, you've got multiple stakeholders, could you just clarify the stakeholders that you're serving in the marketplace? >> Yeah, I mean, the emerging stakeholder that we speak with more and more than we used to is chief marketing officers who have real budgets for data and data science and trying to change how they're performing their job. That's a major stakeholder, CTOs, CIOs, any C level, >> Chief data officer. >> Chief data officer. You know chief data officers, honestly, it's a mixed bag. Some organizations they're incredibly empowered and they're driving the strategy. Others, they're figure heads and so you got to know how the organizations do it. >> A puppet for the CFO or something. >> Yeah, exactly. >> Our ops. >> A puppet? (chuckles) So, you got to you know. >> Well, they're not really driving it, they're not changing it. It's not like we're mandated to go do something they're maybe governance police or something. >> Yeah, and in some cases that's true. In other cases, they drive the data architecture, the data strategy, and that's somebody that we can engage with right away and help them out so... >> Any events you got going up? Things happening in the marketplace that people might want to participate in? 
I know you guys do a lot of stuff out in the open, events they can connect with IBM, things going on? >> So we do, so we're doing a big event here in New York on November first and second where we're rolling out a lot of our new data products and cloud products so that's one coming up pretty soon. The biggest thing we've changed this year is there's such a craving from clients for education, so we've started doing what we're calling Analytics University where we actually go to clients and we'll spend a day or two days, go really deep and open languages, open source. That's become kind of a new focus for us. >> A lot of re-skilling going on too with the transformation, right? >> Rob: Yes, absolutely. >> All right, Rob Thomas here, General Manager IBM Analytics inside theCUBE. CUBE alumni, breaking it down, giving his perspective. He's got two books out there, The Data Revolution was the first one. >> Big Data Revolution. >> Big Data Revolution and the new one is Every Company is a Tech Company. Love that title which is true, check it out on Amazon. Rob Thomas, Big Data Revolution, first book and then second book is Every Company is a Tech Company. It's theCUBE live from New York. More coverage after the short break. (theCUBE jingle) (theCUBE jingle) (calm soothing music)

Published Date : Oct 2 2017


ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Peter Burris	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
John	PERSON	0.99+
Rob Bearden	PERSON	0.99+
Rob Thomas	PERSON	0.99+
O'Reilly Media	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
10	QUANTITY	0.99+
New York	LOCATION	0.99+
10 products	QUANTITY	0.99+
O'Reilly	ORGANIZATION	0.99+
two days	QUANTITY	0.99+
first book	QUANTITY	0.99+
two books	QUANTITY	0.99+
a day	QUANTITY	0.99+
Rob	PERSON	0.99+
Today	DATE	0.99+
yesterday	DATE	0.99+
New York City	LOCATION	0.99+
Hortonworks	ORGANIZATION	0.99+
San Francisco Bay	LOCATION	0.99+
five products	QUANTITY	0.99+
second book	QUANTITY	0.99+
IBM Analytics	ORGANIZATION	0.99+
this week	DATE	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
first	QUANTITY	0.99+
first one	QUANTITY	0.99+
theCUBE	ORGANIZATION	0.99+
eight years	QUANTITY	0.99+
Spark	TITLE	0.99+
SQL	TITLE	0.99+
Common SQL	TITLE	0.98+
datascience.ibm.com	OTHER	0.98+
eighth year	QUANTITY	0.98+
One	QUANTITY	0.98+
one issue	QUANTITY	0.97+
Hortonworks Dataplane	ORGANIZATION	0.97+
three platforms	QUANTITY	0.97+
Strata Hadoop	TITLE	0.97+
today	DATE	0.97+
The Data Revolution	TITLE	0.97+
Cloudera	ORGANIZATION	0.97+
second	QUANTITY	0.96+
NYC	LOCATION	0.96+
two big problems	QUANTITY	0.96+
Analytics University	ORGANIZATION	0.96+
step two	QUANTITY	0.96+
one way	QUANTITY	0.96+
November first	DATE	0.96+
Big Data Revolution	TITLE	0.95+
one	QUANTITY	0.94+
Every Company is a Tech Company	TITLE	0.94+
CUBE	ORGANIZATION	0.93+
this year	DATE	0.93+
two different concepts	QUANTITY	0.92+
one system	QUANTITY	0.92+
step one	QUANTITY	0.92+

Greg Sands, Costanoa | Big Data NYC 2017

(electronic music) >> Host: Live from Midtown Manhattan it's The Cube! Covering Big Data New York City 2017, brought to you by Silicon Angle Media, and its Ecosystem sponsors. >> Okay, welcome back everyone. We are here live, The Cube in New York City for Big Data NYC, this is our fifth year, doing our own event, not with O'Reilly or Cloudera at Strata Data, which was Hadoop World, Strata Conference, Strata Hadoop, now called Strata Data, probably called Strata AI next year, we're The Cube every year, bringing you all the great data, and what's going on. Entrepreneurs, VCs, thought leaders, we interview them and bring that to you. I'm John Furrier with our next guest, Greg Sands, who's the managing director and founder of Costanoa Ventures in Palo Alto, started out as an entrepreneur himself, then single shingle out there, now he's a big VC firm on a third fund. >> On the third fund. >> Third fund. How much in that fund? >> 175 million dollar fund. >> So now you're a big firm now, congratulations, and really great to see your success. >> Thanks very much. I mean, we're still very much an early stage boutique focused on companies that change the way the world does business, but it is the case that we have a bigger team and a bigger fund, to go do the same thing. >> Well you've been great to work with, I've been following you, we've known each other for a while, watched you leave Sutter Hill and start Costanoa, but what's interesting is that, I can kind of joke and kid you, the VC inside joke about being a big firm, because I know you want to be small, and like to be small, help entrepreneurs, that's your thing. But it's really not a big firm, it's a few partners, but a lot of people helping companies, that's your ethos, that's what you're all about at your firm. Take a minute to just share with the folks the kinds of things you do and how you get involved in companies, you're hands on, you roll up your sleeves.
You get out of the way at the right time, you help when you can, share your ethos. >> Yeah, absolutely so the way we think of it is, combining the craft of old school venture capital, with a modern operating team, and so since most founders these days are product-oriented, our job is to think like product people, not think like investors. So we think like product people, we do product level analysis, we do customer discovery, we do, we go ride along on sales calls when we're making investment decisions. And then we do the things that great venture capitalists have done for years, and so for example, at Alation, who I know has been on the show today, we were able to incubate them in our office for a year, I had many conversations with Satyen after he'd sold the first two or three customers. Okay, who's the next person we hire? Who isn't a founder? Who's going to go out and sell? What does that person look like? Do you go straight to a VP? Or do you hire an individual contributor? Do you hire someone for domain, or do you hire someone for talent? And that's the thing that we love doing. Now we've actually built out an operating team so marketing partner, Martina Lauchengco, and Jim Wilson as a sales partner, to really help turn that into a program, so that they can, we can take these founders who find product market fit, and say, how do we help you build the right sales process and marketing process, sales team and marketing team, for your company, your customer, your product? >> Well it's interesting since you mention old school venture capital, I'll get into some of the dynamics that are going on in Silicon Valley, but it's important to bring that forward, because now with cloud you can get to critical mass on the flywheel, on economics, you can see the visibility faster now. >> Greg: Absolutely.
>> So the game of the old school venture capitalist is all the same, how do you get to cruising altitude, whatever metaphor you want to use, the key was getting there, and sometimes it took a couple of rounds, but now you can get these companies with five million, maybe $10 million funding, they can have unit economics visibility, scales insight, then the scale game comes in, so that seems to be the secret trick right now in venture is, don't overspend, keep the valuation in range and that allows you to look for multiple exits potentially, or growth. Talk about that dynamic, because this is like, I call it the hourglass. You get through the hourglass, everyone's down here, but if you can sneak through and get the visibility on the economics, then you grow quickly. >> Absolutely. I mean, it's exactly right and I haven't heard the hourglass metaphor before but I like it. You want to basically get through the narrows of product market fit and the beginnings of scalable sales and marketing. You don't need to know all the answers, but you can do that in a capital-efficient way, building really solid foundations for future explosive growth, look, everybody loves fast growth and big markets, and being grown into. But the number of people who basically don't build those foundations and then say, go big or go home! And they take a ton of money, and they go spend all the money, doing things that just fundamentally don't work, and they blow themselves up. >> Well this is the hourglass problem. You have, once you get through that unit economics, then you have true scale, and value will increase. Everybody wins there so it's about getting through that, and you can get through it fast with good mentoring, but here's the trap that entrepreneurs fall into. I call it the, I think I made it trap. And what happens is they think they're on the other side of the hourglass, but they still haven't even gone through the straight and narrow yet, and they don't know it.
And what they do is they over fund and implode. That seems to be a major trap I see a lot of entrepreneurs fall into, well, I got a 50 million pre on my B round, or some monster valuation, and they get way too much cash, and they're behaving as if they're scaling, and they haven't even nailed it yet. >> Well, I think that's right. So there's certainly, there are stages of product market fit, and so I think people hit that first stage, and they say, oh I've got it. And they try to explode out of the gates. And we, in fact I know one good example of somebody saying, hey, by the way, we're doing great in field sales, and our investors want us to go really fast, so we are going to go inside and we, my job was to hire 50 inside people, without ever having tried it. And so we always preach crawl, walk, run, right? Hire a couple, see how it works. Right, in a new channel. Or a new category, or an adjacent space, and I think that it's helpful to have an investor who has seen the whole picture to say, yeah, I know it looks like light at the end of the tunnel, but see how it's a relatively small dot? You still got to go a little farther, and then the other thing I say is, look, don't build your company to feed your venture capitalist's ego. Right? People do these big rounds of big valuations, and the big dog investors say, go, go, go! But, you're the CEO. Your job is to analyze the data. >> John: You can find during the day (laughs). >> And say, you know, given what we know, how fast should we go? Which investments should we make? And you've got to own that. And I think sometimes our job is just to be the pulling guard and clear space for the CEO to make good decisions. >> So you know I'm a big fan, so my bias is pretty much out there, love what you guys are doing. Tim Connors at PivotNorth is doing the same thing.
Really adding value, getting down and dirty, but the question that entrepreneurs always ask me and talk privately, not about you, but in general, I don't want the VC to get in the way. I want them, I don't want them to preach to me, I don't want too many know-it-alls on my board, I want added value, but again, I don't want the preaching, I don't want them to get in the way, 'cause that's the fear. I'm not saying the same about VCs in general, but that's kind of the mentality of an entrepreneur. I want someone who's going to help me, be in the boat with me, but not be in my way. How do you address that concern to the founders who think, not think like that, but might have a fear. >> Well, by the way, I think it's a legitimate fear, and I think it actually is uncorrelated with added value, right? I think the idea that the board has certain responsibilities, and management has certain responsibilities, is incredibly important. And I think, I can speak for myself in saying, I'm quite conscious of not crossing that line, I think you talk. >> John: You got to build a return, that's the thing. >> But ultimately I would say to an entrepreneur, I'd just say, hey look, call references. And by the way, here are 30 names and phone numbers, and call any one of them, because I think that people who are, so a venture capital know-it-all, in the board room, telling CEOs what to do, destroys value. It's sand in the gears, and it's bad for the company. >> Absolutely, I agree 100% >> And some of my, when I talk about being a pulling guard for the CEO, that's what I'm talking about, which is blocking people who are destructive. >> And rolling the block for a touchdown, kind of use the metaphor. Adding value, that's the key, and that's why I wanted to get that out there because most guys don't get that nuance, and entrepreneurs, especially the younger ones. So it's good and important. 
Okay, let's talk about culture, obviously in Silicon Valley, I get, reading this morning in the Waymo guy, and they're writing it, that's the Silicon Valley, that's not crazy, there's a lot of great people in Silicon Valley, you're one of them. The culture's certainly an innovative culture, there's been some things in the press, inclusion and diversity, obviously is super important. This whole brogrammer thing that's been kind of kicked around. How are you dealing with all that? Because, you know, this is a cultural shift, but I think it's being made out more than it really is, but there still are core issues, your thoughts on the whole inclusion and diversity, and this whole brogrammer blowback thing. >> Yeah, well so I think, so first of all, really important issues, glad we're talking about them, and we all need to get better. And to me the question for us has been, what role do we play? And because I would say it is a relatively small subset of the tech industry, and the venture capital industry. At the same time the behavior that has become public is appalling. It's appalling and totally unacceptable, and so the question is, okay, how can we be a part of the stand-up part of the ecosystem, and some of which is calling things out when we see them. Though frankly we work with and hang out with people and we don't see them that often, and then part of which is, how do we find a couple of ways to contribute meaningfully? So for example this summer we ran what we called the Costanoa Access Fellowship, intentionally, trying to provide first opportunity in venture capital for people who traditionally haven't had as much access. We created an event in the spring called, Seat at the Table, really, particularly around women in the tech industry, and it went so well that we're running it in New York on October 19th, so if you're a woman in tech in New York, we'd love to see you then.
And we're just trying to figure-- >> You're doing it in an authentic way though, you're not really doing it from a promotional standpoint. It's legit. >> Yeah, we're just trying to do, you know, pick off a couple of things that we can do, so that we can be on the side of the good guys. >> So I guess what you're saying is just have high integrity, and be part of the solution not part of the problem. >> That's right, and by the way, both of these initiatives were ones that were kicked off in late 2016, so it's not a reaction to things like Binary Capital, and the problems at Uber, both of which are appalling. >> Self-awareness is critical. Let's get back to the nuts and bolts of the real reason why I wanted you to come on, one was to find out how much money you have to spend for the entrepreneurs that are watching. Give us the update on the last fund, so you got a new fund that you just closed, the new fund, fund three. You have your other funds that are still out there, and some funds reserved, which, what's the number amount, how much are you writing checks for? Give the whole thesis. >> Absolutely. So we're an early stage investor, so we lead series A and seed financing companies that change the way the world does business, so up and down the stack, a business-facing software, data-driven applications. Machine-learning and AI driven applications. >> John: But the filter is changing the way the world works? >> The way, yes, but in particular the way the world does business. You can think of it as a business-facing software stack. We're not social media investors, it's not what we know, it's not what we're good at. And it includes security and management, and the data stack and-- >> Joe: Enterprise and emerging tech. >> That's right. And the-- >> And every crazy idea in between. >> That's right.
(laughs) Absolutely, and so we participate in or lead seed financings, which most typically are half a million to maybe one and a quarter, and we'll lead series A financing, small ones might be two or two and a half million dollars at the outer edge is probably a six million dollar check. We're just opening up in the next couple of days, a thousand square feet of incubation space at world headquarters at Palo Alto. >> John: Nice. >> So Alation, Acme Ticketing and Zen IQ are companies that we invested in. >> Joe: What location is this going to be at? >> That's, near the Fills in downtown Palo Alto, 164 staff, and those three companies are ones where we effectively invested at formation and incubated it for a year, we love doing that. >> At the hangout at Philsmore and get the data. And so you got some funds, what else do you have going on? 175 million? >> So one was a $100 million fund, and then fund two was $135 million fund, and the last investment of fund two which we announced about three weeks ago was called Roadster, so it's ecommerce enablement for the modern dealerships. So Omnichannel and Mobile First infrastructure for auto-dealers. We have already closed, and had the first board meeting for the first new investment of fund three, which isn't yet announced, but in the land of computer vision and deep learning, so a couple of the subjects that we care deeply about, and spend a lot of time thinking about. >> And the average check size for the A round again, seed and A, what do you know about the? The lowest and highest? >> The average for the seed is half a million to one and a quarter, and probably average for a series A is four or five. >> And you'll lead As. >> And we will lead As. >> Okay great. What's the coolest thing you're working on right now that gets you excited? It doesn't have to be a portfolio company, but the research you're doing, thing, tires you're kicking, in subjects, or domains?
>> You know, so honestly, one of the great benefits of the venture capital business is that I get up and my neurons are firing right away every day. And I do think that for example, one of the things that we love is all of the adulant infrastructure and so we've got our friends at VictorOps that are in the middle of that space, and the thinking about how the modern programmer works, how everybody-- >> Joe: Is security on your radar? >> Security is very much on our radar, in fact, someone who you should have on your show is Ashish Gupta, and Casey Ellis, so Ashish just joined Bugcrowd as the CEO and Casey moves over to CTO, and the word Bug Bounty was just entered into the Oxford Dictionary for the first time last week, so that to me is the ultimate in category creation. So security and dev ops tools are among the things that we really like. >> And bounties will become the norm as more and more decentralized apps hit the scene. Are you doing anything on decentralized applications? I'm not saying Blockchain in particular, but Blockchain like apps, distributed computing you're well versed on. >> That's right, well we-- >> Blockchain will have an impact in your area. >> Blockchain will have an impact, we just spent an hour talking about it in the context of our off-site at Costanoa Lodge in Pescadero, it felt like it was important that we go there. And digging into it. I think actually the edge computing is actually more actionable for us right now, given the things that we're, given the things that we're interested in, and we're doing and they, it is just fascinating how compute centralizes and then decentralizes, centralizes and then decentralizes again, and I do think that there are a set of things that are fascinating about what you process at the edge, and what you send back to the core.
>> As Pat Gelsinger said here on The Cube, if you're not out in front of that next wave, you're driftwood, a lot of big waves coming in, you've seen a lot of waves, you were part of one that changed the world, Netscape browser, or the business plan for that first product manager, congratulations. Now you're at a whole nother generation. You ready? (laughs) >> Absolutely, I'm totally ready, I'm ready to go. >> Greg Sands here in The Cube in New York City, part of Big Data NYC, more live coverage with The Cube after this short break, thanks for watching. (electronic jingle) (inspiring electronic music)

Published Date : Sep 29 2017

SUMMARY :

brought to you by Silicon Angle Media, and founder of Costa Nova ventures in Palo Alto, How much in that fund? congratulations, and really great to see your success. but it is the case that we have the kinds of things you do and how you get And that's the thing that we love doing. I'll get into some of the dynamics that are going on is all the same, how do you get to But the number of people who basically but here's the challenge that and the big dog investors say, go, go, go! for the CEO to make good decisions. but that's kind of the mentality of an entrepreneur. Well, by the way, I think it's a legitimate fear, And by the way, here are 30 names and phone numbers, And some of my, and entrepreneurs, especially the younger ones. and so the question is, okay, You're doing it in an authentic way though, so that we can be on the side of the good guys. not part of the problem. and the problems at uper, of the real reason why I wanted you to come on, companies that change the way the world does business, and the data stack and-- And the-- and a half million dollars at the outer edge So Alation, Acme Ticketing and Zen IQ That's, near the Fills in downtown Palo Alto, And so you got some funds, and the last investment of fund two The average for the seed is but the research you're doing, and the thinking about how the modern are among the things that we really like. more and more decentralized apps hit the scene. and what you send back to the core. or the business plan for that first I'm ready to go. Greg Sands here in The Cube in New York City,

ENTITIES

Entity	Category	Confidence
Greg Sands	PERSON	0.99+
Asheesh Guptar	PERSON	0.99+
John	PERSON	0.99+
two	QUANTITY	0.99+
Tim Carr	PERSON	0.99+
John Furrier	PERSON	0.99+
Costa Nova	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
Joe	PERSON	0.99+
October 19th	DATE	0.99+
Costanova	ORGANIZATION	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
$10 million	QUANTITY	0.99+
New York	LOCATION	0.99+
$100 million	QUANTITY	0.99+
five million	QUANTITY	0.99+
Casey Ella	PERSON	0.99+
$135 million	QUANTITY	0.99+
Zen IQ	ORGANIZATION	0.99+
Omnichannel	ORGANIZATION	0.99+
50 million	QUANTITY	0.99+
three companies	QUANTITY	0.99+
Pascadero	LOCATION	0.99+
Greg	PERSON	0.99+
New York City	LOCATION	0.99+
100%	QUANTITY	0.99+
50	QUANTITY	0.99+
Silicon valley	LOCATION	0.99+
Jim Wilson	PERSON	0.99+
O'Reilly	ORGANIZATION	0.99+
Casey	PERSON	0.99+
Alation	ORGANIZATION	0.99+
half a million	QUANTITY	0.99+
30 names	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
175 million	QUANTITY	0.99+
first	QUANTITY	0.99+
Victor Ops	ORGANIZATION	0.99+
Pet Gelson	PERSON	0.99+
both	QUANTITY	0.99+
last week	DATE	0.99+
four	QUANTITY	0.99+
three customers	QUANTITY	0.99+
late 2016	DATE	0.99+
fifth year	QUANTITY	0.99+
Cloud Era	ORGANIZATION	0.99+
Acme Ticketing	ORGANIZATION	0.98+
164 staff	QUANTITY	0.98+
NYC	LOCATION	0.98+
five	QUANTITY	0.98+
Oxford Dictionary	TITLE	0.98+
Midtown Manhattan	LOCATION	0.98+
Alatian	ORGANIZATION	0.98+
175 million dollar	QUANTITY	0.98+
next year	DATE	0.98+
today	DATE	0.97+
first time	QUANTITY	0.97+
third fund	QUANTITY	0.97+
first board	QUANTITY	0.97+
Costanoa	PERSON	0.97+
a year	QUANTITY	0.97+
six	QUANTITY	0.97+
one	QUANTITY	0.97+
one and a quarter	QUANTITY	0.96+
Strata Conference	EVENT	0.96+
The Cube	TITLE	0.96+
Strata AI	EVENT	0.96+
million dollar	QUANTITY	0.96+
2017	EVENT	0.95+
first project	QUANTITY	0.95+
two and a half million dollars	QUANTITY	0.95+
Hadoop World	EVENT	0.94+
Sathien	PERSON	0.93+
single shingle	QUANTITY	0.93+
first two	QUANTITY	0.93+
an hour	QUANTITY	0.92+
this summer	DATE	0.92+
first stage	QUANTITY	0.92+
Bug Crowd	ORGANIZATION	0.91+

Matt Maccaux, Dell EMC | Big Data NYC 2017

>> Announcer: Live from Midtown Manhattan. It's the CUBE. Covering Big Data New York City 2017. Brought to you by Silicon Angle Media and its ecosystem sponsor. (electronic music) >> Hey, welcome back everyone, live here in New York. This is the CUBE here in Manhattan for Big Data NYC's three days of coverage. We're on day three, things are starting to settle in, starting to see the patterns out there. I'll say it's Big Data week here, in conjunction with Hadoop World, formerly known as Strata Conference, Strata-Hadoop, Strata-Data, soon to be Strata-AI, soon to be Strata-IOT. Big Data, Matt Maccaux who's the Global Big Data Practice Lead at Dell EMC. We've been in this world now for multiple years and, well, what a riot it's been. >> Yeah, it has. It's been really interesting as the organizations have gone from their legacy systems, they have been modernizing. And we've sort of seen Big Data 1.0 a couple years ago. Big Data 2.0 and now we're moving on to sort of the what's next? >> Yeah. >> And it's interesting because the Big Data space has really lagged the application space. You talk about microservices-based applications, and deploying in the cloud and stateless things. The data technologies and the data space has not quite caught up. The technology's there, but the thinking around it, and the deployment of those, it seems to be a slower, more methodical process. And so what we're seeing in a lot of enterprises is that the ones that got in early, have built out capabilities, are now looking for that, how do we get to the next level? How do we provide self-service? How do we enable our data scientists to be more productive within the enterprise, right? If you're a startup, it's easy, right? You're somewhere in the public cloud, you're using cloud based API, it's all fine. But if you're an enterprise, with the inertia of those legacy systems and governance and controls, it's a different problem to solve for. >> Let's just face it. We'll just call a spade a spade.
Total cost of ownership was out of control. Hadoop was great, but it was built for something that tried to be something else as it evolved. And that's good also, because we need to decentralize and democratize the incumbent big data warehouse stuff. But let's face it, Hadoop is not the game anymore, it's everything else. >> Right, yep. >> Around it so, we've seen that, that's a couple years old. It's about business value right now. That seems to be the big thing. The separation between the players that can deliver value for the customers. >> Matt: Yep. >> And show a little bit of headroom for future AI things, they've seen that. And have the cloud on premise play. >> Yep. >> Right now, to me, that's the call here. What do you, do you agree? >> I absolutely see it. It's funny, you talk to organizations and they say, "We're going cloud, we're doing cloud." Well what does that mean? Can you even put your data in the cloud? Are you allowed to? How are you going to manage that? How are you going to govern that? How are you going to secure that? So many organizations, once they've asked those questions, they've realized, maybe we should start with the model of cloud on premise. And figure out what works and what doesn't. How do users actually want to self serve? What do we templatize for them? And what do we give them the freedom to do themselves? >> Yeah. >> And they sort of get their sea legs with that, and then we look at sort of a hybrid cloud model. How do we be able to span on premise, off premise, whatever your public cloud is, in a seamless way? Because we don't want to end up with the same thing that we had with mainframes decades ago, where it was, IBM had the best, it was the fastest, it was the most efficient, it was the new paradigm. And then 10 years later, organizations realized they were locked in, there was different technology. The same thing's true if you go cloud native. You're sort of locked in. So how do you be cloud agnostic? 
>> How do you get locked in a cloud native? You mean with Amazon? >> Or any of them, right? >> Okay. >> So they all have their own APIs that are really good for doing certain things. So Google's TensorFlow happens to be very good. >> Yeah. Amazon EMR. >> But you build applications that are using those native APIs, you're sort of locked. And maybe you want to switch to something else. How do you do that? So the idea is to-- >> That's why Kubernetes is so important, right now. That's a very key workload and orchestration container-based system. >> That's right, so we believe that containerization of workloads that you can define in one place, and deploy anywhere is the path forward, right? Deploy 'em on prem, deploy 'em in a private cloud, public cloud, it doesn't matter the infrastructure. Infrastructure's irrelevant. Just like Hadoop is sort of not that important anymore. >> So let me get your reaction on this. >> Yeah. >> So Dell EMC, so you guys have actually been a supplier. They've been the leading supplier, and now with Dell EMC across the portfolio of everything. From Dell computers, servers and what not, to storage, EMC's run the table on that for many generations. Yeah, there's people nippin' at your heels like Pure, okay that's fine. >> Sure. >> It's still storage is storage. You got to store the data somewhere, so storage will always be around. Here's what I heard from a CXO. This is the pattern I hear, but I'll just summarize it in one conversation. And then you can give a reaction to it. John, my life is hell. I have application development investment plan, it's just boot up all these new developers. New dev ops guys. We're going to do open source, I got to build that out. I got that, trying to get dev ops going on. >> Yep. >> That's a huge initiative. I got the security team. I'm unbundling from my IT department, into a new, difference in a reporting to the board.
And then I got all this data governance crap underneath here, and then I got IOT over the top, and I still don't know where my security holes are. >> Yep. >> And you want to sell me what? (Matt laughs) So that's the fear. >> That's right. >> Their plates are full. How do you guys help that scenario? You walk in, actually security's pretty much, important obviously you can see that. But how do you walk into that conversation? >> Yeah, it's sort of stop the madness, right? >> (laughs) That's right. >> And all of that matters-- >> No, but this is all critical. Every room in the house is on fire. >> It is. >> And I got to get my house in order, so your comment to me better not be hype. TensorFlow, don't give me this TensorFlow stuff. >> That's right. >> I want real deal. >> Right, I need, my guys are-- >> I love TensorFlow but, doesn't put the fire out. >> They just want Spark, right? I need to speed up my-- >> John: All right, so how do you help me? >> So, what we'd do is, we want to complement and augment their existing capabilities with better ways of scaling their architecture. So let's help them containerize their big data workload so that they can deploy them anywhere. Let's help them define centralized security policies that can be defined once and enforced everywhere, so that now we have a way to automate the deployment of environments. And users can bring their own tools. They can bring their data from outside, but because we have intelligent centralized policies, we can enforce that. And so with our elastic data platform, we are doing that with partners in the industry, BlueTalon and BlueData, they provide that capability on top of whatever the customer's infrastructure is. >> How important is it to you guys that Dell EMC are partnering. I know Michael Dell talks about it all the time, so I know it's important. But I want to hear your reaction. Down in the trenches, you're in the front lines, providing the value, pulling things together.
Partnerships seem to be really important. Explain how you look at that, how you guys do your partners. You mentioned BlueTalon and BlueData. >> That's right, well I'm in the consulting organization. So we are on the front lines. We are dealing with customers day in and day out. And they want us to help them solve their problems, not put more of our kit in their data centers, on their desktops. And so partnering is really key, and our job is to find where the problems are with our customers, and find the best tool for the best job. The right thing for the right workload. And you know what? If the customer says, "We're moving to Amazon," then Dell EMC might not sell any more compute infrastructure to that customer. They might, we might not, right? But it's our job to help them get there, and by partnering with organizations, we can help make that seamless. And that strengthens the relationship, and they're going to purchase-- >> So you're saying that you will put the customer over Dell EMC? >> Well, the customer is number one to Dell EMC. Net promoter score is one of the most important metrics that we have-- >> Just want to make sure get on the record, and that's important, 'cause Amazon, and you know, we saw it in NetApp. I've got to say, give NetApp credit. They heard from customers early on that Amazon was important. They started building into Amazon support. So people saying, "Are you crazy?" VMware, everyone's saying, "Hey you capitulated "by going to Amazon." Turns out that that was a damn good move. >> That's right. >> For Gelsinger. >> Yep. >> Look at VMworld. They're going to own the cloud service provider market as an arms dealer. >> Yep. >> I mean, you would have thought that a year ago, no way. And then when they did the deal, they said, >> We have really smart leadership in the organization. Obviously Michael is a brilliant man. And it sort of trickles on down.
It's customer first, solve the customer's problems, build the relationship with them, and there will be other things that come, right? There will be other needs, other workloads. We do happen to have a private cloud solution with Virtustream. Some of these customers need that intermediary step, before they go full public, with a hosted private solution using Virtustream. >> All right, so what's the, final question, so what's the number one thing you're working on right now with customers? What's the pattern? You got the stack rank, your requests, your deliverables, where you spend your time. What's the top things you're working on? >> The top thing right now is scaling architectures. So getting organizations past, they've already got their first 20 use cases. They've already got lakes, they got petabytes in there. How do we enable self service so that we can actually bring that business value back, as you mentioned. Bring that business value back by making those data scientists productive. That's number one. Number two is aligning that to overall strategy. So organizations want to monetize their data, but they don't really know what that means. And so, within a consulting practice, we help our customers define, and put a road map in place, to align that strategy to their goals, the policies, the security, the GDPR, or the regulations. You have to marry the business and the technology together. You can't do either one in isolation. Or ultimately, you're not going to be efficient. >> All right, and just your take on Big Data NYC this year. What's going on in Manhattan this year? What's the big trend from your standpoint? That you could take away from this show besides it becoming a sprawl of you know, everyone just promoting their wares. I mean it's a big, hyped show that O'Reilly does, >> It is. >> But in general, what's the takeaway from the signal? >> It was good hearing from customers this year. Customer segments, I hope to see more of that in the future. 
Not all just vendors showing their wares. Hearing customers actually talk about the pain and the success that they've had. So the Barclays session where they went up and they talked about their entire journey. It was a packed room, standing room only. They described their journey. And I saw other banks walk up to them and say, "We're feeling the same thing." And this is a highly competitive financial services space. >> Yeah, we had Paxata's customer on, Standard Bank. They came off about their journey, and how they're wrangling automating. Automating's the big thing. Machine learning, automation, no doubt. If people aren't looking at that, they're dead in my mind. I mean, that's what I'm seeing. >> That's right. And you have to get your house in order before you can start doing the fancy gardening. >> John: Yeah. >> And organizations aspire to do the gardening, right? >> I couldn't agree more. You got to be able to drive the car, you got to know how to drive the car if you want to actually play in this game. But it's a good example, the house. Got to get the house in order. Rooms are on fire (laughs) right? Put the fires out, retrench. That's why private cloud's kicking ass right now. I'm telling you right now. Wikibon nailed it in their true private cloud survey. No other firm nailed this. They nailed it, and it went viral. And that is, private cloud is working and growing faster than some areas because the fact of the matter is, there's some bursting through the clouds, and great use cases in the cloud. But-- >> Yep. >> People have to get the ops right on premise. >> Matt: That's right, yep. >> I'm not saying on premise is going to be the future. >> Not forever. >> I'm just saying that the stack and rack operational model is going cloud model. >> Yes. >> John: That's absolutely happening, that's growing. You agree? >> Absolutely, we completely, we see that pattern over and over and over again. And it's the Goldilocks problem. 
There's the organizations that say, "We're never going to go cloud." There's the organizations that say, "We're going to go full cloud." For big data workloads, I think there's an intermediary for the next couple years, while we figure out operating pulse. >> This evolution, what's fun about the market right now, and it's clear to me that, people who try to get a spot too early, there's too many diseconomies of scale. >> Yep. >> Let the evolution, Kubernetes looking good off the tee right now. Docker containers and containerization in general's happened. >> Yep. >> Happening, dev ops is going mainstream. >> Yep. >> So that's going to develop. While that's developing, you get your house in order, and certainly go to the cloud for bursting, and other green field opportunities. >> Sure. >> No doubt. >> But wait until everything's teed up. >> That's right, the right workload in the right place. >> I mean Amazon's got thousands of enterprises using the cloud. >> Yeah, absolutely. >> It's not like people aren't using the cloud. >> No, they're, yeah. >> It's not 100% yet. (laughs) >> And what's the workload, right? What data can you put there? Do you know what data you're putting there? How do you secure that? And how do you do that in a repeatable way? >> Yeah, and you think cloud's driving the big data market right now. That's what I was saying earlier. I was saying, I think that the cloud is the subtext of this show. >> It's enabling. I don't know if it's driving, but it's the enabling factor. It allows for that scale and speed. >> It accelerates. >> Yeah. >> It accelerates... >> That's a better word, accelerates. >> Accelerates that horizontally scalable. Mike, thanks for coming on the CUBE. Really appreciate it. More live action we're going to have some partners on with you guys. Next, stay with us. Live in Manhattan, this is the CUBE. (electronic music)

Published Date : Sep 29 2017



Murthy Mathiprakasam, Informatica | Big Data NYC 2017

>> Narrator: Live from midtown Manhattan, it's theCUBE. Covering BigData, New York City, 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back everyone, we're here live in New York City for theCUBE's coverage of BigData NYC, our event we've been running for five years, been covering BigData space for eight years, since 2010 when it was Hadoop World, Strata Conference, Strata Hadoop, Strata Data, soon to be called Strata AI, just a few. We've been theCUBE for all eight years. Here, live in New York, I'm John Furrier. Our next guest is Murthy Mathiprakasam, who is the Director of Product Marketing at Informatica. Cube alumni has been on many times, we cover Informatica World, every year. Great to see you, thanks for coming by and coming in. >> Great to see you. >> You guys do data, so there's not a lot of recycling going on in the data because we've been talking about it all week, total transformation, but the undercurrent has been a lot of AI, AI this, and you guys have the CLAIRE product, doing a lot of things there. But outside of the AI, the undertone is cloud, cloud, cloud. Governance, governance, governance. There's two kind of the drivers I'm seeing as the force of this week is, a lot of people trying to get their act together on those two fronts and you can kind of see the scabs on the industry, people, some people haven't been paying attention. And they're weak in the area. Cloud is absolutely going to be driving the BigData world, 'cause data is horizontal. Cloud's the power source that you guys have been on that. What's your thoughts, what other drivers encourage you? (mumbles) what I'm saying and what else did I miss? Security is obviously in there, but-- >> Absolutely, no, so I think you're exactly right on. So obviously governance, security is a big deal. Largely being driven by the GDPR regulation, that's happening in Europe. But, I mean every company today is global, so. Everybody's essentially affected by it. 
So, I think data until now has always been a kind of opportunistic thing, that there's a couple guys and their organizations were looking at it as oh, let's do some experimentation. Let's do something interesting here. Now, it's becoming government managed so I think there's a lot of organizations who are, like, to your point, getting their act together, and that's driving a lot of demand for data management projects. So now, people say, well, if I got to get my act together, I don't have to hire armies of people to do it, let me look for automated machine learning based ways of doing it. So that they can actually deliver on their audit reports that they need to deliver on, and ensure the compliance that they need to ensure, but do it in a very scalable way. >> I've been kind of joking all week, and I kind of had this meme in my head, so I've been pounding on it all week, calling it the tool shed problem. The tool shed problem is, everyone's got these tools. They throw them into the tool shed. They bought a hammer and the company that sells them the hammer is trying to turn it to a lawnmower, right? You can't mow your lawn with a hammer, it's not going to work, and so this, these tools are great but it defines work. What you do, but, the platforming issue is a huge one. And you start to see people who took that view. You guys were one of them because in a platform centric view with tools that are enabled, to be highly productive. You don't have to worry about new things like a government's policy, the GDPR that might pop up, or the next Equifax that's around the corner. There's probably two or three of them going on right now. So, that's an impact, the data, who uses it, how it's used, and who's at fault or whatever. So, how does a company deal with that? And machine learning has proven to be a great horse that a lot of people are riding right now. You guys are doing it, how does a customer deal with that tsunami of potential threats? 
Architecture challenges, what is your solution, how do you talk about that? >> Well, I think machine learning, you know, up until now has been seen as the kind of, nice to have, and I think that very quickly, it's going to become a must have. Because, exactly like you're saying, it really is a tsunami. I mean, you could see people who are nervous about the fact that I mean, there's different estimates. It's like 40% growth in data assets from most organizations every year. So, you can try to get around this somehow with one of these (mumbles) tools or something. But at some point, something is going to break, either you just run out of manpower, you can't train the manpower, people start leaving. Whatever the operational challenges are, it just isn't going to scale. Machine learning is the only approach. It is absolutely the only approach that actually ensures that you can maintain data for these kind of defensive reasons like you're saying. The structure and compliance, but also the kind of offensive opportunistic reasons, and do it scalably, 'cause there's just no other way mathematically speaking, that when the data is growing 40% a year, just throwing a bunch of tools at it just doesn't work. >> Yeah, I would just amplify and look right in the camera, say, if you're not on machine learning, you're out of business. That's a straight up obvious trend, 'cause that's a precursor to AI, real AI. Alright, let's get down to data management, so when people throw around data management, it's like, oh yeah we've got some data management. There are challenges with that. You guys have been there from day one. But now if you take it out in the future, how do you guys provide the data management in a totally cloud world where now the customer certainly has public and private, or on premise but they might have multi-cloud? So now, comes a land grab for the data layer, how do you guys play in that? 
>> Well, I think it's a great opportunity for these kind of middleware platforms that actually do span multiple clouds, that can span the internal environments. So, I'll give you an example. Yesterday we actually had a customer speaking at Strata here, and he was talking about for him, the cloud is really just a natural extension of what they're already doing, because they already have a sophisticated data practice. This is a large financial services organization, and he's saying well now the data isn't all inside, some of it's outside, you've got partners, who've got data outside. How do we get to that data? Clearly, the cloud is the path for doing that. So, the fact that the cloud is a natural extension of what a lot of organizations were already doing internally means they don't want to have a completely different approach to the data management. They want to have a consistent, simple, systematic repeatable approach to the data management that spans, as you said, on premise in the cloud. That's why I think the opportunity of a very mature and sophisticated platform because you're not rewriting and re-platforming for every new, is it AWS, is it Azure? Is it something on premise? You just want something that works, that shields you from the underlying infrastructure. >> So I put my skeptic hat on for a second and challenge you on this, because this I think is fundamental. Whether it's real or not, it's perceived, maybe in the back of the mind of the CXO or the CDO, whoever is enabled to make these big calls. If they have the keys to the kingdom in Informatica, I'm going to get locked in. So, this is a deep fear. People wake up with nightmares in the enterprise, they've seen lock-in before. How do you explain that to a customer that you're going to be an enabling opportunity for them, not a lock in and foreclosing future benefits. Especially if I have an unknown scenario called multi-cloud. I mean, no one's really doing multi-cloud let's face it. 
I mean, I have multiple clouds with stuff on it, >> At least not intentionally. Sometimes you got a line of businesses and doing things, but absolutely I get it. >> No one's really moving workloads dynamically between clouds in real time. Maybe a few people doing some hacks, but for the most part of course, not a standard practice. >> Right. >> But they want it to be. >> Absolutely. >> So that's the future. From today, how do you preserve that position with the customer where you say hey we're going to add value, but we're not going to lock you in? >> So the whole premise again of, I mean, this goes back to classic three tier models of how you think about technology stacks, right? There's an infrastructure layer, there's a platform layer, there's an analytics layer and the whole premise of the middle of the layer, the platform layer, is that it enables flexibility in the other two layers. It's precisely when you don't have something that's kind of intermediating the data and the use of the data, that's when you run into challenges with flexibility and with data being locked in the data store. But you're absolutely right. We had dinner with a bunch of our customers last night. They were talking about they'd essentially evaluated every version of sort of BigData platform and data infrastructure platform right? And why? It was because they were a large organization and your different teams start stuff and they had to compute them out and stuff. And I was like that must have been pretty hard for you guys. Now what we were using Informatica, so it didn't really matter where the data was, we were still doing everything as far as the data management goes from a consistent layer and we integrate with all those different platforms. >> John: So you didn't get in the way? >> We didn't get in the way. >> You've actually facilitated. >> We are facilitating increased flexibility. 
Because without a layer like that, a fabric, or whatever you want to call it a data platform that's facilitating this the complexity's going to get very, very crazy very soon. If it hasn't already. The number of infrastructure platforms that are available like you said, on premise and on the cloud now, keeps growing. The number of analytical tools that are available is also growing. And all this is amazing innovation by the way. This is all great stuff, but to your point about it if you're the chief officer of an organization going, I got to get this thing figured out somehow. I need some sanity, that's really the purpose of-- >> They just don't want to know the tool for tool's sake, they need to have it be purposeful. >> And that's why this machine learning aspect is very, very critical because I was thinking about an analogy just like you were and I was thinking, in a way you can think of data managing as sort of cleaning stuff up and there are people that have brooms and mops and all these different tools. Well, we are bringing a Roomba to market, right? Because you don't want to just create tools that transfer the labor around, which is a little bit of what's going on. You want to actually get the labor out of the equation, so that the people are focused on the context, business strategy and the data management is sort of cleaning itself. It's doing the work for you. That's really what Informatica's vision is. It's about being a kind of enterprise cloud data management vendor that is leveraging AI under the hood so that you can sort of set it and forget it. A lot of this ingestion and the cleansing, telling analysts what data they should be looking for. All the stuff is just happening in an automated way and you're not in this total chaos. >> And that can be some tools will be sitting in the back for a long time. In my tool shed, when I had one back in a big enough property back east. No one has tool sheds by the way. No one does any gardening. 
The issue is in the day, I need to have a reliable partner. So I want you to take a minute and explain to the folks who aren't yet Informatica customers why they should be and the Informatica customers why they should stay with Informatica. >> Absolutely, so certainly the ones we have, a very loyal customer base. In fact the guy who was presenting with us yesterday, he said he's been with Informatica since 1999, going through various versions of our products and adopting new innovations. So we have a very loyal customer base, so I think that loyalty itself speaks for itself as well. As far as net new customers, I think that in a world of this increasing data complexity, it's exactly what you were saying, you need to find an approach that is going to scale. I keep hearing this word from the chief data officer, I kind of got something going on today, I don't know how I scale it. How is this going to work in 2018 and 2019, in 2025? And it's just daunting for some of these guys. Especially going back to your point about compliance, right? So it's one thing if you have data sitting around, data so to speak, that you're not using it. But god forbid now, you got legal and regulatory concerns around it as well. So you have to get your arms around the data and that's precisely where Informatica can help because we've actually thought through these problems and we've talked about them. >> Most of them were a problem you solved because at the end of the day, we were talking about problems that have massive importance, big time consequences people can actually quantify. >> That's right. >> So what specific problem highest level do you solve is the most important, has the most consequences? >> Everything from ingestion of raw data sets from wherever like you said, in the cloud on premise, all the way through all the processes you need to make it fully usable. And we view that as one problem. 
There's other vendors who think that one aspect of that is a problem and it is worth solving. We really think, look at the end of the day, you got raw stuff and you have to turn it into useful stuff. Everything in there has to happen, so we might as well just give you everything and be very, very good at doing all those things. And so that's what we call enterprise cloud data management. It's everything from raw material to finished goods of insights. We want to be able to provide that in a consistent integrated and machine learning integrate it. >> Well you guys have a loyal customer base but to be fair and you kind of have to acknowledge that there is a point in time and not throw Informatica's away the big customers, big engagements. But there was a time in Informatica's history where you went private. There was some new management came in. There was a moment where the boat was taking on water, right? And you could almost look at it and say, hmm, you know, we're in this space. You guys retooled around that. Success to the team. Took it to another dimension. So that's the key thing. You know a lot of the companies become big and it's hard to get rid of. So the question is that's a statement. I think you guys done a great job. Yet, the boat might have taken on water, that's my opinion, but you can probably debate that. But I think as you get mature and you're in public, you just went private. But here's the thing, you guys have had good product chops in Informatica, so I got to ask you the question. What cool things are you doing? Because remember, cool shiny new toys help put a little flash and glam on the nuts and bolts that scales. What are you guys doing? I know you just announced CLAIRE, some AI stuff. What's the hot stuff you're doing that's adding value? >> Yeah, absolutely, first of all, this kind of addresses your water comment as well. So we are probably one of the few vendors that spends almost about $200 million in R&D. 
And that hasn't changed through the acquisition. If anything, I think it actually increased a little bit because now our investors are even more committed to innovation. >> Well you're more nimble in private. A lot more nimble. >> Absolutely, a lot more ideas that are coming to the forefront. So there's never been any water just to be clear. But to answer your follow on question about some examples of this innovation. So I think Ahmed yesterday talked about some of our recent release as well but we really just keep pushing on this idea of, I know I keep saying this but it's this whole machine learning approach here of how can we learn more about the data? So one of the features, I'll give you an example, is if we can actually go look at a file and if we spot like a name and an address and some order information, that probably is a customer, right? And we know that right, because we've seen past data sets. So, there's examples of this pattern matching where you don't even have to have data that's filled out. And this is increasingly the way the data looks we are not dealing with relational tables anymore it's JSON files, it's web logs, XML files, all of that data that you had to have that data scientists go through and parse and sift through, we just automatically recognize it now. If we can look for the data and understand it, we can match it. >> Put that in context in the order of benefits that, from the old way versus the current way, what's the pain levels? One versus the other, can you put context around that? In terms of, it's pretty significant. >> It's huge because again, back to this sort of volume and variety of data that people are trying to get into systems and do it very rapidly. I'll give you a really tangible customer case. So, this is a customer that presented at Informatica World a couple months ago. It's Jewelry TV, I can actually tell you the name. 
So they are one of these online kind of shopping sites and they've got a TV program that goes with the online site. So what they do is obviously when you promote something on TV, your orders go up online, right? They wanted to flip it around and they said, look, let's look at the web logs of the traffic that's on the website and then go promote that on the TV program. Because then you get a closed loop and start to have this explosion of sales. So they used Informatica, didn't have to do any of this hand coding. They just built this very quickly and with the graphical user interface that we provide, it leverages Spark Streaming under the hood. So they are using all these technologies under the hood, they just didn't have to do any of the manual coding. Got this thing out in a couple days and it works. And they have been able to measure it and they're actually driving increased sales by taking the data and just getting it out to the people that need to see the data very, very quickly. So that's an example of a use case where this isn't just to your point about is this a small, incremental type of thing. No, there is a lot of money behind data if you can actually put it to good use. >> The consequences are grave and I think you've seen more and more, I mean the hacks just amplify it over and over again. It's not a cost center when you think about it. It has to be somehow configured differently as a profit center, even though it might not drive top line revenue directly like an app or anything else. It's not a cost center. If anything it will be treated as a profit center because you get hacked or someone's data is misused, you can be out of business. There is no profit. Look at the results of these hacks. >> The defensive argument is going to become very, very strong as these regulations come out. But, let's be clear, we work with a lot of the most advanced customers. There are people making money off of this. 
It can be a top line driver-- >> No it should be, it should be. That's exactly the mindset. So the final question for you before we break. I know we're out of time here. There are some chief data officers that are enabled, some aren't and that's just my observation. I don't want to pigeonhole anyone, but some are enabled to really drive change, some are just figureheads that are just managing the compliance risk and work for the CFO and say no to everything. I'm over-generalizing. But that's essentially how I see it. What's the problem with that? Because the cost center issue has, we've seen this movie before in the security business. Security should not be part of IT. That's it's own deal. >> Exactly. >> So we're kind of, this is kind of smoke, but we're coming out of the jungle here. Your thoughts on that. >> Yeah, you're absolutely right. We see a variety of models. We can see the evolution of those models and it's also very contextual to different industries. There are industries that are inherently more regulated, so that's why you're seeing the data people maybe more in those cost center areas that are focused on regulations and things like that. There's other industries that are a lot more consumer oriented. So for them, it makes more sense to have the data people be in a department that seems more revenue facing. So it's not entirely random. There are some reasons, that's not to say that's not the right model moving forward, but someday, you never know. There is a reason why this role became a CXO in the first place. Maybe it is somebody who reports to the CEO and they really view the data department as a strategic function. And it might take a while to get there, but I don't think it's going to take a long time. Again, we're talking about 40% growth in the data and these guys are realizing that now and I think we're going to see very quickly people moving out of the whole tool shed model, and moving to very systematic, repeatable practices. 
Sophisticated middleware platforms and-- >> As we say don't be a tool, be a platform. Murthy, thanks so much for coming on to theCUBE, we really appreciate it. What's going on in Informatica real quick. Things good? >> Things are great. >> Good, awesome. Live from New York, this is theCUBE here at BigData NYC more live coverage continuing day three after this short break. (digital music)

Published Date : Sep 29 2017


ENTITIES

Entity	Category	Confidence
Informatica	ORGANIZATION	0.99+
John	PERSON	0.99+
Murthy Mathiprakasam	PERSON	0.99+
2018	DATE	0.99+
John Furrier	PERSON	0.99+
two	QUANTITY	0.99+
Europe	LOCATION	0.99+
Astrada	ORGANIZATION	0.99+
2025	DATE	0.99+
New York	LOCATION	0.99+
yesterday	DATE	0.99+
five years	QUANTITY	0.99+
2019	DATE	0.99+
three	QUANTITY	0.99+
New York City	LOCATION	0.99+
Murphy	PERSON	0.99+
eight years	QUANTITY	0.99+
two layers	QUANTITY	0.99+
one	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
first	QUANTITY	0.99+
today	DATE	0.99+
two fronts	QUANTITY	0.99+
1999	DATE	0.99+
GDPR	TITLE	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
one problem	QUANTITY	0.99+
last night	DATE	0.98+
Ahmed	PERSON	0.98+
Yesterday	DATE	0.98+
2010	DATE	0.98+
one thing	QUANTITY	0.98+
Strata Conference	EVENT	0.98+
NYC	LOCATION	0.98+
40% a year	QUANTITY	0.97+
Hadoop World	EVENT	0.97+
Equifax	ORGANIZATION	0.97+
day three	QUANTITY	0.96+
Strata Hadoop	EVENT	0.95+
Informatica World	ORGANIZATION	0.95+
two kind	QUANTITY	0.95+
2017	DATE	0.95+
about $200 million	QUANTITY	0.94+
one aspect	QUANTITY	0.94+
theCUBE	ORGANIZATION	0.94+
Informatica World	EVENT	0.91+
this week	DATE	0.9+
40% growth	QUANTITY	0.88+
BigData	ORGANIZATION	0.87+
three tier	QUANTITY	0.87+
day one	QUANTITY	0.87+
Strata Data	EVENT	0.85+
CXO	TITLE	0.85+
Cube	ORGANIZATION	0.84+
midtown Manhattan	LOCATION	0.83+
about 40% growth	QUANTITY	0.8+
couple months ago	DATE	0.8+
Strata AI	EVENT	0.79+
couple guys	QUANTITY	0.76+
claire	PERSON	0.71+
lot of money	QUANTITY	0.67+
One	QUANTITY	0.66+
BigData	TITLE	0.64+
CDO	TITLE	0.63+
couple days	QUANTITY	0.63+
JSON	TITLE	0.62+

Santhosh Mahendiran, Standard Chartered Bank | BigData NYC 2017

>> Announcer: Live, from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat techno music) >> Okay welcome back, we're live here in New York City. It's theCUBE's presentation of Big Data NYC, our fifth year doing this event in conjunction with Strata Data, formerly Strata Hadoop, formerly Strata Conference, formerly Hadoop World, we've been there from the beginning. Eight years covering Hadoop's ecosystem now Big Data. This is theCUBE, I'm John Furrier. Our next guest is Santhosh Mahendiran, who is the global head of technology analytics at Standard Chartered Bank. A practitioner in the field, here getting the data, checking out the scene, giving a presentation on your journey with Data at a bank, which is big financial obviously an adopter. Welcome to theCUBE. >> Thank you very much. >> So we always want to know what the practitioners are doing because at the end of the day there's a lot of vendors selling stuff here, so you got, everyone's got their story. End of the day you got to implement. >> That's right. >> And one of the themes is the data democratization which sounds warm and fuzzy, collaborating with data, this is all good stuff and you feel good and you move into the future, but at the end of the day it's got to have business value. >> That's right. >> And as you look at that, how do you look at the business value? Cause you want to be in the bleeding edge, you want to provide value and get that edge operationally. >> That's right. >> Where's the value in data democratization? How did you guys roll this out? Share your story. >> Okay, so let me start with the journey first before I come to the value part of it, right? So, data democratization is an outcome, but the journey has been something we started three years back. So what did we do, right? So we had some guiding principles to start our journey. 
The first was to say that we believed in the three S's, which is speed, scale, and it should be really, really flexible and super fast. So one of the challenges that we had was our historical data warehouses was entirely becoming redundant. And why was it? Because it was RDBMS centric, and it was extremely disparate. So we weren't able to scale up to meet the demands of managing huge chunks of data. So, the first step that we did was to re-pivot it to say that okay, let's embrace Hadoop. And what you mean by embracing is just not putting in the data lake, but we said that all our data will land into the data lake. And this journey started in 2015, so we have close to 80% of the Bank's data in the lake and it is end of day data right now and this data flows in on daily basis, and we have consumers who feed off that data. Now coming to your question about-- >> So the data lake's working? >> The data lake is working, up and running. >> People like it, you just got a good spot, batch 'em all you throw everything in the lake. >> So it is not real time, it is end of day. There is some data that is real-time, but the data lake is not entirely real-time, that I have to tell you. But one part is that the data lake is working. Second part to your question is how do I actually monetize it? Are you getting some value out of it? But I think that's where tools like Paxata has actually enabled us to accelerate this journey. So we call it data democratization. So the best part it's not about having the data. We want the business users to actually use the data. Typically, data has always been either delayed or denied in most of the cases to end-users and we have end-users waiting for the data but they don't get access to the data. It was done because primarily the size of the data was too huge and it wasn't flexible enough to be shared with. So how did tools like Paxata and the data lake help us? 
So what we did with data democratization is basically to say that "hey we'll get end-users to access the data first in a fast manner, in a self-service manner, and something that gives operational assurance to the data, so you don't hold the data and then say that you're going to get a subset of data to play with. We'll give you the entire set of data and we'll give you the right tools which you can play with. Most importantly, from an IT perspective, we'll be able to govern it. So that's the key about democratization. It's not about just giving them a tool, giving them all data and then say "go figure it out." It's about ensuring that "okay, you've got the tools, you've got the data, but we'll also govern it," so that you obviously have control over what they're doing. >> So now you govern it, they don't have to get involved in the governance, they just have access? >> No they don't need to. Yeah, they have access. So governance works both ways. We establish the boundaries. Look at it as a referee, and then say that "okay, there are guidelines that you don't," and within the datasets that key people have access to, you can further set rules. Now, coming back to specific use cases, I can talk about two specific cases which actually helped us to move the needle. The first is on stress testing, so being a financial institution, we typically have to report various numbers to our regulators, etc. The turnaround time was extremely huge. These kind of stress testing typically involve taking huge amount-- >> What were some of the turnaround times? >> Normally it was two to three weeks, some cases a month-- >> Wow. >> So we were able to narrow it down to days, but what we essentially did was as with any stress testing or reporting, it involved taking huge amounts of data, crunching them and then running some models and then showing the output, basically a number of transformations involved. 
Earlier, you first couldn't access the entire dataset, so that we solved-- >> So check, that was a good step one-- >> That was step one. >> But was there automation involved in that, the Paxata piece? >> Yeah, I wouldn't say it was fully automated end-to-end, but there was definitely automation given the fact that now you got Paxata to work off the data rather than someone extracting the data and then going off and figuring what needs to be done. The ability to work off the entire dataset was a big plus. So stress testing, bringing down the cycle time. The second one use case I can talk about is again anti-money laundering, and in our financial crime compliance space. We had processes that took time to report, given the clunkiness in the various handoffs that we needed to do. But again, empowering the users, giving the tool to them and then saying "hey, this"-- >> How about know your user, because we have to anti-money launder, you need to have to know your user base, that's all set there too? >> Yeah. So the good part is know the user, know your customer, KYCs all that part is set, but the key part is making sure the end-users are able to access the data much more earlier in the life cycle and are able to play with it. In the case of anti-money laundering, again first question of three weeks to four weeks was shortened down to question of days by giving tools like Paxata again in a structured manner and with which we're able to govern. >> You control this, so you knew what you were doing, but you let their tools do the job? >> Correct, so look at it this way. Typically, the data journey has always been IT-led. It has never been business-led. If you look at the generations of what happens is, you source the data which is IT-led, then you model the data which is IT-led, then you prepare then massage the data which is again IT-led and then you have tools on top of it which is again IT-led so the end-users get it only after the fourth stage.
Now look at the generations within. All these life cycles apart from the fact that you source the data which is typically an IT issue, the rest need to be done by the actual business users and that's what we did. That's the progression of the generations in which we now we're in the third generation as I call it where our role is just to source the data and then say, "yeah we'll govern it in the matter and then preparation--" >> It's really an operating system and we were talking with Aaron, Alation's co-founder, we used the analogy of a car, how this show was like a car show engine show, what's in the engine and the technology and then it evolved every year, now it's like we're talking about the cars, now we're talking about driver experience-- >> That's right. >> At the end of the day, you just want to drive. You don't really care what's under the hood, you do but you don't, but there's those people who do care what's under the hood, so you can have best of both worlds. You've got the engines, you set up the infrastructure, but ultimately, you in the business side, you just want to drive, that's what you're getting at? >> That's right. The time-to-market and speed to empower the users to play around with the data rather than IT trying to churn the data and confine access to data, that's a thing of the past. So we want more users to have faster access to data but at the same time govern it in a seamless manner. The word governance is still important because it's not about just give the data. >> And seamless is key. >> Seamless is key. >> 'Cause if you have democratization of data, you're implying that it is community-oriented, means that it's available, with access privileges all transparently or abstracted away from the users. >> Absolutely. >> So here's the question I want to ask you. There's been talk, I've been saying it for years going back to 2012 that an abstraction layer, a data layer will evolve and that'll be the real key.
And then here in this show, I heard things like intelligent information fabric that is business, consumer-friendly. Okay, it's a mouthful, but intelligent information fabric in essence talks about an abstraction layer-- >> That's right. >> That doesn't really compromise anything but gives some enablement, creates some enabling value-- >> That's right. >> For software, how do you see that? >> As the word suggests, the earlier model was trying to build something for the end-users, but not which was end-user friendly, meaning to say, let me just give you a simple example. You had a data model that existed. Historically the way that we have approached using data is to say "hey, I've got a model and then let's fit that data into this model," without actually saying that "does this model actually serve the purpose?" You abstracted the model to a higher level. The whole point about intelligent data is about saying that, I'll give you a very simple analogy. Take zip code. Zipcode in US is very different from zipcode in India, it's very different from zipcode in Singapore. So if I had the ability for my data to come in, to say that "I know it's a zipcode, but this zipcode belongs to US, this zipcode belongs to Singapore, and this zipcode belongs to India," and more importantly, if I can further rev it up a notch, if I say that "this belongs to India, and this zipcode is valid." Look at where I'm going with intelligent sense. So that's what's up. If you look at the earlier model, you have to say that "yeah, this is a placeholder for zipcode." Now that makes sense, but what are you doing with it? >> Being a relational database model, it's just a field in a schema, you're taking it and abstracting it and creating value out of it. >> Precisely. So what I'm actually doing is accelerating the adoption, I'm making it more simpler for users to understand what the data is. So I don't need to as a user figure out "I got a zipcode, now is it a Singapore, India or what zipcode." 
>> So all this automation, Paxata's got a good system, we'll come back to the Paxata question in a second, I do want to drill down on that. But the big thing that I've been seeing at the show, and again Dave Vellante, my partner, co-CEO of SiliconANGLE, we always talk about this all the time. He's less bullish on Hadoop than I am. Although I love Hadoop, I think it's great but it's not the end-all, be-all. It's a great use case. We were critical early on and the thing we were critical on it was it was too much time being spent on the engine and how things are built, not on the business value. So there's like a lull period in the business where it was just too costly-- >> That's right. >> Total cost of ownership was a huge, huge problem. >> That's right. >> So now today, how did you deal with that and are you measuring the TCO or total cost of ownership 'cause at the end of the day, time to value, which is can you be up and running in 90 days with value and can you continue to do that, and then what's the overall cost to get there. Thoughts? >> So look I think TCO always underpins any technology investment. If someone said I'm doing a technology investment without thinking about TCO, I don't think he's a good technology leader, so TCO is obviously a driving factor. But TCO has multiple components. One is the TCO of the solution. The other aspect is TCO of what my value I'm going to get out of this system. So talking from an implementation perspective, what I look at as TCO is my whole ecosystem which is my hardware, software, so you spoke about Hadoop, you spoke about RDBMS, is Hadoop cheaper, etc? I don't want to get into that debate of cheaper or not but what I know is the ecosystem is becoming much, much more cheaper than before. And when I talk about ecosystem, I'm talking about RDBMS tools, I'm talking about Hadoop, I'm talking about BI tools, I'm talking about governance, I'm talking about this whole framework becoming cheaper.
And it is also underpinned by the fact that hardware is also becoming cheaper. So the reality is all components in the whole ecosystem are becoming cheaper and given the fact that software is also becoming more open-sourced and people are open to using open-source software, I think the whole question of TCO becomes a much more pertinent question. Now coming to your point, do you measure it regularly? I think the honest answer is I don't think we are doing a good job of measuring it that well, but we do have that as one of the criteria for us to actually measure the success of our project. The way that we do is our implementation cost, at the time of writing out our PEDs, we call it PEDs, which is the Project Execution Document, we talk about cost. We say that "what's the implementation cost?" What are the business cases that are going to be an outcome of this? I'll give you an example of our anti-money laundering. I told you we reduced our cycle time from few weeks to a few days, and that in turn means the number of people involved in this whole process, you're reducing the overheads and the operational folks involved in it. That itself tells you how much we're able to save. So definitely, TCO is there and to say that-- >> And you are mindful of, it's what you look at, it's key. TCO is on your radar 100% you evaluate that into your deals? >> Yes, we do. >> So Paxata, what's so great about Paxata? Obviously you've had success with them. You're a customer, what's the deal. Was it the tech, was it the automation, the team? What was the key thing that got you engaged with them or specifically why Paxata? >> Look, I think the key to partnership there cannot be one ingredient that makes a partnership successful, I think there are multiple ingredients that make a partnership successful. We were one of the earliest adopters of Paxata.
Given that we're a bank and we have multiple different systems and we have lot of manual processing involved, we saw Paxata as a good fit to govern these processes and ensure at the same time, users don't lose their experience. The good thing about Paxata that we like was obviously the simplicity and the look and feel of the tool. That's number one. Simplicity was a big point. The second one is about scale. The scale, the fact that it can take in millions of rows, it's not about just working off a sample of data. It can work on the entire dataset. That's very key for us. The third is to leverage our ecosystem, so it's not about saying "okay you give me this data, let me go figure out what to do and then," so Paxata works off the data lake. The fact that it can leverage the lake that we built, the fact that it's a simple and self-preparation tool which doesn't require a lot of time to bootstrap, so end-use people like you-- >> So it makes it usable. >> It's extremely user-friendly and usable in a very short period of time. >> And that helped with the journey? >> That really helped with the journey. >> Santhosh, thanks so much for sharing. Santhosh Mahendiran, who is the Global Tech Lead at the Analytics of the Bank at Standard Chartered Bank. Again, financial services, always a great early adopter, and you get success under your belt, congratulations. Data democratization is huge and again, it's an ecosystem, you got all that anti-money laundering to figure out, you got to get those reports out, lot of heavylifting? >> That's right. >> So thanks so much for sharing your story. >> Thank you very much. >> We'll give you more coverage after this short break, I'm John Furrier, stay tuned. More live coverage in New York City, it's theCUBE.

Published Date : Sep 29 2017

SUMMARY :

Brought to you by SiliconANGLE Media here getting the data, checking out the scene, End of the day you got to implement. but at the end of the day it's got to have business value. how do you look at the business value? Where's the value in data democratization? So one of the challenges that we had was People like it, you just got a good spot, in most of the cases to end-users and we have end-users guidelines that you don't," and within the datasets that Earlier, you first couldn't access the entire dataset, So stress testing, bringing down the cycle time. So the good part is know the user, know your customer, That's the progression of the generations in which we At the end of the day, you just want to drive. but at the same time govern it in a seamless manner. Cause if you have democratization of data, So here's the question I want to ask you. So if I had the ability for my data to come in, and creating value out of it. So I don't need to as a user figure out "I got a zipcode, But the big thing that I've been seeing at the show, at the end of the day, time to value, which is can you be So the reality is all components in the whole ecosystem And you are mindful of, it's what you look at, it's key. Was it the tech, was it the automation, the team? The fact that it can leverage the lake that we built, It's extremely user-friendly and usable in a very at the Analytics of the Bank at Standard Chartered Bank. We'll give you more coverage after this short break,

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Standard Chartered Bank	ORGANIZATION	0.99+
three weeks	QUANTITY	0.99+
John Furrier	PERSON	0.99+
New York City	LOCATION	0.99+
2012	DATE	0.99+
2015	DATE	0.99+
Santosh Mahendiran	PERSON	0.99+
two	QUANTITY	0.99+
Aaron	PERSON	0.99+
US	LOCATION	0.99+
Santhosh Mahendiran	PERSON	0.99+
Singapore	LOCATION	0.99+
Santosh	PERSON	0.99+
four weeks	QUANTITY	0.99+
TCO	ORGANIZATION	0.99+
100%	QUANTITY	0.99+
90 days	QUANTITY	0.99+
India	LOCATION	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
fifth year	QUANTITY	0.99+
today	DATE	0.99+
Midtown Manhattan	LOCATION	0.99+
Paxata	ORGANIZATION	0.99+
one ingredient	QUANTITY	0.99+
third	QUANTITY	0.99+
theCUBE	ORGANIZATION	0.99+
one part	QUANTITY	0.99+
millions	QUANTITY	0.99+
first	QUANTITY	0.99+
Eight years	QUANTITY	0.99+
Silicon Angle	ORGANIZATION	0.99+
Second part	QUANTITY	0.98+
third generation	QUANTITY	0.98+
fourth stage	QUANTITY	0.98+
two specific cases	QUANTITY	0.98+
both ways	QUANTITY	0.98+
one	QUANTITY	0.98+
BigData	ORGANIZATION	0.98+
NYC	LOCATION	0.98+
both worlds	QUANTITY	0.98+
first step	QUANTITY	0.97+
three years back	DATE	0.97+
second one	QUANTITY	0.97+
One	QUANTITY	0.97+
2017	DATE	0.96+
Hadoop	TITLE	0.96+
Strata Data	ORGANIZATION	0.96+
Strata Hadoop	ORGANIZATION	0.94+
step one	QUANTITY	0.94+
first question	QUANTITY	0.93+
a month	QUANTITY	0.92+
Alation	ORGANIZATION	0.9+
Data	EVENT	0.89+
2017	EVENT	0.89+
80%	QUANTITY	0.88+
Paxata	TITLE	0.88+
Big Data	EVENT	0.84+
theCube	ORGANIZATION	0.83+

Aaron Kalb, Alation | BigData NYC 2017

>> Announcer: Live from midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back everyone, we are here live in New York City, in Manhattan for BigData NYC, our event we've been doing for five years in conjunction with Strata Data which is formerly Strata Hadoop, which was formerly Strata Conference, formerly Hadoop World. We've been covering the big data space going on ten years now. This is theCUBE. I'm here with Aaron Kalb, who's Head of Product and co-founder at Alation. Welcome to theCUBE. >> Aaron Kalb: Thank you so much for having me. >> Great to have you on, so co-founder head of product, love these conversations because you're also co-founder, so it's your company, you got a lot of equity interest in that, but also head of product you get to have the 20 mile stare, on what the future looks, while inventing it today, bringing it to market. So you guys have an interesting take on the collaboration of data. Talk about what that means, what's the motivation behind that positioning, what's the core thesis around Alation? >> Totally so the thing we've observed is a lot of people working in the data space, are concerned about the data itself. How can we make it cheaper to store, faster to process. And we're really concerned with the human side of it. Data's only valuable if it's used by people, how do we help people find the data, understand the data, trust in the data, and that involves a mix of algorithmic approaches and also human collaboration, both human to human and human to computer to get that all organized. >> John Furrier: It's interesting you have a symbolic systems background from Stanford, worked at Apple, involved in Siri, all this kind of futuristic stuff. You can't go a day without hearing about Alexa is going to have voice-activated, you've got Siri. AI is taking a really big part of this.
Obviously all of the hype right now, but what it means is the software is going to play a key role as an interface. And this symbolic systems almost brings on this neural network kind of vibe, where objects, data, plays a critical role. >> Oh, absolutely, yeah, and in the early days when we were co-founding the company, we talked about what is Siri for the enterprise? Right, I was you know very excited to work on Siri, and it's really a kind of fun gimmick, and it's really useful when you're in the car, your hands are covered in cookie dough, but if you could answer questions like what was revenue last quarter in the UK and get the right answer fast, and have that dialogue, oh do you mean fiscal quarter or calendar quarter. Do you mean UK including Ireland, or whatever it is. That would really enable better decisions and a better outcome. >> I was worried that Siri might do something here. Hey Siri, oh there it is, okay be careful, I don't want it to answer and take over my job. >> (laughs) >> Automation will take away the job, maybe Siri will be doing interviews. Okay let's take a step back. You guys are doing well as a start up, you've got some great funding, great investors. How are you guys doing on the product? Give us a quick highlight on where you guys are, obviously this is BigData NYC a lot going on, it's Manhattan, you've got financial services, big industry here. You've got the Strata Data event which is the classic Hadoop industry that's morphed into data. Which really is overlapping with cloud, IoTs application developments all kind of coming together. How do you guys fit into that world? >> Yeah, absolutely, so the idea of the data lake is kind of interesting. Psychologically it's sort of a hoarder mentality, oh everything I've ever had I want to keep in the attic, because I might need it one day. 
Great opportunity to evolve these new streams of data, with IoT and what not, but just 'cause you can get to it physically doesn't mean it's easy to find the thing you want, the needle in all that big haystack and to distinguish from among all the different assets that are available, which is the one that is actually trustworthy for your need. So we find that all these trends make the need for a catalog to kind of organize that information and get what you want all the more valuable. >> This has come up a lot, I want to get into the integration piece and how you're dealing with your partnerships, but the data lake integration has been huge, and having the catalog has come up with, has been the buzz. Foundationally if you will saying catalog is important. Why is it important to do the catalog work up front, with a lot of the data strategies? >> It's a great question, so, we see data cataloging as step zero. Before you can prep the data in a tool like Trifacta, Paxata, or Kylo. Before you can visualize it in a tool like Tableau, or MicroStrategy. Before you can do some sort of cool prediction of what's going to happen in the future, with a data science engine, before any of that. These are all garbage in garbage out processes. The step zero is find the relevant data. Understand it so you can get it in the right format. Trust that it's good and then you can do whatever comes next. >> And governance has become a key thing here, we've heard of the regulations, GDPR outside of the United States, but also that's going to have an arms length reach over into the United States impact. So these little decisions, and there's going to be an Equifax someday out there. Another one's probably going to come around the corner. How does the policy injection change the catalog equation? A lot of people are building machine learning algorithms on top of catalogs, and they're worried they might have to rewrite everything.
How do you balance the trade off between good catalog design and flexibility on the algorithm side? >> Totally yes it's a complicated thing with governance and consumption right. There's people who are concerned with keeping the data safe, and there are people concerned with turning that data into real value, and these can seem to be at odds. What we find is actually a catalog as a foundation for both, and they are not as opposed as they seem. What Alation fundamentally does is we make a map of where the data is, who's using what data, when, how. And that can actually be helpful if your goal is to say let's follow in the footsteps of the best analyst and make more insights generated or if you want to say, hey this data is being used a lot, let's make sure it's being used correctly. >> And by the right people. >> And by the right people exactly >> Equifax they were fishing that pond dry months, months before it actually happened. With good tools like this they might have seen this right? Am I getting it right? >> That's exactly right, how can you observe what's going on to make sure it's compliant and that the answers are correct and that it's happening quickly and driving results. >> So in a way you're taking the collective intelligence of the user behavior and using that into understanding what to do with the data modeling? >> That's exactly right. We want to make each person in your organization as knowledgeable as all of their peers combined. >> So the benefit then for the customer would be if you see something that's developing you can double down on it. And if the users are using a lot of data, then you can provision more technology, more software. >> Absolutely, absolutely. It's sort of like when I was going to Stanford, there was a place where the grass was all dead, because people were riding their bikes diagonally across it. And then somebody smart was like, we're going to put a real gravel path there. 
So the infrastructure should follow the usage, instead of being something you try to enforce on people. >> It's a classic design meme that goes around. Good design is here, the more effective design is the path. >> Exactly. >> So let's get into the integration. So one of the hot topics here this year obviously besides cloud and AI, with cloud really being more the driver, the tailwind for the growth, AI being more the futuristic head room, is integration. You guys have some partnerships that you announced with integration, what are some of the key ones, and why are they important? >> Absolutely, so, there have been attempts in the past to centralize all the data in one place have one warehouse or one lake have one BI tool. And those generally fail, for different reasons, different teams pick different stacks that work for them. What we think is important is the single source of reference. One hub with spokes out to all those different points. If you think about it it's like Google, it's one index of the whole web even though the web is distributed all over the place. To make that happen it's very important that we have partnerships to get data in from various sources. So we have partnerships with database vendors, with Cloudera and Hortonworks, with different BI tools. What's new are a few things. One is with Cloudera Navigator, they have great technical metadata around security and lineage over HDFS, and that's a way to bolster our catalog to go even deeper into what's happening in the files before things get surfaced and higher for places where we have a deeper offering today. >> So it's almost a connector to them in a way, you kind of share data. >> That's exactly right, we've a lot of different connectors, this is one new one that we have. Another, go ahead. >> I was going to go ahead continue.
>> I was just going to say another place that is exciting is data prep tools, so Trifacta and Paxata are both places where you can find and understand in Alation and then begin to manipulate in those tools. We announced with Paxata yesterday, the ability to click to profile, so if you want to actually see what's in some raw compressed Avro file, you can see that in one click. >> It's interesting, Paxata has really been almost lapping Trifacta, because they were the leader in my mind, but now you've got like a Nascar race going on between the two firms, because data wrangling is a huge issue. Data prep is where everyone is stuck right now, they just want to do the data science, it's interesting. >> They are both amazing companies and I'm happy to partner with both. And actually Trifacta and Alation have a lot of joint customers we're psyched to work with as well. I think what's interesting is that data prep, and this is beginning to happen with analyst definitions of that field. It isn't just preparing the data to be used, getting it cleaned and shaped, it's also preparing the humans to use the data giving them the confidence, the tools, the knowledge to know how to manipulate it. >> And it's great progress. So the question I wanted to ask is now the other big trend here is, I mean it's kind of a subtext in this show, it's not really front and center but we've been seeing it kind of emerge as a concept, we see in the cloud world, on premise vs cloud. On premise a lot of people bring in the dev ops model in, and saying I may move to the cloud for bursting and some native applications, but at the end of the day there is a lot of work going on on premise. A lot of companies are kind of cleaning house, retooling, replatforming, whatever you want to do resetting. They are kind of getting their house in order to do on prem cloud ops, meaning a business model of cloud operations on site.
A lot of people doing that, that will impact the story, it's going to impact some of the server modeling, that's a hot trend. How do you guys deal with the on premise cloud dynamic? >> Totally, so we just want to do what's right for the customer, so we deploy both on prem and in the cloud, and then from wherever the Alation server is it will point to usually a mix of sources, some that are in the cloud like Redshift or S3, often with Amazon today, and also sources that are on prem. I do think I'm seeing a trend more and more toward the cloud, and we have people that are migrating from HDFS to S3, that's one thing we hear a lot about at Strata, with sort of Hadoop interest. But I think what's happening is people are realizing, as each Equifax in turn happens, that this old wild west model of, oh, you surround your bank with people on horseback and it's physically in one place. With data it isn't like that, most people are saying I'd rather have the A+ teams at Salesforce or Amazon or Google be responsible for my security than the people I can get over in the midwest. >> And the Paxata guys have loved the term Data Democracy, because that is really democratization, making the data free but also having the governance thing. So tell me about the Data Lake governance, because I've never loved the term Data Lake, I think it's more of a data ocean, but now you see data lake, data lake, data lake. Are they just silos of data lakes happening now? Are people trying to connect them? That's key, so that's been a key trend here. How do you handle the governance across multiple data lakes? >> That's right, so the key is to have that single source of reference, so that regardless of which lake or warehouse, or little siloed SQL Server somewhere, you can search in a single portal and find that thing no matter where it is. >> John: Can you guys do that?
>> We can do that, yeah, I think the metaphor for people who haven't seen it really is Google, if you think about it, you don't even know what physical server a webpage is hosted from. >> Data lakes should just be invisible. >> Exactly. >> So you're interfacing with multiple data lakes, that's a value proposition for you. >> That's right, so it could be on prem or in the cloud, multi-cloud. >> Can you share an example of a customer that uses that and kind of how it's laid out? >> Absolutely, so one great example of an interesting data environment is eBay. They have the biggest Teradata warehouse in the world. They also have, I believe, two huge data lakes, they have Hive on top of that, and Presto is used to sort of virtualize it across a mixture of Teradata and Hive and then direct Presto query. It gets very complicated, and they are a very data driven organization, so they have people who are product owners, who are in jobs where data isn't in their job title, and they know how to look at Excel and look at numbers and make choices, but they aren't real data people. Alation provides that accessibility so that they can understand it. >> We used to call the Hadoop world the car show for the data world, where for a long time it was about the engine, what was doing what, and then it became, what's the car, and now how's it drive. Seeing that same evolution now where all that stuff has to get done under the hood. >> Aaron: Exactly. >> But there are still people who care about that, right. They are the mechanics, they are the plumbers, whatever you want to call them, but then the data scientists are the guys really driving things, and now end users potentially, and even applications, bots or whatnot. It seems to evolve, that's where we're kind of seeing the show change a little bit, and that's kind of where you see some of the AI things.
I want to get your thoughts on how you or your guys are using AI, how you see AI, if it's AI at all, if it's just machine learning as a baby step into AI. We all know what AI could be, but it's really just machine learning now. How do you guys use quote AI and how has it evolved? >> It's a really insightful question and a great metaphor that I love. If you think about it, it used to be how do you build the car, and now I can drive the car even though I couldn't build it or even fix it, and soon I don't even have to drive the car, the car will just drive me, all I have to know is where I want to go. That's sort of the progression that we see as well. There's a lot of talk about deep learning, all these different approaches, and it's super interesting and exciting. But I think even more interesting than the algorithms are the applications. And so for us it's like, today how do we get that turn by turn directions where we say turn left at the light if you want to get there. And eventually, you know, maybe the computer can do it for you. The thing that is also interesting is, to make these algorithms work, no matter how good your algorithm is, it's all based on the quality of your training data. >> John: Which is historical data. Historical data in essence, the more historical data you have, you need that to train the data. >> Exactly right, and we call this behavior IO: how do we look at all the prior human behavior to drive better behavior in the future. And I think the key for us is we don't want to have a bunch of unpaid >> John: You can actually get that URL, behavioral IO. >> We should do it before it's too late. (Both laugh) >> We're live right now, go register that, Patrick. >> Yeah, so the goal is we don't want to have a bunch of unpaid interns trying to manually tag things, that's error prone and that's slow.
I look at things like Luis von Ahn over at CMU, he does a thing where as you're writing in a CAPTCHA to get an email account you're also helping Google recognize a hard to read address or a piece of text from books. >> John: If you shoot the arrow forward, you just take this kind of forward, you almost think augmented reality is a pretext to what we might see for what you're talking about and ultimately VR are you seeing some of the use cases for virtual reality be very enterprise oriented or even end consumer. I mean Tom Brady the best quarterback of all time, he uses virtual reality to play the offense virtually before every game, he's a power user, in pharma you see them using virtual reality to do data mining without being in the lab, so lab tests. So you're seeing augmentation coming in to this turn by turn direction analogy. >> It's exactly, I think it's the other half of it. So we use AI, we use techniques to get great data from people and then we do extra work watching their behavior to learn what's right. And to figure out if there are recommendations, but then you serve those recommendations, either it's Google glasses it appears right there in your field of view. We just have to figure out how do we make sure, that in a moment of you're making a dashboard, or you're making a choice that you have that information right on hand. >> So since you're a technical geek, and a lot of folks would love to talk about this, so I'll ask you a tough question cause this is something everyone is trying to chase for the holy grail. How do you get the right piece of data at the right place at the right time, given that you have all these legacy silos, latencies and network issues as well, so you've got a data warehouse, you've got stuff in cold storage, and I've got an app and I'm doing something, there could be any points of data in the world that could be in milliseconds potentially on my phone or in my device my internet of thing wearable. How do you make that happen? 
Because that's the struggle, at the same time keep all the compliance and all the overhead involved. Is it more compute, is it an architectural challenge, how do you view that, because this is the big challenge of our time. >> Yeah, again I actually think it's the human challenge more than the technology challenge. It is true that there is data all over the place kind of gathering dust, but again if you think about Google, billions of web pages, I only care about the one I'm about to use. So for us it's really about being in that moment of writing a query, building a chart, how do we say in that moment, hey, you're using an out of date definition of profit. Or hey, the database you chose to use, the one thing you chose out of the millions, actually is broken and stale. And we have interventions to do that with our partners and through our own first party apps that actually change how decisions get made at companies. >> So to make that happen, if I imagine it, you'd have to need access to the data, and then write software that is contextually aware to then run, compute, in context to the user interaction. >> It's exactly right, back to the turn by turn directions concept, you have to know both where you're trying to go and where you are. And so for us that can be, from where I'm writing a SQL statement, after JOIN we can suggest the table most commonly joined with that, but also overlay onto that the fact that the most commonly joined table was deprecated by a data steward, data curator. So that's the moment that we can change the behavior from bad to good. >> So a chief data officer out there, we've got to wrap up, but I wanted to ask one final question. There's a chief data officer out there, they might be empowered or they might be just a CFO assistant that's managing compliance, either way, someone's going to be empowered in an organization to drive data science and data value forward because there is so much proof that data science works.
From military to play you're seeing examples where being data driven actually has benefits. So everyone is trying to get there. How do you explain the vision of Alation to that prospect? Because they have so much to select from, there's so much noise, there's like, we call it the tool shed out there, there's like a zillion tools out there, there's like a zillion platforms, some tools are trying to turn into something else, a hammer is trying to be a lawnmower. So they've got to be careful on who they select, so what's the vision of Alation to that chief data officer, or that person in charge of analytics, to scale operational analytics? >> Absolutely, so we say to the CDO we have a shared vision for this place where your company is making decisions based on data, instead of based on gut, or expensive consultants months too late. And the way we get there, the reason Alation adds value is, we're sort of the last tool you have to buy, because with this lake mentality, you've got your tool shed with all the tools, you've got your library with all the books, but they're just in a pile on the floor. If you had a tool that had everything organized, so you just said hey robot, I need a hammer and this size nail and this text book on this set of information, and it could just come to you, and it would be correct and it would be quick, then you could actually get value out of all the expense you've already put in this infrastructure, that's especially true on the lake. >> And also tools describe the way the work's done, so in that model tools can be in the tool shed, no one needs to know it's in there. >> Aaron: Exactly. >> You guys can help scale that. Well congratulations, and just how far along are you guys in terms of number of employees, how many customers do you have?
If you can share that, I don't know if that's confidential or what not >> Absolutely, so we're small but growing very fast planning to double in the next year, and in terms of customers, we've got 85 customers including some really big names. I mentioned eBay, Pfizer, Safeway Albertsons, Tesco, Meijer. >> And what are they saying to you guys, why are they buying, why are they happy? >> They share that same vision of a more data driven enterprise, where humans are empowered to find out, understand, and trust data to make more informed choices for the business, and that's why they come and come back. >> And that's the product roadmap, ethos, for you guys that's the guiding principle? >> Yeah the ultimate goal is to empower humans with information. >> Alright Aaron thanks for coming on the Cube. Aaron Kalb, co-founder head of product for Alation here in New York City for BigData NYC and also Strata Data I'm John Furrier thanks for watching. We'll be right back with more after this short break.
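The join suggestion described in the interview (recommend the table most commonly joined with the one in the query, then overlay a steward's deprecation flag) can be sketched roughly as follows. This is a minimal illustration, not Alation's implementation: the query log format, the `suggest_joins` function, and the table names are all hypothetical.

```python
from collections import Counter

def suggest_joins(query_log, table, deprecated):
    """Rank tables by how often they were joined with `table`.

    query_log  -- iterable of (left_table, right_table) pairs, one per
                  observed JOIN (a hypothetical, simplified log format)
    deprecated -- set of table names a data steward has flagged
    Returns a list of (table_name, join_count, is_deprecated) tuples,
    most frequently joined first.
    """
    counts = Counter()
    for left, right in query_log:
        if left == table:
            counts[right] += 1
        elif right == table:
            counts[left] += 1
    return [(name, n, name in deprecated) for name, n in counts.most_common()]

# Hypothetical history of past joins and one steward-deprecated table.
log = [
    ("orders", "customers_v1"),
    ("orders", "customers_v1"),
    ("orders", "customers_v2"),
    ("customers_v1", "orders"),
    ("orders", "products"),
]
suggestions = suggest_joins(log, "orders", deprecated={"customers_v1"})

for name, count, is_deprecated in suggestions:
    note = " (deprecated by a data steward)" if is_deprecated else ""
    print(f"{name}: joined {count} times{note}")
```

Here the most popular candidate is also the deprecated one, which is the moment the interview describes: the popularity signal alone would steer the user wrong, and the steward's flag is what corrects it.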

Published Date : Sep 28 2017

Gus Horn, NetApp | Big Data NYC 2017

>> Narrator: Live from Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Hello everyone. Welcome back to our CUBE coverage here in New York City, live in Manhattan for theCUBE's coverage of Big Data NYC, our event we've had five years in a row. Eight years covering Big Data, Hadoop World originally in 2010, then it moved to Hadoop Strata Conference, Strata Hadoop, now called Strata Data. In conjunction with that event we have our Big Data NYC event. SiliconANGLE Media's CUBE. I'm John Furrier, your cohost, with Jim Kobielus, analyst at wikibon.com for Big Data. Our next guest is Gus Horn who is the global Big Data analytics and CTO ambassador for NetApp, machine learning, AI, guru, gives talks all around the world. Great to have you, thanks for coming in and spending the time with us. >> Thanks, John, appreciate it. >> So we were talking before the camera came on, you're doing a lot of jet setting really around Evangelize But also educating a lot of folks on the impact of machine learning and AI in particular. Obviously AI we love, we love the hype. It motivates young kids getting into software development, computer science, makes it kind of real for them. But still, a lot more ways to go in terms of what AI really is. And that's good, but what is really going on with AI? Machine learning is where the rubber hits the road. That seems to be the hot air, that's your wheelhouse. Give us the update, where is AI now? Obviously machine learning is super important, it's one of the hot topics here in New York City. >> Well, I think it's super important globally, and it's going to be disruptive. So before we were talking, I said how this is going to be a disruptive technology for all of society. 
But regardless of that, what machine learning is bringing is a methodology to deal with this influx of IOT data, whether it's autonomous vehicles, active safety in cars, or even looking at predictive analytics for complex manufacturing processes like an automotive assembly line. Can I predict when a welding machine is going to break and can I take care of it during a scheduled maintenance cycle so I don't take the whole line down? Because the impacts are really cascading and dramatic when you have a failure that you couldn't predict. And what we're finding is that Hadoop and the Big Data space is uniquely positioned to help solve these problems, both from quality control and process management and how you can get better uptime, better quality, and then we take it full circle and how can I build an environment to help automotive manufacturers to do test and DEV and retest and retraining and learning of the AI modules and the AI engines that have to exist in these autonomous vehicles. And the only way you can do that is with data, and managing data like a data steward, which is what we do at NetApp. So for us, it's not just about the solution, but the underlying architecture is going to be absolutely critical in setting up the agility you'll need in this environment, and the flexibility you need. Because the other thing that's happening in the space right now is that technology's evolving very quickly. You see this with the DGX from NVIDIA, you see P100 cards from NVIDIA. So I have an architecture that we have in Germany right now where we have multiple NVIDIA cards in our Hadoop cluster that we've architected. But I don't make NVIDIA cards. I don't make servers. I make really good storage. And I have an ecosystem that helps manage where that data is when it needs to be there, and especially when it doesn't need to there so we can get new data. 
>> Yeah, Gus, we were talking also before camera, the folks watching that you were involved with AI going way back to in your days at MIT, and that's super important. Because a lot of people, the pattern that we're seeing across all the events that we go to, and we'll be at the NetApp event next week, Insight, in Vegas, but the pattern is pretty clear. You have one camp, oh, AI is just the same thing that was going on in the late '70s, '80s, and '90s, but it now has a new dynamic with the cloud. So a lot of people are saying okay, there's been some concepts that have been developed in AI, in computer science, but now with the evolution of hyperconvergence infrastructure, with cloud computing, with now a new architecture, it seems to be turbocharging and accelerating. So I'd like to get your thoughts on why is it so hot now? Obviously machine learning, everyone should be on that, no doubt, but you got the dynamic of the cloud. And NetApp's in the storage business, so that's stores data, I get that. What's the dynamic with the cloud? Because that seems to be the accelerant right now with open source and in with AI. >> Yeah, I think you got to stay focused. The cloud is going to be playing an integral role in everything. And what we do at NetApp as a data steward, and what George Kurian said, our CEO, that data is the currency of today actually, right? It's really fundamentally what drives business value, it's the data. But there's one little slight attribute change that I'd like to add to that, and that it's a perishable commodity. It has a certain value at T-sub zero when you first get it. And especially true when you're trying to do machine learning and you're trying to learn new events and new things, but it rapidly degrades and becomes less valuable. You still need to keep it because it's historical and if we forget historical data, we're doomed to repeat mistakes. So you need to keep it and you have to be a good steward. 
And that's where we come into play with our technologies. Because we have a portfolio of different kinds of products and management capabilities that move the data where it needs to be, whether you're in the cloud, whether you're near the cloud, like in an Equinix colo, or even on prem. And the key attribute there, and especially in automotive, they want to keep the data forever because of liability, because of intellectual property and privacy concerns. >> Hold on, one quick question on that. 'Cause I think you bring up a good point. The perishability's interesting because realtime, we see this now, bashing in realtime is the buzzword in the industry, but you're talking about something that's really important. That the value of the data when you get it fast, in context, is super important. But then the historical piece where you store it also plays into the machine learning dynamics of how deep learning and machine learning have to use the historical perspective. So in a way, it's perishable in the realtime piece, in the moment. If you're a self-driving car you want the data in milliseconds 'cause it's important, but then again, the historical data will then come back. Is that kind of what you're getting at with that? >> Yeah, because the way that these systems operate is the paradigm is like deep learning. You want them to learn the way a human learns, right? The only reason we walk on our feet is 'cause we fell down a lot. But we remember falling down, we remember how we got up and could walk. So if you don't have the historical context, you're just always falling down, right? So you have to have that to build up the proper machine learning neural network, the kind of connections you need to do the right things. And then as you get new data and varieties of data, and I'll stick with automotive, because it can almost be thought of as an intractable amount of data. Because most people will keep cars for, measured in decades.
The quality of the car is incredible now, and they're all just loaded with sensors, right? High definition cameras, radars, GPS tracking. And you want to make sure you get improvements there because you have liability issues coming as well with these same technologies, so. >> Yeah, so we talk about the perishability of the data, that's a given. What is less perishable, it seems to me and Wikibon, is that what you derive from the data, the correlations, the patterns, the predictive models, the meat of machine learning and deep learning, AI in general, is less perishable in the sense that it has a validity over time. What are your thoughts at NetApp about how those data derived assets should be stored, should be managed for backup and recovery and protected? To what extent do those requirements need to be reflected in your storage retention policies if you're an enterprise doing this? >> That's a great question. So I think what we find is that that first landing zone, and everybody talks about that being the cloud. And for me it's a cloudy day, although in New York today it's not. There are lots of clouds and there are lots of other things that come with that data like GDPR and privacy, and what are you allowed to store, what are you allowed to keep? And how do you distinguish one from the other? That's one part. But then you're going to have to ETL it, you're going to have to transform that data. Because like everything, there's a lot of noise. And the noise is really fundamentally not that important. It's those anomalies within the stream of noise that you need to capture. And then use that as your training data, right? So that you learn from it. So there's a lot of processing, I think, that's going to have to happen in the cloud regardless of what cloud, and it has to be kind of ubiquitous in every cloud. And then from there you decide, how am I going to curate the data and move it? And then how am I going to monetize the data? 
Because that's another part of the equation, and what can I monetize? >> Well that's a question that we hear a lot on theCUBE. On day one we were ripping at some of the concepts that we see, and certainly we talk to enterprise customers. Whether it's a CIO, CVO, chief data officer, chief security officer. There's a huge application development going on in the enterprise right now. You see the opensource booming. This huge security practice is being built up and then it's got this governance with the data. Overlay that with IOT, it's kind of an architectural, I don't want to say reset, but a retrenching for a lot of enterprises. So the question I have for you guys as a critical part of the infrastructure of storage, storage isn't going away, there's no doubt about that, but now the architecture's changing. How are you guys advising your customers? What's your position on when you come into CXO and you give a talk and I said, hey, Gus, the house is on fire, we got so much going on. Bottom line me, what's the architecture? What's best for me, but don't lose the headroom. I need to have some headroom to grow, that's where I see some machine learning, what do I do? >> I think you have to embrace the cloud, and that's one of the key attributes that NetApp brings to the table. We have our core software, our ONTAP software, is in the cloud now. And for us, we want to make sure we make it very easy for our customers to both be in the cloud, be very protected in the cloud with encryption and protection of the data, and also get the scale and all of the benefits of the cloud. But on top of that, we want to make it easy for them to move it wherever they want it to be as well. So for us it's all about the data mobility and the fact that we want to become that data steward, that data engine that helps them drive to where they get the best business value. >> So it's going to be on prem, on cloud. 
'Cause I know just for the record, you guys if not the earliest, one of the earliest in with AWS, when it wasn't fashionable. I interviewed you guys on that many years ago. >> And let me ask a related question. What is NetApp's position, or your personal thinking, on what data should be persisted closer to the edge in the new generation of IOT devices? So IOT, edge devices, they do inference, they do actuation and sensing, but they also do persistence. Now should any data be persisted there longterm as part of your overall storage strategy, if you're an enterprise? >> It could be. The question is durability, and what's the impact if for some reason that edge was damaged, destroyed or the data lost. So a lot of times when we start talking about opensource, one of the key attributes we always have to take into account is data durability. And traditionally it's been done through replication. To me that's a very inefficient way to do it, but you have to protect the data. Because it's like if you've got 20 bucks in your wallet, you don't want to lose it, right? You might split it into two 10s, but you still have 20, right? You want that durability and if it has that intrinsic value, you've got to take care of it and be a good steward. So if it's in the edge, it doesn't mean that's the only place it's going to be. It might be in the edge because you need it there. Maybe you need what I call reflexive actions. This is like when a car is well, you have deep learning and machine learning and vision and GPS tracking and all these things there, and how it can stay in the lane and drive, but the sensors themself that are coming from Delphi and Bosch and ZF and all of these companies, they also have to have this capability of being what I call a reflex, right? The reason we can blink and not get a stone in our eye is not because it went to our cerebral cortex. Because it went to the nerve stem and it triggered the blink. >> Yeah, it's cache. 
And you have to do the same thing in a lot of these environments. So autonomous vehicles is one. It could be using facial recognition for restricting access to a gate. And all the sudden this guy's on a blacklist, and you've stopped the gate. >> Before we get into some of the product questions I have for you, Hadoop in-place analytics, as well as some of the regulations around GDPR, to end the trend segment here is what's your thoughts on decentralization? You see a lot of decentralized apps coming out, you see blockchain getting a lot of traction. Obviously that's a tell sign, certainly in the headroom category of what may be coming down. Not really on the agenda for most enterprises today, but it does kind of indicate that the wave is coming for a lot more decentralization on top of distributed computing and storage. So how do you look at that, as someone who's out on the cutting edge? >> For me it's just yet another industry trend where you have to embrace it. I'm constantly astonished at the people who are trying to push back from things that are coming. To think that they're going to stop the train that's going to run 'em over. And the key is how can we make even those trends better, more reliable, and do the right thing for them? Because if we're the trusted advisor for our customers, regardless of whether or not I'm going to sell a lot of storage to them, I'm going to be the person they're going to trust to give 'em good advice as things change, 'cause that's the one thing that's absolutely coming is change. And oftentimes when you lock yourself into these quote, commodity approaches with a lot of internal storage and a lot of these things, the counterpart to that is that you've also locked yourself in probably for two to four years now, in a technology that you can't be agile with. 
And this is one of the key attributes for the in-place analytics that we do with our ONTAP product, and we also have our E series product that's been around for six plus years in the space, is the de facto performance leader in the space, even. And by decoupling that storage, in some cases very little but it's still connected to the data node, and in other cases where it's shared like an NFS share, that decoupling has enormous benefits from an agility perspective. And that's the key. >> That kind of ties up with the blockchain thing as kind of a tell sign, but you mentioned the in-place analytics. That decoupling gives you a lot more cohesiveness, if you will, in each area. But tying 'em together's critical. How do you guys do that? What's the key feature? Because that's compelling for someone, they want agility. Certainly DevOps, infrastructure as code, that's going mainstream, you're seeing that now. That's clearly cloud operation, whatever you want to call it, on prem, off prem. Cloud ops is here. This is a key part of it, what's the unique features of why that works so well? >> Well, some of the unique features we have, so if we look at our portfolio of products, so I'll stick with the ONTAP product. One of the key things we have there is the ability to have incredible speed with our AFF product, but we can also dedupe it, we can clone it, and snapshot it, snapshotting it into, for example, NPS or NetApp Private Storage, which is in Equinix. And now all of a sudden I can choose to go to Amazon, or I can go to Azure, I can go to Google, I can go to SoftLayer. It gives me options as a customer to use whoever has got the best computational engine. Versus I'm stuck there. I can now do what's right for my business. And I also have a DR strategy that's quite elegant. But there's one really unique attribute too, and that's the cloning.
So a lot of my big customers have 1000 plus node traditional Hadoop clusters, but it's nearly impossible for them to set up a test DEV environment with production data without having an enormous cost. But if I put it in my ONTAP, I can clone that. I can make hundreds of clones very efficiently. >> That gets the cost of ownership down, but more importantly gets the speed to getting Sandboxes up and running. >> And the Sandboxes are using true production data so that you don't have to worry about oh, I didn't have it in my test set, and now I have a bug. >> A lot of guys are losing budget because they just can't prove it and they can't get it working, it's too clunky. All right, cool, I want to get one more thing in before we run out of time. The role of machine learning we talked about, that's super important. Algorithms are going to be here, it's going to be a big part of it, but as you look at that policy, where the foundational policy governance thing is huge. So you're seeing GDPR, I want to get your comments on the impact of GDPR. But in addition to GDPR, there's going to be another Equifax coming, they're out there, right? It's inevitable. So as someone who's got code out there, writing algorithms, using machine learning, I don't want to rewrite my code based upon some new policy that might come in tomorrow. So GDPR is one we're seeing that you guys are heavily involved in. But there might be another policy I might want to change, but I don't want to rewrite my software. How should a CXO think about that dynamic? Not rewriting code if a new governance policy comes in, and then the GDPR's obvious. >> I don't think you can be so rigid to say that you don't want to rewrite code, but you want to build on what you have. So how can I expand what I already have as a product, let's say, to accommodate these changes? Because again, it's one of those trains. You're not going to stop it. So GDPR, again, it's one of these disruptive regulations that's coming out of EMEA. 
But what we forget is that it has far reaching implications even in the United States. Because of their ability to reach into basically the company's pocket and fine them for violations. >> So what's the impact of the Big Data system on GDPR? >> It can potentially be huge. The key attribute there is you have to start when you're building your data lakes, when you're building these things, you always have to make sure that you're taking into account anonymizing personal identifying information or obfuscating it in some way, but it's like with everything, you're only as strong as your weakest link. And this is again where NetApp plays a really powerful role because in our storage products, we actually can encrypt the data at rest, at wire speed. So it's part of that chain. So you have to make sure that all of the parts are doing that because if you have data at rest in a drive, let's say, that's inside your server, it doesn't take a lot to beat the heck out of it and find the data that's in there if it's not encrypted. >> Let me ask you a quick question before we wrap up. So how does NetApp incorporate ML or AI into these kinds of protections that you offer to customers? >> Well for us it's, again, we're only as successful as our customers are, and what NetApp does as a company, we'll just call us the data stewards, that's part of the puzzle, but we have to build a team to be successful. So when I travel around the world, the only reason a customer is successful is because they did it with a team. Nobody does it on an island, nobody does it by themself, although a lot of times they think they can. So it's not just us, it's our server vendors that work with us, it's the other layers that go on top of it, companies like Zaloni or BlueData and BlueTalon, people we've partnered with that are providing solutions to help drive this for our customers. >> Gus, great to have you on theCUBE. Looking forward to next week. I know you're super busy at NetApp InSight. 
I know you got like five major talks you're doing but if we can get some time I think you'd be great. My final question, a personal one. We were talking that you're a search and rescue in Tahoe in case there's an avalanche, a lost skier. A lot of enterprises feel lost right now. So you kind of come in a lot and the avalanche is coming, the waves or whatever are coming, so you've probably seen situations. You don't need to name names, but talk about what should someone do if they're lost? You come in, you can do a lot of consulting. What's the best advice you could give someone? A lot of CXOs and CEOs, their heads are spinning right now. There's so much on the table, so much to do, they got to prioritize. >> It's a great question. And here's the one thing is don't try to boil the ocean. You got to be hyper-focused. If you're not seeing a return on investment within 90 days of setting up your data lake, something's going wrong. Either the scope of what you're trying to do is too large, or you haven't identified the use case that will give you an immediate ROI. There should be no hesitation to going down this path, but you got to do it in a manner where you're tackling the biggest problems that have the best hit value for you. Whether it's ETLing goes into your plan of record systems, your enterprise data warehouses, you got to get started, but you want to make sure you have measurable, tangible success within 90 days. And if you don't, you have to reset and say okay, why is that not happening? Am I reinventing the wheel because my consultant said I have to write all this Sqoop and Flume code and get the data in? Or maybe I should have chosen another company to be a partner that's done this 1000 times. And it's not a science experiment. We got to move away from science experiment to solving business problems. >> Well science experiments and boiling of the ocean is don't try to overreach, build a foundational building block.
>> The successful guys are the ones who are very disciplined and they want to see results. >> Some call it baby steps, some call it building blocks, but ultimately the foundation right now is critical. >> Gus: Yeah. >> All right, Gus, thanks for coming on theCUBE. Great day, great to chat with you. Great conversation about machine learning impact to organizations. theCUBE bringing you the data here live in Manhattan. I'm John Furrier, Jim Kobielus with Wikibon. More after this short break. We'll be right back. (digital music) (synthesizer music)

Published Date : Sep 28 2017


ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
John	PERSON	0.99+
Gus Horn	PERSON	0.99+
BlueTalon	ORGANIZATION	0.99+
George Kurian	PERSON	0.99+
BlueData	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
Germany	LOCATION	0.99+
two	QUANTITY	0.99+
Zaloni	ORGANIZATION	0.99+
Bosch	ORGANIZATION	0.99+
Manhattan	LOCATION	0.99+
Tahoe	LOCATION	0.99+
NVIDIA	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
1000 times	QUANTITY	0.99+
New York City	LOCATION	0.99+
20 bucks	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Delphi	ORGANIZATION	0.99+
Vegas	LOCATION	0.99+
20	QUANTITY	0.99+
New York	LOCATION	0.99+
Gus	PERSON	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
2010	DATE	0.99+
Amazon	ORGANIZATION	0.99+
first	QUANTITY	0.99+
United States	LOCATION	0.99+
ZF	ORGANIZATION	0.99+
90 days	QUANTITY	0.99+
GDPR	TITLE	0.99+
next week	DATE	0.99+
today	DATE	0.99+
NetApp	ORGANIZATION	0.99+
Equifax	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
four years	QUANTITY	0.99+
Eight years	QUANTITY	0.99+
tomorrow	DATE	0.98+
hundreds of clones	QUANTITY	0.98+
both	QUANTITY	0.98+
one	QUANTITY	0.98+
NYC	LOCATION	0.98+
One	QUANTITY	0.98+
one part	QUANTITY	0.97+
Big Data	EVENT	0.97+
Wikibon	ORGANIZATION	0.97+
one camp	QUANTITY	0.96+
NetApp	TITLE	0.96+
Strata Data	EVENT	0.96+
NetApp	EVENT	0.96+
late '70s	DATE	0.96+
six plus years	QUANTITY	0.95+
Midtown Manhattan	LOCATION	0.95+
Hadoop Strata Conference	EVENT	0.95+
Equinix	ORGANIZATION	0.95+
one thing	QUANTITY	0.94+
Strata Hadoop	EVENT	0.94+
one more thing	QUANTITY	0.94+
one quick question	QUANTITY	0.93+
1000 plus	QUANTITY	0.92+
DGX	COMMERCIAL_ITEM	0.91+

Sergei Rabotai, InData Labs | Big Data NYC 2017

>> Live from Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Fifth year of coverage of our own event Big Data NYC where we cover all the action in New York City. For this week in big data, in conjunction with Strata Data which was originally Hadoop World in 2010. We've been covering it for eight years. It became Strata Conference, Strata Hadoop, now called Strata Data. Will probably be called Strata AI tomorrow. Who knows, but certainly the trends are going in that direction. I'm John Furrier, your co-host. Our next guest here in New York City is Sergei Rabotai, who is the Head of Business Development at InData Labs from Belarus. In town, doing some biz dev in the big data ecosystem. Welcome to theCUBE. >> Yeah. Good morning. >> Great to have you. So, obviously Belarus is becoming known as the Silicon Valley of Eastern Europe. A lot of great talent. We're seeing that really explode. A lot of great stuff going on globally, even though there's a lot of stuff, you know GDPR and all these other things happening. It's clearly a global economy with tech. Silicon Valley still is magical. I live there in Palo Alto but you're starting to see peering points within these ecosystems of entrepreneurship and now big companies are taking advantage of it as well. What do you guys do? I mean you're in the middle of that. What does InData Labs do in context of all this? >> Well, InData Labs is a full stack data science company. Which means that we provide professional services for data strategy, big data engineering and the data science. So, yeah, like you just said, we are based - my team is based in Minsk, Belarus. We are about 40 people strong at the moment. And in our recent years we have been very successful starting this business and we have been getting customers from all over the world, including United States, Great Britain, and European Union.
The company was launched about four years ago and very important thing, that it was launched by two tech leaders who come from very data-driven industries. Our CEO, Ilya Kirillov, has been running several EdTech companies for many years. Our second founder, Marat Karpeko, has been holding C-Level positions in one of the most successful gaming companies in the world. >> John: So they know data. They're data guys. >> Yeah they're data guys. They know data from different aspects and that brings synergy to our business. >> You guys bring that expertise now into professional services for us. Give me an example of some of the things someone might want to call you up on, because the thing we're hearing here in New York City this week is look, we need more data scientists and they got to be more productive. They're spending way too much time wrangling and doing stuff that they shouldn't be doing. In the old days, sysadmins were built to let people be productive and they ran the infrastructure. That's not what data scientists should be doing. They're the users. There's a level of setting things up and then there's a level of provisioning, it's actually data assets, but then the data scientists just want to do their job. How do you help companies do that? >> Well I would probably, if I take all of our activities, I would split them into two big parts. First of all, we are helping big companies, who already have a lot of data. We help them in managing this data more effectively. We help them with predictive analytics. We help them with, helping them build the churn prediction and user segmentation solutions. We have been recently involved in several natural language processing projects. In one of our successful case studies we helped one of the largest gaming companies to automate their customer feedback processing.
So, like, a couple years ago they were working manually with their customer feedback and we built them a tool that allows them to instantly get the sentiment of what the user says. It's kind of like a voice of a customer, which means they can be more effective in developing new things for their games. So, we-- >> So what would someone engage? I'm just trying to peg a order of magnitude of the levels of engagements you do. Startups come in? Is it big companies? What kind of size scoped work do you do? >> So I would say at the moment we work with startups, but it's a bit of a different approach than we have with big or well-established companies. When startups typically approach us with asking to help them implement some brand new technologies like neural networks or deep learning. So they want to be effective from the start. They want to use the cutting edge technology to be more attractive, to provide a better value on the market and just to be effective and to be a successful business from the start. The other part, the well-established companies, who already have the data but they understand that so far their data might not be used that effectively as it should have been used. Therefore, they approach us with a request to help them to get more insights out of the data. Let's say, implement some machine learning that can help them. >> How about larger companies? What kind of projects do you work for them? >> It could be a typical project like churn prediction, that is very actual for the companies who have got a lot of customer data. Then it could be companies from such industries like betting industry, where churn is a very big issue. And, the same probably applies to companies who do trading. >> So is scale one of the things you differentiate around? It sounds like your founders have an EdTech background obviously must be a larger, large data set. Is your profile of engagements large scale? Is it ... 
I'm just trying to get a handle of if someone's watching who, what is the kind of engagements people should be calling you for? Give us an example of that. >> Like, let's say there is a company who has got a lot of customer data, has got some products and they have a problem of churn, or they have a problem of segmenting their customers so they can later address the specific segments of the customers with the right offers at the right time and through the right marketing channel. Then it could be customers or requests where natural text processing is required where we have to automate some understanding of the written or spoken text. Then I should say that we have been getting recently some requests where computer vision skills are required. I think the first stage of AI being really intelligent was the speech recognition and I think nowadays we manage to reach to the level of what we earlier saw in fantastic movies or sci-fi movies. Computer vision is going to be the next leap in all that AI buzz we're having at the moment. >> So you solve, the problem that you solve for customers is data problems. If they're swimming in a lot of data, you can help them. >> Sergei: Yep. >> If they actually want to make that data do things that are cutting edge, you guys can help them. >> Sergei: Yeah. That's-- >> Alright, so here's a question for you. I mean, Belarus has obviously got good things going on. I've heard the press that you guys have been getting, the whole area, and you guys in particular. So I'm a buyer, one of the questions I might ask is "Hey Sergei, how do I know that you'll keep that talent because the churn is always a big problem. I've dealt with outsourcing before and in the US it's hard to keep talent but I've heard there's a churn." How do you guys keep the talent in the country? How do you keep talent on the projects? Is there certain economic rules over there? What's happening in Belarus? Give us the economical. >> Yeah, so, basically what you're saying. 
The churn problem has always been known for companies who have their development teams in Asian regions. That's a known problem because I have a lot of meetings with clients in the UK and the US, potential prospects, I would say. So they say it is a problem for them. With Belarus, I don't think we have that because from what I know, we have an average churn of under 10 percent. That's the figures across the industry. In smaller companies, the churn is even less and there are specific reasons for that. First of all, that due to Belarusian mentality, we always try to keep to a job that we're having. Yeah? So we do not-- >> John: That's a cultural thing. >> That's just the cultural thing. We do not ... >> You honor, you honor a code, if you will. >> Yeah. >> Okay. >> So, that's one of the things. Another thing is that Belarusian IT industry is very small. We have, I would say, no more than 40 thousand people being involved in different IT companies. The community is very small, so if somebody is hopping jobs from one job to another, it is going to be known and this person is not likely to have like, a good career. >> So job hoppers is kind of like a code of community, honor. Silicon Valley works that way too, by the way. >> Yeah. >> You get identified, that's who you are. >> Yeah. And so nowadays-- >> Economic tax breaks going on over there? What's the government to get involved? >> One of the key things is, the special tax and legal regulations that Belarus has got at the moment. I can definitely say that there is no country in the world that has got the same tax preferences, and the same support from the government. If a Belarusian company, IT company, becomes a part of Belarusian High Tech Park it means the company becomes automatically exempt from VAT, corporate income tax. The employees of that company having the reliefs on their income, personal income tax rate, and there are a lot more reliefs that make the talent stay in the country.
Having this relief for the IT business allows the companies to provide better working conditions for the employees and stop the people from migrating to other parts of the world. That's what we have. >> Sort of created an environment where there's not a lot of migration out of the area. The tech community kind of does its own policing of behavior for innovation. >> Yeah but I think before those initiatives were adopted there was a certain percentage of people migrating but I think that nowadays even if it happens, yes, you're right, it's not that substantial. >> Great. Tell us ... Great overview of the company and congratulations, it's a good opportunity for folks watching to explore new areas of talent, especially ones that have the work ethic and knowledge you guys have over there. New York here, there's codes here too. Get the job done. Be on time. What's your experience like in New York here? What's your goal this week? What's some of the meetings you're having? Share with the folks kind of your game plan for Big Data NYC. >> Well, yeah, I've really enjoyed my stay here. It, so far, has been a very enjoyable experience. From the business perspective, I had over 10 meetings with the prospective customers. And we are likely to have follow-ups coming in the next couple of weeks. I can definitely say there is a great demand for professional services. You can see that if you go to whichever center you can see there's a lot of jobs being posted on the job boards. It means that there is lack of knowledge here in the US, yeah? One more important thing that I wanted to share with you from my personal observations that USA, UK and maybe Nordic countries, they have very, very strong background for creating the business ideas but Eastern Europe or Eastern European countries and Belarus in particular, they are very strong in actually implementing those ideas.
>> John: Great. >> We can work together. I also got some meetings with our existing customers here in the US and so far we had good experiences. I can see that New York is moving fast. I travel a lot. I've been to over 40 countries in the previous five years and I just ... New York is different. >> It's fun. >> Different. Even different from many other cities in the US. >> Lot of banks are here. Lot of business in New York. New York is a great town. Love New York City. It's one of my favorites. Love coming here as I grew up right across the river in New Jersey. >> Yeah. But, great town, obviously California, Palo Alto, >> Yeah. >> Is a little more softer in terms of weather, but they have a culture there too. Sounds a lot like what's going on in Belarus, so congratulations. If we get some business for you, should we give them theCUBE discount, tell them John sent you and you get 10 percent off? Alright? >> Alright, yes. Sounds great. We can make it a good deal. (laughter) >> Tell them John sent you, you get 10% off. No I'm only kidding because it's services. Congratulations. Final question. What's the number one thing that people are buying for service from you guys? Number one thing. What's the most requested service you provide? >> The most requested services ... First of all, many customers they understand that they have got a lot of data. They want to do something with their data. But before you actually do some implementation you have to do a lot of discovery or preparatory work. I would say, no matter how we end up with a customer, this stage is basically ... The idea of that stage is to identify the ways data science can be implemented and can provide benefits to the business. That's the most important. I think that, like, 95 percent of the customers they approach us with this thing in the first place. And based on the results of that preparatory stage we can then advise the customers. What can they do? 
Or how they can actually benefit from the existing data? Or what other things they should collect in order to make their business more effective. >> Sergei, thanks for coming on. Belarus has got a lot of builders there. Check 'em out. >> Thanks a lot. >> Builders are critical in this new world. Lots of them with clout, a lot of great opportunities. A lot of builders in Belarus. This is theCUBE, bringing you all the action from New York City. More after this short break. We'll be right back. (theme music)

Published Date : Sep 28 2017


ENTITIES

Entity	Category	Confidence
Sergei Rabotai	PERSON	0.99+
Ilya Kirillov	PERSON	0.99+
John	PERSON	0.99+
Belarus	LOCATION	0.99+
Marat Karpeko	PERSON	0.99+
Sergei	PERSON	0.99+
InData Labs	ORGANIZATION	0.99+
US	LOCATION	0.99+
New York City	LOCATION	0.99+
Palo Alto	LOCATION	0.99+
John Furrier	PERSON	0.99+
California	LOCATION	0.99+
10 percent	QUANTITY	0.99+
UK	LOCATION	0.99+
New York	LOCATION	0.99+
New Jersey	LOCATION	0.99+
Fifth year	QUANTITY	0.99+
eight years	QUANTITY	0.99+
95 percent	QUANTITY	0.99+
Midtown Manhattan	LOCATION	0.99+
10%	QUANTITY	0.99+
USA	LOCATION	0.99+
Silicon Valley	LOCATION	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
one	QUANTITY	0.99+
two tech leaders	QUANTITY	0.99+
Great Britain	LOCATION	0.99+
NYC	LOCATION	0.99+
United States	LOCATION	0.99+
2010	DATE	0.99+
under 10 percent	QUANTITY	0.98+
this week	DATE	0.98+
Eastern Europe	LOCATION	0.98+
two big parts	QUANTITY	0.98+
first stage	QUANTITY	0.98+
over 40 countries	QUANTITY	0.98+
over 10 meetings	QUANTITY	0.98+
First	QUANTITY	0.98+
One	QUANTITY	0.97+
first	QUANTITY	0.97+
second founder	QUANTITY	0.97+
European Union	LOCATION	0.97+
one job	QUANTITY	0.97+
tomorrow	DATE	0.97+
Asian	LOCATION	0.96+
Strata Conference	EVENT	0.96+
about 40 people	QUANTITY	0.95+
Big Data	EVENT	0.95+
Nordic	LOCATION	0.93+
EdTech	ORGANIZATION	0.93+
Strata Data	TITLE	0.91+

Arun Murthy, Hortonworks | BigData NYC 2017

>> Host: Live from mid-town Manhattan, it's theCUBE covering the BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat electronic music) >> Welcome back, everyone. We're here, live, on day two of our three days of coverage of BigData NYC. This is our event that we put on every year. It's our fifth year doing BigData NYC in conjunction with Hadoop World which evolved into Strata Conference, which evolved into Strata Hadoop, now called Strata Data. Probably next year will be called Strata AI, but we're still theCUBE, we'll always be theCUBE and this is our BigData NYC, our eighth year covering the BigData world since Hadoop World. And then as Hortonworks came on we started covering Hortonworks' data summit. >> Arun: DataWorks Summit. >> DataWorks Summit. Arun Murthy, my next guest, Co-Founder and Chief Product Officer of Hortonworks. Great to see you, looking good. >> Likewise, thank you. Thanks for having me. >> Boy, what a journey. Hadoop, years ago, >> 12 years now. >> I still remember, you guys came out of Yahoo, you guys put Hortonworks together and then since, gone public, first to go public, then Cloudera just went public. So, the Hadoop World is pretty much out there, everyone knows where it's at, it's got a nice use case, but the whole world's moved around it. You guys have been, really the first of the Hadoop players, before ever Cloudera, on this notion of data in flight, or, I call, real-time data but I think, you guys call it data-in-motion. Batch, we all know what Batch does, a lot of things to do with Batch, you can optimize it, it's not going anywhere, it's going to grow. Real-time data-in-motion's a huge deal. Give us the update.
>> Absolutely, you know, we've obviously been in this space, personally, I've been in this for about 12 years now. So, we've had a lot of time to think about it. >> Host: Since you were 12? >> Yeah. (laughs) Almost. Probably look like it. So, back in 2014 and '15 when we, sort of, went public and we started looking around, the thesis always was, yes, Hadoop is important, we're going to allow you to manage lots and lots of data, but a lot of the stuff we've done since the beginning, starting with YARN and so on, was really to enable the use cases beyond the whole traditional transactions and analytics. And Rob, our CEO, calls it, his vision's always been we've got to get into a pre-transactional world, if you will, rather than the post-transactional analytics and BI and so on. So that's where it started. And increasingly, the obvious next step was to say, look enterprises want to be able to get insights from data, but they also want, increasingly, they want to get insights and they want to deal with it in real-time. You know while you're in your shopping cart. They want to make sure you don't abandon your shopping cart. If you were sitting at a retailer and you're in an aisle and you're about to walk away from a dress, you want to be able to do something about it. So, this notion of real-time is really important because it helps the enterprise connect with the customer at the point of action, if you will, and provide value right away rather than having to try to do this post-transaction. So, it's been a really important journey. We went and bought this company called Onyara, which is a bunch of geeks like us who started off with the government, built this Apache NiFi thing, huge community. It's just, like, taking off at this point. It's been a fantastic thing to join hands and join the team and keep pushing in the whole streaming data style. >> There's a real, I don't mean to tangent but I do since you brought up community I wanted to bring this up.
It's been the theme here this week. It's more and more obvious that the community role is becoming central, beyond open-source. We all know open-source, standing on the shoulders before us, you know. And Linux Foundation showing code numbers hitting up from $64 million to billions in the next five, ten years, exponential growth of new code coming in. So open-source certainly blew me. But now community is translating to things you start to see blockchain, very community based. That's a whole new currency market that's changing the financial landscape, ICOs and what-not, that's just one data point. Businesses, marketing communities, you're starting to see data as a fundamental thing around communities. And certainly it's going to change the vendor landscape. So you guys compare to, Cloudera and others have always been community driven. >> Yeah our philosophy has been simple. You know, more eyes and more hands are better than fewer. And it's been one of the cornerstones of our founding thesis, if you will. And you saw how that's gone on over course of six years we've been around. Super-excited to have someone like IBM join hands, it happened at DataWorks Summit in San Jose. That announcement, again, is a reflection of the fact that we've been very, very community driven and very, very ecosystem driven. >> Communities are fundamentally built on trust and partnering. >> Arun: Exactly >> Coding is pretty obvious, you code with your friends. You code with people who are good, they become your friends. There's an honor system among you. You're starting to see that in the corporate deals. So explain the dynamic there and some of the successes that you guys have had on the product side where one plus one equals more than two. One plus one equals five or three. >> You know IBM has been a great example. 
They've decided to focus on their strengths, which is around Watson and machine learning, and for us to focus on our strengths around data management, infrastructure, cloud and so on. So this combination of DSX, which is their Data Science Experience, along with Hortonworks is really powerful. We are seeing that over and over again. Just yesterday we announced the whole Dataplane thing, we were super excited about it. And now to get IBM to say, we'll bring in our technologies and our IP, big data, whether it's BigQuality or BigInsights or Big SQL, and the word has been phenomenal. >> Well, the Dataplane announcement, finally, people who know me know that I hate the term data lake. I always said it's always been a data ocean. So I get redemption because now the data lakes, now it's admitting it's a horrible name but just saying stitching together the data lakes, which is essentially a data ocean. Data lakes are out there and you can form these data lakes, or data sets, batch, whatever, but connecting them and integrating them is a huge issue, especially with security. >> And a lot of it is, it's also just pragmatism. We started off with this notion of data lake and said, hey, you've got too many silos inside the enterprise in one data center, you want to put them together. But then increasingly, as Hadoop has become more and more mainstream, I can't remember the last time I had to explain what Hadoop is to somebody. As it has become mainstream, a couple of things have happened. One is, we talked about streaming data. We see it all the time, especially with HDF. We have customers streaming data from autonomous cars. You have customers streaming from security cameras. You can put a small MiNiFi agent in a security camera or smart phone and stream it all the way back. Then you get into physics. You're up against the laws of physics. If you have a security camera in Japan, why would you want to move it all the way to California and process it?
You'd rather do it right there, right? So this notion of a regional data center becomes really important. >> And that talks to the Edge as well. >> Exactly, right. So you want to have something in Japan that collects from all of the security cameras in Tokyo, and you do analysis and push what you want back here, right. So that's physics. The other thing we are increasingly seeing is with data sovereignty rules, especially things like GDPR, there are now regulation reasons where data has to naturally stay in different regions. Customer data from Germany cannot move to France or vice versa, right. >> Data governance is a huge issue and this is the problem I have with data governance. I am really looking for a solution, so if you can illuminate this it would be great. So there is going to be an Equifax out there again. >> Arun: Oh, for sure. >> And the problem is, is that going to force some regulation change? So what we see is, certainly on the mugi bond side, I see it personally is that, you can almost see that something else will happen that'll force some policy regulation or governance. You don't want to screw up your data. You also don't want to rewrite your applications or rewrite your machine learning algorithms. So there's a lot of waste potential by not structuring the data properly. Can you comment on what's the preferred path? >> Absolutely, and that's why we've been working on things like Dataplane for almost a couple of years now. Which is to say, you have to have data and policies which make sense, given a context. And the context is going to change by application, by usage, by compliance, by law. So, now to manage 20, 30, 50, a 100 data lakes, would it be better, not saying lakes, data ponds, >> Host: Any data. >> Any data. >> Any data pool, stream, river, ocean, whatever. (laughs) >> Jacuzzis. Data jacuzzis, right. So what you want is a holistic fabric, I like the term, you know, Forrester uses, they call it the fabric. >> Host: Data fabric.
>> Data fabric, right? You want a fabric over these so you can actually control and maintain governance and security centrally, but apply it with context. Last but not least, you want to do this whether it's on-prem or on the cloud, or multi-cloud. So we've been working with a bank. They were primarily based in Germany but for GDPR they had to stand up something in France now. They had French customers, but for a bunch of new reasons, regulation reasons, they had to stand up something in France. So they bring their own data center, then they had only the cloud provider, right, who I won't name. And they were great, things are working well. Now they want to expand the similar offering to customers in Asia. It turns out their favorite cloud vendor was not available in Asia, or they were not available in a time frame which made sense for the offering. So they had to go with cloud vendor two. So now although each of the vendors will do their job in terms of giving you all the security and governance and so on, the fact that you have to manage it three ways, one for on-prem, one each for cloud vendor A and B, was really hard, too hard for them. So this notion of a fabric across these things, which is Dataplane. And that, by the way, is based on all the open source technologies we love like Atlas and Ranger. By the way, that is also what IBM is betting on and what the entire ecosystem, but it seems like a no-brainer at this point. That was the kind of reason why we foresaw the need for something like a Dataplane and obviously couldn't be more excited to have something like that in the market today as a net new service that people can use. >> You get the catalogs, security controls, data integration. >> Arun: Exactly. >> Then you get the cloud, whatever, pick your cloud scenario, you can do that. Killer architecture, I liked it a lot. I guess the question I have for you personally is what's driving the product decisions at Hortonworks?
And the second part of that question is, how does that change your ecosystem engagement? Because you guys have been very friendly in a partnering sense and also very good with the ecosystem. How are you guys deciding the product strategies? Does it bubble up from the community? Is there an ivory tower, let's go take that hill? >> It's both, because what typically happens is obviously we've been in the community now for a long time. Working publicly now with well over 1,000 customers not only puts a lot of responsibility on our shoulders but it's also very nice because it gives us a vantage point which is unique. That's number one. The second one we see is being in the community, also we see the fact that people are starting to solve the problems. So it's another telemetry for us. So you have one as the enterprise side, we see what the enterprises are facing, which is kind of where Dataplane came in, but we also saw in the community where people are starting to ask us about, hey, can you do multi-cluster Atlas? Or multi-cluster Ranger? Put two and two together and say there is a real need. >> So you get some consensus. >> You get some consensus, and you also see that on the enterprise side. Last but not least is when we went to friends like IBM and said, hey, we're doing this. This is where we can position this, right. So we can actually bring in IGC, you can bring BigQuality and bring all these type, >> Host: So things had clicked with IBM? >> Exactly. >> Rob Thomas was thinking the same thing. Bring in the Power system and the horsepower. >> Exactly, yep. We announced something, for example, we have been working with the Power guys and NVIDIA, for deep learning, right. That sort of stuff is what clicks if you're in the community long enough, if you have the vantage point of the enterprise long enough, it feels like the two of them click. And that's frankly, my job. >> Great, and you've got obviously the landscape. The waves are coming in.
So I've got to ask you, the big waves are coming in and you're seeing people starting to get hip with a couple of key things that they've got to get their hands on. They need to have the big surfboards, metaphorically speaking. They've got to have some good products, big emphasis on real value. Don't give me any hype, don't give me a head fake. You know, I buy, okay, AI wash, and people can see right through that. Alright, that's clear. But AI's great. We all cheer for AI but the reality is, everyone knows that's pretty much b.s. except for core machine learning, which is on the front edge of innovation. So that's cool, but value. (laughs) Hey, I've got to integrate and operationalize my data, so that's the big wave that's coming. Comment on the community piece because enterprises now are realizing, as open source becomes the dominant source of value for them, they are now really going to the next level. It used to be like the emerging enterprises that knew open source. The guys will volunteer and they may not go deeper in the community. But now more people in the enterprises are in open source communities, they are recruiting from open source communities, and that's impacting their business. What's your advice for someone who's been in the community of open source? Lessons you've learned, what is the best practice, from your standpoint on philosophy, how to build into the community, how to build a community model. >> Yeah, I mean, at the end of the day, my best advice is to say look, the community is defined by the people who contribute. So, you get a voice if you contribute. Which means, if that's the fundamental truth. Which means you have to get your legal policies and so on to a point that you can actually start to let your employees contribute. That kicks off a flywheel, where you can actually go then recruit the best talent, because the best talent wants to stand out. GitHub is a resume now. It is not a Word doc.
If you don't allow them to build that resume they're not going to come by, and it's just a fundamental truth. >> It's self governing, it's reality. >> It's reality, exactly. Right, and we see that over and over again. It's taken time but, as with things, the flywheel has changed enough. >> A whole new generation's coming online. If you look at the young kids coming in now, it is an amazing environment. You've got TensorFlow, all this cool stuff happening. It's just amazing. >> You know, 20 years ago that wouldn't happen because the Googles of the world wouldn't open source it. Now increasingly, >> The secret's out, open source works. >> Yeah, (laughs) shh. >> Tell everybody. You know, they know already, but this is changing some of how H.R. works and how people collaborate, >> And the policies around it. The legal policies around contribution, so, >> Arun, great to see you. Congratulations. It's been fun to watch the Hortonworks journey. I want to appreciate you and Rob Bearden for supporting theCUBE here in BigData NYC. If it wasn't for Hortonworks and Rob Bearden and your support, theCUBE would not be part of the Strata Data, which we are not allowed to broadcast into, for the record. O'Reilly Media does not allow theCUBE or our analysts inside their venue. They've excluded us and that's a bummer for them. They're a closed organization. But I want to thank Hortonworks and you guys for supporting us. >> Arun: Likewise. >> We really appreciate it. >> Arun: Thanks for having me back. >> Thanks and shout out to Rob Bearden. Good luck and CPO, it's a fun job, you know, not the pressure. I got a lot of pressure. A whole lot. >> Arun: Alright, thanks. >> More Cube coverage after this short break. (upbeat electronic music)

Published Date : Sep 28 2017


Chuck Yarbough, Pentaho | Big Data NYC 2017

>> Announcer: Live from Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Hey, welcome back everyone, live here in New York City, it's theCUBE's special presentation, Big Data NYC. This is our fifth year doing our own event here in New York City, our eighth year covering the Hadoop World ecosystem from the beginning. Through eight years, it's had a lot of evolutions, Hadoop World, Strata Conference, Strata Hadoop, now it's called Strata Data, happening right around the corner. We run our own event here, talk about thought leaders and the expert CEOs, entrepreneurs. Getting the data for you, sharing that with you. I'm John Furrier, co-host of theCUBE, with my co-host here Jim Kobielus, who's the Lead Analyst at Wikibon Big Data. And Chuck Yarbough, who's the Vice President at Pentaho Solutions, part of Hitachi's new Vantara, a new company just announced last week. Hitachi put a variety of their portfolio technologies into a new company, out to bring in a lot of those integrated solutions. Chuck, great to see you again, theCUBE alumni. We chatted multiple times at Pentaho World, going back to 2015. >> Always great to be at theCUBE. >> What a couple of years it's been. Give us quickly the hard news, it's pretty awesome, you guys have a variety of things at Pentaho, you know, with Hitachi, that happened, now the market's evolved, what's this new entity, this new company they're bringing together? >> Yes, so the big news is Hitachi Vantara. So what that is, two years ago Hitachi Data Systems acquired Pentaho, and so fast forward two years. A new company gets created from Hitachi Data Systems, Pentaho, and a third organization at Hitachi called the Insight Group, so Hitachi Insight Group. Those three groups come together to form Hitachi Vantara. >> What's the motivation behind that? I mean, I can connect the dots but I want to hear your perspective because it really is about pulling things together.
The trend this year the show is as Jim calls it, hybrid data, integrated data. Things seem to be coming together, is that part the purpose? What's the reason behind pulling this together? >> Yeah, I think there's a lot of reasons. One of them is what we're seeing not just in our own business, but in our customers business, and that is digital transformation. Right, this this need to evolve So Hitachi Vantara is all about data and analytics. And a big focus of what we do is what Pentaho's been doing for years which is driving in all kinds of data, big data, all data. I think we're getting on the cusp of closing out the big data term, but you know, it's all data right. >> Data everywhere, every application. >> And applying analytics across the board. One of the big initiatives, part of why Pentaho was originally acquired we were actually Hitachi Data Systems was a customer of Pentaho when we got acquired, so we we knew each other pretty well. And part of the reason for that acquisition was to drive analytics in around internet of things. The IoT space, which is something that Hitachi being a very large IT and operational technology, OT, company probably does as well as anybody if not better. >> So going back couple of years, I'm just looking at my notes here from our our video index. You visited theCUBE in 2015, but really the concepts have evolved significantly. I want to just highlight a few of them. What data warehouse optimizations, we talk about that. Data refinery concepts, 360 view as applied to big data. Again that was foundational concepts that all are in play right now. >> Absolutely. >> What is the update in those areas? Because refinery, everyone talks about data refinery, you know, oil, the easy oil example but I mean, come on, data is everywhere it is most important, you can use it multiple times unlike oil, as you were pointing out. >> So interesting you bring that up. 
So to me data refinery in a digital transformation really in an IoT world where lots of data is is streaming through in fact, yesterday I read something by IDC that 95% of all data in the future and the data growth is dramatic it's 10x what it is today in just a few years. 95% of the that growth of data's IoT related. The question is how are you using most of that, right, and what what are you going to do with it. So that data's is streaming through, there's a lot happening, we can do things at the edge, we can apply analytics and filtering and do things. But ultimately that data is going to land somewhere and that's where that refinery, think of it as the big data center refinery, right, where I'm going to take that large amount of data and do the things that Jim does, you know and apply machine learning and deep algorithms too really. >> I had some thoughts on the IoT Jim and I were arguing, not arguing, discussing, with others in theCube about the role. >> We were bickering. >> The role of the edge because I was saying the refiner of the data can come back depending on what kind of data or you push compute to the edge, kind of known concepts, people been discussing that. But the issue is been, how do you view the edge? I'd love to get your reaction to that question because a lot of people are saying you have to think of IoT as a completely different category, than just cloud, than just data center, because the way some people are looking at IoT I know this can be semantics whether it's industrial or just straight internet of things device, or person, that is a different animal when it comes to like what you call it and how it gets put into a bucket. I mean most people put a lot of the IT bucket but. Some are saying IT edge should be completely different category of how you look at those problems. Your thoughts on how that IoT conversation shape. >> The question I always ask when I'm talking to somebody about the edge is, well what do you mean? 
Because it is something that can be defined a little bit differently but in an industrial IoT context I think, you know we look at it as one, you you have to know what those things are you have to really understand them. And part of understanding those things is having a digital representation of what those things are. >> A digital twin? >> A digital twin. Right, or asset avatar, as we call it at Hitachi. >> Oh I like that. >> So this idea of really managing those assets, understanding what they are and then being able to know what the current state, what the previous state, things are like that are. And then that refinery we just talked about is sort of where that information goes to so you can do other kinds of analytics right. But when you're talking about the edge, typically what we're seeing is the kinds of analytics might happen at the edge, are probably more around filtering you know, it's not quite as complex of analytics that's what we're seeing today. Now, the future I don't know. >> Sort of tiered analytics from the edge on in with more minimal, I mean, not minimal that's the wrong term, with a more narrowly scoped inference. Like predictions and so forth being handled at the edge with larger more complex models being like deep learning whatever being processed in the cloud is that it? >> Yeah that's exactly the way that I see it. Now the other thing about the edge, depends on who you're talking to, again, but what is an edge device or the the gateways or the compute right, so part of IoT is in my mind, it's not cloud, it's not on-prem or it's not, I mean it's a little bit of everything right, it depends on the use case and what you're operating. We have a customer who does trains as a service in England, in Europe, and so they don't sell the trains anymore they actually manufacture trains, and they sell the service of getting a passenger from here to there. But for them, edge is everything that happens on those trains. 
And tracking, as a digital representation, the train and then being able to drill down deeper and deeper, and you, know one of the things that I understand is one of the major delays for train service is doors opening and closing or being delayed, so maybe that comes down to a small part and the vibration of it and tracking that. So you've got to be able to track that appropriately. Now, on a train you might have a lot of extra space so you could put compute devices that have a lot of power. >> What's interesting you said the edge, in this context, is everything that happens on that train. In other words, it sounds like all the real world outcomes that are enabled, perhaps optimized, by embedding of the analytics in those physical devices or in that entire vehicle that is essentially. One way that you're describing the edge which is not a single device but as a complete assembly of devices that play together. Amongst themselves and in with the services in the cloud. Is that a logical sort of framework? >> That's why I said I usually ask what do we mean by edge. If you've got millions, thousands, whatever, devices out there feeding sensors whatever feeding this data, collecting, processing you know there's some some level of edge computing gateways, processes that are going to happen. >> Well, my question for ya, I'd like to get your thoughts, as we, again we're having a, we love the hyperbio we think its completely legit and it's going to be continued to be hyped because it's obvious what you see with IoT standing on the edge. But lot of customers we talked to are like, look I got a lot going on I got application development I got to break out my security got to build that up. I've got data governance issues, and now you throw in IoT over the top. They're like, I'm choking in projects. So they they come down to one of a selection criteria. How do they define a working IoT project? 
And the trend that we're seeing is that it has to do with their industrial equipment or something related to their business. Call it industrial IoT, because if they have something in their business, say trains, as a critical part of what they do, that's easy to say let's justify this. Everything else then tends to go on the back burner, if they don't have clear visibility of what their instrumenting. That's kind of weird do you agree with that? Do you see a pattern as well as what customers are doing by saying I'm going to bring this project in and were going to connect our IoT. >> That's exactly what I see. Industrial internet of things is where I see the biggest value today when you have trains or mining equipment or you know whatever. >> John: Whatever your business runs. >> Your manufacturing line right. and being able to a fine tune those lines to either predicts failures, maybe improve quality. Those are those are impactful and they can be done right now today and that's what we're seeing is kind of the big emerging thing. IoT's interesting to talk about, the reality is it's really digital transformation that we're seeing. Companies transforming into new business models, doing things significantly different to grow into the future. And IoT is an enabler of that. So you're not going to see IoT everywhere today. >> The low hanging fruit is where it gets to the real business. >> Yeah, but it's going to go across all verticals, right, no doubt. >> So what solutions does Pentaho have for digital twins, or managing digital twins, the objects, the data itself, within and IoT context, is this something you're engaged in already? >> So within the Hitachi Vantara, the larger company. Bigger company, we have, we have what we call our Lumada IoT Platform and in that there is this asset avatar technology that that does exactly what you're describing. Now I'm going to throw quick plug out if you don't mind. Pentaho World in a couple, in about a month. 
>> John: theCUBE will be there. >> theCUBE will be there, and we're excited to have theCUBE and we're going to we're going to give you complete information about asset avatar with all the right people. >> There's a movie in there somewhere I could feel it, Avatar two. There's a lot of great representations of data I want to get your thoughts on how the new firm's going to solve customer problems. Because now as the customer see this new entity from you guys, Vantara's been doing real well, we covered the acquisition and you were kind of left alone Pentaho was integrating in, but it wasn't like a radical shift. Now there's some movement, what does it mean to the customer, what's the story to the customer. >> You know I think it's great news for the customer because Pentaho's always been very customer focused. But when you look at Hitachi Vantara the wealth of technology and expertise. Everything from all of the the great IT oriented stuff that Hitachi Data Systems has done and been well known for in the past still exists. But this broader focus of taking data and processing it in a variety of ways to solve real business problems. All the way to orchestrating machine learning in applying algorithms and then with the Hitachi. >> What specifically in Hitachi is coming into this? Because again this is again a focused solution company now with data, so Hitachi Data Centers, >> Yeah, so Hitachi Data Systems, think of it as the the infrastructure company. Hitachi Insight was the really focused largely on the IoT platform development, with some Pentaho assets and then the Pentaho business. But here's the thing about Hitachi, very large company, builds everything. Mining equipment and and all kinds of stuff. So nobody understands how all those things fit together better, I believe, than Hitachi. But some of the things that we have at that organization is this idea of the Hitachi labs. 
And data scientists that are really doing interesting things Jim you'd love to get more embedded into what some of those things are, and making that available to customers is a huge opportunity for customers to now be able to embrace a lot of the technologies we've been talking about. I said last year that this year was going to be the year of machine learning. And if you look through the expo hall that's what everybody's talking about. Right, it's AI or machine learning. >> I'm wondering if you're commercializing R&D that's coming straight out of Hitachi labs already or whether the Vantara combination will enable that. In other words, more innovation straight out of the labs, into into the commercial arena. >> That's something that we are absolutely trying to to, right because there's great things that these lab organizations and at Hitachi they're big labs. They're really legit, I kind of joke about that. The kinds of stuff that they're able to bring about now, Pentaho is part of the engine to help actually commercialize those things. >> Chuck I know you're looking forward to Pentaho World I'll give you the final word here in this segment how you see the big data worlds evolve. Take your Pentaho hat off and put your industry guru hat on. What's happening, I mean this AI watch, that's pretty obvious, not a lot of blockchain discussion which is going to completely open up some things we getting on the decentralized application market which is going to compliment the distributed nature of how we see a date analytics flow and certainly the immutability of it's interesting. But that's kind of down the road. But here you're starting to see the swim lanes in the industry, you've seen people who've been successful and the ones who have fallen by the wayside. But now the customers, they want real solutions. 
They don't want more hype, they don't want another eighth year of hype, they want OK let's get into the real meat and potatoes of data impact to my organization, call it digital transformation. What's happening, what is going on the landscape. >> So you know I mentioned before and to me it's digital transformation which is a big huge thing. But that's what companies are interested in that's what they're beginning to think. If they're not thinking about those things they're falling behind, five or six, seven years ago we talked about the same exact thing with big data. It's like a big data is really you know it's a big opportunity and they're like well I don't know those that didn't adopt it aren't necessarily in a position now to transform digitally and to do some of the things that they're going to need to evolve into new business opportunities. >> And the big data examples of winner is the ones who actually made it valuable. Whether it's insight that converted to a new customer or change an outcome in a positive way, they go that wouldn't have been possible without data. The proof points kind of hit the table. >> That's right the other thing is you know, who's going to win, who's going to lose. I think people that are implementing technology for technology's sake are going to lose. People that are focused on the outcomes are going to win. That's what it is, technology enables all that but you've really got to be focused on. I want to get your quick, one more quick thing, before we go I know we got we're tight on time but I want to get thoughts on the open ecosystem. Open source going to whole other level. The projections are code will be shipping at an exponential rate, it's be a lot of onboarding of new stuff, so open obviously works, community models work, partnering is critical. So we're seeing that good partnerships, not fake deals or optical deals or Barney deals, whatever you want to call it. But real partnerships. 
You're starting to see technology partnerships. What's your view on that, how is the new Vantara going to go forward, are you going to continue to do partnerships and what's the strategy? >> Yeah I think the opportunity with one, Hitachi Vantara is we have a breadth that can touch many different aspects. So as Pentaho we had great partnerships, very meaningful, but it always comes down to what we're doing for the customer. How are we changing things for the customer. So I'm not a believer in those Barney kind of relationships, those are nice but let's talk about what we're doing for customers. >> Yeah, real proof points. >> You guys will continue to partner. >> Yes, we will continue to do that. >> Okay great, Chuck, thank you so much. CUBE coverage live in New York City in Manhattan, it's theCUBE with Big Data NYC, our fifth year doing our own event in conjunction with Strata Data. Now bless the new name of the show. It was Strata Hadoop, Hadoop World before that. But we're still theCUBE, covering eight years of the action here. Back with more after this short break.

Published Date : Sep 27 2017

ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Hitachi	ORGANIZATION	0.99+
Jim	PERSON	0.99+
Chuck Yarbough	PERSON	0.99+
Europe	LOCATION	0.99+
John	PERSON	0.99+
England	LOCATION	0.99+
Hitachi Data Systems	ORGANIZATION	0.99+
Chuck	PERSON	0.99+
Vantara	ORGANIZATION	0.99+
2015	DATE	0.99+
Pentaho	ORGANIZATION	0.99+
millions	QUANTITY	0.99+
One	QUANTITY	0.99+
95%	QUANTITY	0.99+
New York City	LOCATION	0.99+
John Furrier	PERSON	0.99+
10x	QUANTITY	0.99+
last year	DATE	0.99+
eighth year	QUANTITY	0.99+
three groups	QUANTITY	0.99+
fifth year	QUANTITY	0.99+
Hitachi Vantara	ORGANIZATION	0.99+
last week	DATE	0.99+
eight years	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
yesterday	DATE	0.99+
two years	QUANTITY	0.99+
Hitachi Insight Group	ORGANIZATION	0.99+
Big Data	ORGANIZATION	0.99+
Insight Group	ORGANIZATION	0.99+
this year	DATE	0.99+
Midtown Manhattan	LOCATION	0.98+
Strata Conference	EVENT	0.98+
third organization	QUANTITY	0.98+
theCUBE	ORGANIZATION	0.98+
two years ago	DATE	0.98+
Strata Hadoop	EVENT	0.98+
Wikibon Big Data	ORGANIZATION	0.98+
seven years ago	DATE	0.97+
Hitachi Insight	ORGANIZATION	0.97+
today	DATE	0.97+
Strata Data	EVENT	0.97+
Hadoop World	EVENT	0.96+
one	QUANTITY	0.96+
One way	QUANTITY	0.96+
NYC	LOCATION	0.96+
Pentaho Solutions	ORGANIZATION	0.96+
thousands	QUANTITY	0.95+
Hitachi Data Centers	ORGANIZATION	0.95+

Yaron Haviv, iguazio | BigData NYC 2017

>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, we're live in New York City, this is theCUBE's coverage of BigData NYC, this is our own event for five years now we've been running it, been at Hadoop World since 2010, it's our eighth year covering the Hadoop World which has evolved into Strata Conference, Strata Hadoop, now called Strata Data, and of course it's bigger than just Strata, it's about big data in NYC, a lot of big players here inside theCUBE, thought leaders, entrepreneurs, and great guests. I'm John Furrier, the cohost this week with Jim Kobielus, who's the lead analyst on our BigData and our Wikibon team. Our next guest is Yaron Haviv, who's with iguazio, he's the founder and CTO, hot startup here at the show, making a lot of waves on their new platform. Welcome to theCUBE, good to see you again, congratulations. >> Yes, thanks, thanks very much. We're happy to be here again. >> You're known in the theCUBE community as the guy on Twitter who's always pinging me and Dave and team, saying, "Hey, you know, you guys got to "get that right." You really are one of the smartest guys on the network in our community, you're super-smart, your team has got great tech chops, and in the middle of all that is the hottest market which is cloud native, cloud native as it relates to the integration of how apps are being built, and essentially new ways of engineering around these solutions, not just repackaging old stuff, it's really about putting things in a true cloud environment, with an application development, with data at the center of it, you got a whole complex platform you've introduced. So really, really want to dig into this. 
So before we get into some of my pointed questions I know Jim's got a ton of questions, is give us an update on what's going on so you guys got some news here at the show, let's get to that first. >> So since the last time we spoke, we had tons of news. We're making revenues, we have customers, we've just recently GA'ed, we recently got significant investment from major investors, we raised about $33 million recently from companies like Verizon Ventures, Bosch, you know for IoT, Chicago Mercantile Exchange, which is Dow Jones and other properties, Dell EMC. So pretty broad. >> John: So customers, pretty much. >> Yeah, so that's the interesting thing. Usually you know investors are sort of strategic investors or partners or potential buyers, but here it's essentially our customers that it's so strategic to the business, we want to... >> Let's go with GA of the projects, just get into what's shipping, what's available, what's the general availability, what are you now offering? >> So iguazio is trying to, you know, you alluded to cloud native and all that. Usually when you go to events like Strata and BigData it's nothing to do with cloud native, a lot of hard labor, not really continuous development and integration, it's like continuous hard work, it's continuous hard work. And essentially what we did, we created a data platform which is extremely fast and integrated, you know has all the different forms of states, streaming and events and documents and tables and all that, into a very unique architecture, won't dive into that today. And on top of it we've integrated cloud services like Kubernetes and serverless functionality and others, so we can essentially create a hybrid cloud. So some of our customers they even deploy portions as an Opix-based settings in the cloud, and some portions in the edge or in the enterprise deployed the software, or even a prepackaged appliance. So we're the only ones that provide a full hybrid experience. 
>> John: Is this a SaaS product? >> So it's a software stack, and it could be delivered in three different options. One, if you don't want to mess with the hardware, you can just rent it, and it's deployed in an Equinix facility, we have very strong partnerships with them globally. If you want to have something on-prem, you can get a software reference architecture, you go and deploy it. If you're a telco or an IoT player that wants a manufacturing facility, we have a very small 2U box, four servers, four GPUs, all the analytics tech you could think of. You just put it in the factory instead of like two racks of Hadoop. >> So you're not general purpose, you're just whatever the customer wants to deploy the stack, their flexibility is on them. >> Yeah. Now it is an appliance >> You have a hosting solution? >> It is an appliance even when you deploy it on-prem, it's a bunch of Docker containers inside that you don't even touch them, you don't SSH to the machine. You have APIs and you have UIs, and just like the cloud experience when you go to Amazon, you don't open the kimono, you know, you just use it. So our experience that's what we're telling customers. No root access problems, no security problems. It's a hardened system. Give us servers, we'll deploy it, and you go through consoles and UIs, >> You don't host anything for anyone? >> We host for some customers, including >> So you do whatever the customer was interested in doing? >> Yes. (laughs) >> So you're flexible, okay. >> We just want to make money. >> You're pretty good, sticking to the product. So on the GA, so here essentially the big data world you mentioned that there's data layers, like data piece. So I got to ask you the question, so pretend I'm an idiot for a second, right. >> Yaron: Okay. >> Okay, yeah. >> No, you're a smart guy. >> What problem are you solving. So we'll just go to the simple. 
I love what you're doing, I assume you guys are super-smart, which I can say you are, but what's the problem you're solving, what's in it for me? >> Okay, so there are two problems. One is the challenge everyone wants to transform. You know there is this digital transformation mantra. And it means essentially two things. One is, I want to automate my operation environment so I can cut costs and be more competitive. The other one is I want to improve my customer engagement. You know, I want to do mobile apps which are smarter, you know get more direct content to the user, get more targeted functionality, et cetera. These are the two key challenges for every business, any industry, okay? So they go and they deploy Hadoop and Hive and all that stuff, and it takes them two years to productize it. And then they get to the data science bit. And by the time they finished they understand that this Hadoop thing can only do one thing. It's queries, and reporting and BI, and data warehousing. How do you do actionable insights from that stuff, okay? 'Cause actionable insights means I get information from the mobile app, and then I translate it into some action. I have to enrich the vectors, the machine learning, all that details. And then I need to respond. Hadoop doesn't know how to do it. So the first generation is people that pulled a lot of stuff into data lake, and started querying it and generating reports. And the boss said >> Low cost data link basically, was what you say. >> Yes, and the boss said, "Okay, what are we going to do with this report? "Is it generating any revenue to the business?" No. The only revenue generation if you take this data >> You're fired, exactly. >> No, not all fired, but now >> John: Look at the budget >> Now they're starting to buy our stuff. 
So now the point is okay, how can I put all this data, and in the same time generate actions, and also deal with the production aspects of, I want to develop in a beta phase, I want to promote it into production. That's cloud native architectures, okay? Hadoop is not cloud, How do I take a Spark, Zeppelin, you know, a notebook and I turn it into production? There's no way to do that. >> By the way, depending on which cloud you go to, they have a different mechanism and elements for each cloud. >> Yeah, so the cloud providers do address that because they are selling the package, >> Expands all the clouds, yeah. >> Yeah, so cloud providers are starting to have their own offerings which are all proprietary around this is how you would, you know, forget about HDFS, we'll have S3, and we'll have Redshift for you, and we'll have Athena, and again you're starting to consume that into a service. Still doesn't address the continuous analytics challenge that people have. And if you're looking at what we've done with Grab, which is amazing, they started with using Amazon services, S3, Redshift, you know, Kinesis, all that stuff, and it took them about two hours to generate the insights. Now the problem is they want to do driver incentives in real time. So they want to incent the driver to go and make more rides or other things, so they have to analyze the event of the location of the driver, the event of the location of the customers, and just throwing messages back based on analytics. So that's real time analytics, and that's not something that you can do >> They got to build that from scratch right away. I mean they can't do that with the existing. >> No, and Uber invested tons of energy around that and they don't get the same functionality. 
Another unique feature that we talk about in our PR >> This is for the use case you're talking about, this is the Grab, which is the car >> Grab is the number one ride-sharing in Asia, which is bigger than Uber in Asia, and they're using our platform. By the way, even Uber doesn't really use Hadoop, they use MemSQL for that stuff, so it's not really using open source and all that. But the point is for example, with Uber, when you have a, when they monetize the rides, they do it just based on demand, okay. And with Grab, now what they do, because of the capability that we can intersect tons of data in real time, they can also look at the weather, was there a terror attack or something like that. They don't want to raise the price >> A lot of other data points, could be traffic >> They don't want to raise the price if there was a problem, you know, and all the customers get aggravated. This is actually intersecting data in real time, and no one today can do that in real time beyond what we can do. >> A lot of people have semantic problems with real time, they don't even know what they mean by real time. >> Yaron: Yes. >> The data could be a week old, but they can get it to them in real time. >> But every decision, if you think if you generalize round the problem, okay, and we have slides on that that I explain to customers. Every time I run analytics, I need to look at four types of data. The context, the event, okay, what happened, okay. The second type of data is the previous state. Like I have a car, was it up or down or what's the previous state of that element? The third element is the time aggregation, like, what happened in the last hour, the average temperature, the average, you know, ticker price for the stock, et cetera, okay? And the fourth thing is enriched data, like I have a car ID, but what's the make, what's the model, who's driving it right now. That's secondary data. 
So every time I run a machine learning task or any decision I have to collect all those four types of data into one vector, it's called feature vector, and take a decision on that. You take Kafka, it's only the event part, okay, you take MemSQL, it's only the state part, you take Hadoop it's only like historical stuff. How do you assemble and stitch a feature vector. >> Well you talked about complex machine learning pipeline, so clearly, you're talking about a hybrid >> It's a prediction. And actions based on just dumb things, like the car broke and I need to send a garage, I don't need machine learning for that. >> So within your environment then, do you enable the machine learning models to execute across the different data platforms, of which this hybrid environment is composed, and then do you aggregate the results of those models, runs into some larger model that drives the real time decision? >> In our solution, everything is a document, so even a picture is a document, a lot of things. So you can essentially throw in a picture, run tensor flow, embed more features into the document, and then query those features on another platform. So that's really what makes this continuous analytics extremely flexible, so that's what we give customers. The first thing is simplicity. They can now build applications, you know we have tier one now, automotive customer, CIO coming, meeting us. So you know when I have a project, one year, I need to have hired dozens of people, it's hugely complex, you know. Tell us what's the use case, and we'll build a prototype. >> John: All right, well I'm going to >> One week, we gave them a prototype, and he was amazed how in one week we created an application that analyzed all the streams from the data from the cars, did enrichment, did machine learning, and provided predictions. >> Well we're going to have to come in and test you on this, because I'm skeptical, but here's why. >> Everyone is. 
>> We'll get to that, I mean I'm probably not skeptical but I kind of am because the history is pretty clear. If you look at some of the big ideas out there, like OpenStack. I mean that thing just morphed into a beast. Hadoop was a cost of ownership nightmare as you mentioned early on. So people have been conceptually correct on what they were trying to do, but trying to get it done was always hard, and then it took a long time to kind of figure out the operational model. So how are you different, if I'm going to play the skeptic here? You know, I've heard this before. How are you different than say OpenStack or Hadoop Clusters, 'cause that was a nightmare, cost of ownership, I couldn't get the type of value I needed, lost my budget. Why aren't you the same? >> Okay, that's interesting. I don't know if you know but I ran a lot of development for OpenStack when I was at Mellanox, and Hadoop, so I patched a lot of those >> So do you agree with what I said? That that was a problem? >> They are extremely complex, yes. And I think one of the things that first OpenStack tried to bite off too much, and it's sort of a huge tent, everyone tries to push his agenda. OpenStack is still an infrastructure layer, okay. And also Hadoop is sort of something in between an infrastructure and an application layer, but it was designed 10 years ago, where the problem that Hadoop tried to solve is how do you do web ranking, okay, on tons of batch data. And then the ecosystem evolved into real time, and streaming and machine learning. >> A data warehousing alternative or whatever. >> So it doesn't fit the original model of batch processing, 'cause if an event comes from the car or an IoT device, and you have to do something with it, you need a table with an index. You can't just go and build a huge Parquet file. >> You know, you're talking about complexity >> John: That's why he's different. >> Go ahead. 
>> So what we've done with our team, after knowing OpenStack and all those >> John: All the scar tissue. >> And all the scar tissues, and my role was also working with all the cloud service providers, so I know their internal architecture, and I worked on SAP HANA and Exadata and all those things, so we learned from the bad experiences, said let's forget about the lower layers, which is what OpenStack is trying to provide, provide you infrastructure as a service. Let's focus on the application, and build from the application all the way to the flash, and the CPU instruction set, and the adapters and the networking, okay. That's what's different. So what we provide is an application and service experience. We don't provide infrastructure. If you go buy VMware and Nutanix, all those offerings, you get infrastructure. Now you go and build with a dozen DevOps guys all the stack above. You go to Amazon, you get services. Just they're not the most optimized in terms of the implementation because they also have dozens of independent projects that each one takes a VM and starts writing some >> But they're still a good service, but you got to put it together. >> Yeah right. But also the way they implement, because in order for them to scale is that they have a common layer, they found VMs, and then they're starting to build up applications so it's inefficient. And also a lot of it is built on 10-year-old baseline architecture. We've designed it for a very modern architecture, it's all parallel CPUs with 30 cores, you know, flash and NVMe. And so we've avoided a lot of the hardware challenges, and serialization, and just provide an abstraction layer pretty much like a cloud on top. >> Now in terms of abstraction layers in the cloud, they're efficient, and provide a simplification experience for developers. Serverless computing is up and coming, it's an important approach, of course we have the public clouds from AWS and Google and IBM and Microsoft. 
There is a growing range of serverless computing frameworks for prem-based deployment. I believe you are behind one. Can you talk about what you're doing at iguazio on serverless frameworks for on-prem or public? >> Yes, it's the first time I'm very active in CNCF, the Cloud Native Computing Foundation. I'm one of the authors of the serverless white paper, which tries to normalize the definitions of all the vendors and come up with a proposal for an interoperable standard. So I spent a lot of energy on that, 'cause we don't want to lock customers to an API. What's unique, by the way, about our solution, we don't have a single proprietary API. We just emulate all the other guys' stuff. We have all the Amazon APIs for data services, like Kinesis, Dynamo, S3, et cetera. We have the open source APIs, like Kafka. So also on the serverless, my agenda is trying to promote that if I'm writing to Azure or AWS or iguazio, I don't need to change my app. I can use any developer tools. So that's my effort there. And we recently, a few weeks ago, we launched our open source project, which is a sort of second generation of something we had before called Nuclio. It's designed for real time >> John: How do you spell that? >> N-U-C-L-I-O. I even have the logo >> He's got a nice slick here. >> It's really fast because it's >> John: Nuclio, so that's open source that you guys just sponsor and it's all code out in the open? >> All the code is in the open, pretty cool, has a lot of innovative ideas on how to do stream processing and best, 'cause the original serverless functionality was designed around web hooks and HTTP, and even many of the open source projects are really designed around HTTP serving. >> I have a question. 
I'm doing research for Wikibon on the area of serverless, in fact we've recently published a report on serverless, and in terms of hybrid cloud environments, I'm not seeing yet any hybrid serverless clouds that involve public, you know, serverless like AWS Lambda, and private on-prem deployment of serverless. Do you have any customers who are doing that or interested in hybridizing serverless across public and private? >> Of course, and we have some patents I don't want to go into, but the general idea is, what we've done in Nuclio is also the decoupling of the data from the computation, which means that things can sort of be disjoined. You can run a function in Raspberry Pi, and the data will be in a different place, and those things can sort of move, okay. >> So the persistence has to happen outside the serverless environment, like in the application itself? >> Outside of the function, the function accesses the persistence layer through APIs, okay. And how this data persistence is materialized, that's a separate thing. So you can actually write the same function that will run against Kafka or Kinesis or Private MQ, or HTTP without modifying the function, and ad hoc, through what we call function bindings, you define what's going to be the thing driving the data, or storing the data. So you can actually write the same function that does ETL drop from table one to table two. You don't need to put the table information in the function, which is not the thing that Lambda does. And it's about a hundred times faster than Lambda, we do 400,000 events per second in Nuclio. So if you write your serverless code in Nuclio, it's faster than writing it yourself, because of all those low-level optimizations. >> Yaron, thanks for coming on theCUBE. We want to do a deeper dive, love to have you out in Palo Alto next time you're in town. Let us know when you're in Silicon Valley for sure, we'll make sure we get you on camera for multiple sessions. 
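(Editor's sketch, not from the interview.) The "function bindings" idea Yaron describes, where the function body never names the table or stream it reads from or writes to, might look roughly like the Python below. This is not the Nuclio API: `etl_handler`, `run_with_bindings`, and the in-memory lists standing in for "table one" and "table two" are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def etl_handler(records: Iterable[Record], write: Callable[[Record], None]) -> int:
    """Copy records from the bound source to the bound sink.

    The handler knows nothing about where the records come from or go to;
    it only upper-cases the "name" field as a stand-in transformation.
    """
    count = 0
    for record in records:
        write({**record, "name": str(record.get("name", "")).upper()})
        count += 1
    return count


def run_with_bindings(handler, bindings) -> int:
    """Resolve the hypothetical 'source' and 'sink' bindings, then call the handler."""
    return handler(bindings["source"](), bindings["sink"])


# In-memory stand-ins for "table one" and "table two" (assumptions for the sketch).
table_one: List[Record] = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
table_two: List[Record] = []

# The binding, not the function, decides what drives and what stores the data.
bindings = {"source": lambda: iter(table_one), "sink": table_two.append}
processed = run_with_bindings(etl_handler, bindings)
```

Pointing the same `etl_handler` at a different source or sink only means passing a different `bindings` dictionary; the function body stays unchanged, which is the property described in the answer above.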
>> And more information re:Invent. >> Go to re:Invent. We're looking forward to seeing you there. Love the continuous analytics message, I think continuous integration is going through a massive renaissance right now, you're starting to see new approaches, and I think things that you're doing is exactly along the lines of what the world wants, which is alternatives, innovation, and thanks for sharing on theCUBE. >> Great. >> That's very great. >> This is theCUBE coverage of the hot startups here at BigData NYC, live coverage from New York, after this short break. I'm John Furrier, Jim Kobielus, after this short break.
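
(Editor's sketch, not from the interview.) The four kinds of data Yaron lists for every decision (the event, the previous state, a time aggregation, and enrichment data) can be stitched into one feature vector roughly as follows. The `Event` class, the `build_feature_vector` function, and the dictionary-based stores are hypothetical stand-ins for whatever streaming, state, and lookup systems a real deployment would use.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class Event:
    car_id: str
    timestamp: float    # seconds since some epoch
    temperature: float  # the reading carried by this event


def build_feature_vector(
    event: Event,
    previous_state: Dict[str, str],
    history: Dict[str, List[Tuple[float, float]]],
    catalog: Dict[str, Dict[str, str]],
    window_seconds: float = 3600.0,
) -> Dict[str, object]:
    """Combine event, previous state, time aggregation, and enrichment."""
    # 2. Previous state of the element (for example "up" or "down").
    state = previous_state.get(event.car_id, "unknown")

    # 3. Time aggregation: average reading over the last hour.
    start = event.timestamp - window_seconds
    recent = [
        value
        for ts, value in history.get(event.car_id, [])
        if start <= ts <= event.timestamp
    ]
    average = mean(recent) if recent else event.temperature

    # 4. Enrichment: secondary data looked up by car ID.
    info = catalog.get(event.car_id, {})

    return {
        "temperature": event.temperature,  # 1. the event itself
        "previous_state": state,
        "avg_temperature_last_hour": average,
        "make": info.get("make", "unknown"),
        "model": info.get("model", "unknown"),
    }
```

In the interview's terms, the event would arrive from a stream, the previous state from a key-value store, the aggregation from a time-series view, and the enrichment from a reference table; the sketch collapses all four into plain dictionaries so the assembly step is visible on its own.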

Published Date : Sep 27 2017

ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Bosch	ORGANIZATION	0.99+
Uber	ORGANIZATION	0.99+
John	PERSON	0.99+
John Furrier	PERSON	0.99+
Verizon Ventures	ORGANIZATION	0.99+
Yaron Haviv	PERSON	0.99+
Asia	LOCATION	0.99+
NYC	LOCATION	0.99+
Google	ORGANIZATION	0.99+
New York City	LOCATION	0.99+
Jim	PERSON	0.99+
Palo Alto	LOCATION	0.99+
30 cores	QUANTITY	0.99+
New York	LOCATION	0.99+
AWS	ORGANIZATION	0.99+
two years	QUANTITY	0.99+
BigData	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
two problems	QUANTITY	0.99+
Dell EMC	ORGANIZATION	0.99+
Yaron	PERSON	0.99+
One	QUANTITY	0.99+
Dave	PERSON	0.99+
Kafka	TITLE	0.99+
third element	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
Dow Jones	ORGANIZATION	0.99+
two things	QUANTITY	0.99+
two racks	QUANTITY	0.99+
today	DATE	0.99+
Grab	ORGANIZATION	0.99+
Nuclio	TITLE	0.99+
two key challenges	QUANTITY	0.99+
Cloud Native Foundation	ORGANIZATION	0.99+
about $33 million	QUANTITY	0.99+
eighth year	QUANTITY	0.99+
Hadoop	TITLE	0.98+
second type	QUANTITY	0.98+
Lambda	TITLE	0.98+
10 years ago	DATE	0.98+
each cloud	QUANTITY	0.98+
Strata Conference	EVENT	0.98+
Equanix	LOCATION	0.98+
10-year-old	QUANTITY	0.98+
first thing	QUANTITY	0.98+
first generation	QUANTITY	0.98+
one	QUANTITY	0.98+
second generation	QUANTITY	0.98+
Hadoop World	EVENT	0.98+
first time	QUANTITY	0.98+
theCUBE	ORGANIZATION	0.97+
Nutanix	ORGANIZATION	0.97+
MemSQL	TITLE	0.97+
each one	QUANTITY	0.97+
2010	DATE	0.97+
Kinesis	TITLE	0.97+
SAS	ORGANIZATION	0.96+
Wikibon	ORGANIZATION	0.96+
Chicago Mercantile Exchange	ORGANIZATION	0.96+
about two hours	QUANTITY	0.96+
this week	DATE	0.96+
one thing	QUANTITY	0.95+
dozen	QUANTITY	0.95+

Christian Rodatus, Datameer | BigData NYC 2017

>> Announcer: Live from Midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Coverage to theCUBE in New York City for Big Data NYC, the hashtag is BigDataNYC. This is our fifth year doing our own event in conjunction with Strata Hadoop, now called Strata Data, used to be Hadoop World, our eighth year covering the industry, we've been there from the beginning in 2010, the beginning of this revolution. I'm John Furrier, the co-host, with Jim Kobielus, our lead analyst at Wikibon. Our next guest is Christian Rodatus, who is the CEO of Datameer. Datameer, obviously, one of the startups now evolving on the, I think, eighth year or so, roughly seven or eight years old. Great customer base, been successful blocking and tackling, just doing good business. Your shirt says show me the data. Welcome to theCUBE, Christian, appreciate it. >> So well established, I barely think of you as a startup anymore. >> It's kind of true, and actually a couple of months ago, after I took on the job, I met Mike Olson, and Datameer and Cloudera were sort of founded the same year, I believe late 2009, early 2010. Then, he told me there were two open source projects with MapReduce and Hadoop, basically, and Datameer was founded to actually enable customers to do something with it, as an entry platform to help getting data in, create the data and doing something with it. And now, if you walk the show floor, it's a completely different landscape now. >> We've had you guys on before, the founder, Stefan, has been on. Interesting migration, we've seen you guys grow from a customer base standpoint. You've come on as the CEO to kind of take it to the next level. Give us an update on what's going on at Datameer. Obviously, the shirt says "Show me the data." Show me the money kind of play there, I get that. That's where the money is, the data is where the action is. 
Real solutions, not pie in the sky, we're now in our eighth year of this market, so there's not a lot of tolerance for hype even though there's a lot of AI washing going on. What's going on with you guys? >> I would say, interestingly enough, I met with a customer, a prospective customer, this morning, and this was a very typical organization. So, this is a customer that was an insurance company, and they're just about to spin up their first Hadoop cluster to actually work on customer management applications. And they are overwhelmed with what the market offers now. There are 27 open source projects, there are dozens and dozens of other different tools that take best-of-breed approaches at certain layers of the stack for specific applications, and they don't really know how to stitch this all together. And if I reflect on a customer meeting at a Canadian bank recently that has very successfully deployed applications on the data lake, like in fraud management and compliance applications and things like this, they still struggle to basically replicate the same performance and the service level agreements that they were used to from their old EDW that they still have in production. And so, everybody's now going out there and trying to figure out how to get value out of the data lake for the business users, right? There's a lot of approaches that these companies are trying. There's SQL-on-Hadoop that supposedly doesn't perform properly. 
There are other solutions like OLAP on Hadoop that try to emulate what they've been used to from the EDWs, and we believe these are the wrong approaches, so we want to stay true to the stack and be native to the stack and offer a platform that really operates end-to-end, from ingesting the data into the data lake to curation and preparation of the data, and ultimately, building the data pipelines for the business users, and this is certainly something-- >> Here's more of a play for the business users now, not the data scientists and statistical modelers. I thought the data scientists were your core market. Is that not true? >> So, our primary user base as Datameer used to be, like, until last week, the data engineers in the companies, or basically the people that built the data lake, that curated the data and built these data pipelines for the business user community no matter what tool they were using. >> Jim, I want to get your thoughts on this for Christian's interest. Last year, so these guys can fix your microphone. I think you guys fix the microphone for us, his earpiece there, but I want to get a question to Chris, and I ask to redirect through you. Gartner, another analyst firm. >> Jim: I've heard of 'em. >> Not a big fan personally, but you know. >> Jim: They're still in business? >> The magic quadrant, they use that tool. Anyway, they had a good intro stat. Last year, they predicted through 2017, 60% of big data projects will fail. So, the question for both you guys is did that actually happen? I don't think it did, I'm not hearing that 60% have failed, but we are seeing the struggle around analytics and scaling analytics in a way that's like a dev ops mentality. So, thoughts on this 60% data projects fail. >> I don't know whether it's 60%, there was another statistic that said there's only 14% of Hadoop deployments in production or something, >> They said 60, six zero. >> Or whatever. 
>> Define failure, I mean, you've built a data lake, and maybe you're not using it immediately for any particular application. Does that mean you've failed, or does it simply mean you haven't found the killer application yet for it? I don't know, your thoughts. >> I agree with you, it's probably not a failure to that extent. It's more like how do they, so they dump the data into it, right, they build the infrastructure, now it's about the next step data lake 2.0 to figure out how do I get value out of the data, how do I go after the right applications, how do I build a platform and tools that basically promotes the use of that data throughout the business community in a meaningful way. >> Okay, so what's going on with you guys from a product standpoint? You guys have some announcements. Let's get to some of the latest and greatest. >> Absolutely. I think we were very strong in data creation, data preparation and the entire data governance around it, and we are using, as a user interface, we are using this spreadsheet-like user interface called a workbook, it really looks like Excel, but it's not. It operates at completely different scale. It's basically an Excel spreadsheet on steroids. Our customers built a data pipeline, so this is the data engineers that we discussed before, but we also have a relatively small power user community in our client base that use that spreadsheet for deep data exploration. Now, we are lifting this to the next level, and we put up a visualization layer on top of it that runs natively in the stack, and what you get is basically a visual experience not only in the data curation process but also in deep data exploration, and this is combined with two platform technologies that we use, it's based on highly scalable distributed search in the backend engine of our product, number one. We have also adopted a columnar data store, Parquet, for our file system now. 
In this combination, the data exploration capabilities we bring to the market will allow power analysts to really dig deep into the data, so there's literally no limits in terms of the breadth and the depth of the data. It could be billions of rows, it could be thousands of different attributes and columns that you are looking at, and you will get a response time of sub-second as we create indices on demand as we run this through the analytic process. >> With these fast queries and visualization, do you also have the ability to do semantic data virtualization roll-ups across multi-cloud or multi-cluster? >> Yeah, absolutely. We, also there's a second trend that we discussed right before we started the live transmission here. Things are also moving into the cloud, so what we are seeing right now is the EDW's not going away, the on prem is data lake, that prevail, right, and now they are thinking about moving certain workload types into the cloud, and we understand ourselves as a platform play that builds a data fabric that really ties all these data assets together, and it enables business. >> On the trends, we weren't on camera, we'll bring it up here, the impact of cloud to the data world. You've seen this movie before, you have extensive experience in this space going back to the origination, you'd say Teradata. When it was the classic, old-school data warehouse. And then, great purpose, great growth, massive value creation. Enter the Hadoop kind of disruption. Hadoop evolved from batch to do ranking stuff, and then tried to, it was a hammer that turned into a lawnmower, right? Then they started going down the path, and really, it wasn't workable for what people were looking at, but everyone was still trying to be the Teradata of whatever. Fast forward, so things have evolved and things are starting to shake out, same picture of data warehouse-like stuff, now you got cloud. It seems to be changing the nature of what it will become in the future. 
What's your perspective on that evolution? What's different about now and what's the same about now, from the old days? What are the similarities to the old school, and what's different that people are missing? >> I think a lot of it is related to cloud, just in general. It is extremely important to get fast adoption throughout the organization, to get performance and service-level agreements with our customers. This is where we clearly can help, and we give them a user experience that is meaningful and that resembles what they were used to from the old EDW world, right? That's number one. Number two, and this comes back to the question of 60% fail, or why is it failing or working. I think there's a lot of really interesting projects out there, and our customers are betting big time on the data lake projects, whether it be on premise or in the cloud. And we work with HSBC, for instance, in the United Kingdom. They've got 32 data lake projects throughout the organization, and I spoke to one of these-- >> Not 32 data lakes, 32 projects that involve tapping into the data lake. >> 32 projects that involve various data lakes. >> Okay. (chuckling) >> And I spoke to one of the chief data officers there, and they said their data center infrastructure, just by having kick-started these projects, will explode. And they're not in the business of operating all the hardware and things like this, and so, a major bank like them, they made an announcement recently, a public announcement, you can read about it, started moving the data assets into the cloud. This is clearly happening at rapid pace, and it will change the paradigm in terms of burstability and being able to satisfy peak workload requirements as they come up, when you run a compliance report at quarter end or something like this, so this will certainly help with adoption and creating business value for our customers. >> We talk about real-time all the time, and there's so many examples of how data science has changed the game. 
I mean, I was talking about, from a cyber perspective, how data science helped capture Bin Laden to how I can get increased sales to better user experience on devices. Having real-time access to data, and you put in some quick data science around things, really helps things in the edge. What's your view on real-time? Obviously, that's super important, you got to kind of get your house in order in terms of base data hygiene and foundational work, building blocks. At the end of the day, the real-time seems to be super hot right now. >> Real-time is a relative term, right, so there's certainly applications like IOT applications, or machine data that you analyze that require real-time access. I would call it right-time, so what's the increment of data load that is required for certain applications? We are certainly not a real-time application yet. We can possibly load data through Kafka and stream data through Kafka, but in general, we are still a batch-oriented platform. We can do. >> Which, by the way, is not going away any time soon. It's like super important. >> No, it's not going away at all, right. It can do many batches at relatively frequent increments, which is usually enough for what our customers demand from our platform today, but we're certainly looking at more streaming types of capability as we move this forward. >> What do the customer architectures look like? Because you brought up the good point, we talk about this all the time, batch versus real-time. They're not mutually exclusive, obviously, good architectures would argue that you decouple them, obviously will have a good software elements all through the life cycle of data. >> Through the stack. >> And have the stack, and the stack's only going to get more robust. Your customers, what's the main value that you guys provide them, the problem that you're solving today and the benefits to them? >> Absolutely, so our true value is that there's no breakages in the stack. 
We enter, and we can basically satisfy all requirements from ingesting the data, from blending and integrating the data, preparing the data, building the data pipelines, and analyzing the data. And all this we do in a highly secure and governed environment, so if you stitch it together, as a customer, the customer this morning asked me, "Whom do you compete with?" I keep getting this question all the time, and we really compete with two things. We compete with build-your-own, which customers still opt to do nowadays, while our things are really point and click and highly automated, and we compete with a combination of different products. You need to have at least three to four different products to be able to do what we do, but then you get security breaks, you get lack of data lineage and data governance through the process, and this is the biggest value that we can bring to the table. And secondly, now with visual exploration, we offer a capability that literally nobody has in the marketplace, where we give power users the capability to explore, with blazing fast response times, billions of rows of data in a very free-form type of exploration process. >> Are there more power users now than there were when you started as a company? It seemed like tools like Datameer have brought people into the sort of power user camp, just simply by the virtue of having access to your tool. What are your thoughts there? >> Absolutely, it's definitely growing, and you see also different companies exploiting their capability in different ways. You might find insurance or financial services customers that have a very sophisticated capability built in that area, and you might see 1,000 to 2,000 users that do deep data exploration, and other companies are starting out with a couple of dozen and then evolving it as they go. >> Christian, I got to ask you as the new CEO of Datameer, obviously going to the next level, you guys have been successful. 
We were commenting yesterday on theCUBE about, we've been covering this for eight years in depth in terms of CUBE coverage, we've seen the waves of hype come and go, but now there's not a lot of tolerance for hype. You guys are one of the companies, I will say, that stick to your knitting, you didn't overplay your hand. You certainly rode the hype like everyone else did, but your solution is very specific on value, and so, you didn't overplay your hand, the company didn't really overplay their hand, in my opinion. But now, really, the hand is value. >> Absolutely. >> As the new CEO, you got to kind of put a little shiny new toy on there, and you know, rub the, keep the car lookin' shiny and everything looking good with cutting edge stuff, at the same time scaling up what's been working. The question is what are you doubling down on, and what are you investing in to keep that innovation going? >> There's really three things, and you're very much right, so this has become a mature company. We've grown with our customer base, our enterprise features and capabilities are second to none in the marketplace, this is what our customers achieve, and now, the three investment areas that we are putting together and where we are doubling down: number one is really visual exploration, as I outlined before. Number two, hybrid cloud architectures, we don't believe the customers will move their entire stack right into the cloud. There's a few that are going to do this and that are looking into these things, but we believe in the idea that they will still have their EDW, their on-premise data lake, and some workload capabilities in the cloud, which will be growing, so this is investment area number two. Number three is the entire concept of data curation for machine learning. This is something where we've released a plug-in earlier in the year for TensorFlow where we can basically build data pipelines for machine learning applications. This is still very small. 
We see some interest from customers, but it's growing interest. >> It's a directionally correct kind of vector, you're looking and say, it's a good sign, let's kick the tires on that and play around. >> Absolutely. >> 'Cause machine learning's got to learn, too. You got to learn from somewhere. >> And quite frankly, deep learning, machine learning tools for the rest of us, there aren't really all that many for the rest of us power users, they're going to have to come along and get really super visual in terms of enabling visual modular development and tuning of these models. What are your thoughts there in terms of going forward about a visualization layer to make machine learning and deep learning developers more productive? >> That is an area where we will not engage in a way. We will stick with our platform play where we focus on building the data pipelines into those tools. >> Jim: Gotcha. >> The last area where we invest is ecosystem integration, so we think our visual explorer backend, which is built on search and on a Parquet file format, or columnar store, is really a key differentiator in feeding or building data pipelines into the incumbent BI ecosystems and accelerating those as well. We currently have prototypes running where we can basically give the same performance and depth of analytic capability to some of the existing BI tools that are out there. >> What are some of the ecosystem partners you guys have? I know partnering is a big part of what you guys have done. Can you name a few? >> I mean, the biggest one-- >> Everybody, Switzerland. >> No, not really. We are focused on staying true to our stack and how we can provide value to our customers, so we work actively and, very importantly, on our cloud strategy with Microsoft and Amazon AWS in evolving our cloud strategy. We've started working with various BI vendors that you know about, right, and we definitely have a play also with some of the big SIs, and IBM is a more popular one. 
>> So, BI guys mostly on the tool visualization side. You said you were a pipeline. >> On tool and visualization side, right. We have very effective integration for our data pipelines into the BI tools today. We support TDE for Tableau, we have a native integration. >> Why compete there, just be a service provider. >> Absolutely, and we have more and better technology coming up to even accelerate those tools as well in our big data stuff. >> You're focused, you're scaling, final word I'll give to you for the segment. Share with the folks that are a Datameer customer or have not yet become a customer, what's the outlook, what's the new Datameer look like under your leadership? What should they expect? >> Yeah, absolutely, so I think they can expect utmost predictability in the way we roll out the vision and how we build our product in the next couple of releases. The next five, six months are critical for us. We have launched Visual Explorer here at the conference. We're going to launch our native cloud solution probably middle of November to the customer base. So, these are the big milestones that will help us for our next fiscal year and provide really great value to our customers, and that's what they can expect, predictability, a very solid product, all the enterprise-grade features they need and require for what they do. And if you look at it, we are really an enterprise play, and the customer base that we have is very demanding and challenging, and we want to keep up and deliver a capability that is relevant for them and helps them create value from the data lakes. >> Christian Rodatus, technology enthusiast, passionate, now CEO of Datameer. Great to have you on theCUBE, thanks for sharing. >> Thanks so much. >> And we'll be following your progress. Datameer here inside theCUBE live coverage, hashtag BigDataNYC, our fifth year doing our own event here in conjunction with Strata Data, formerly Strata Hadoop, Hadoop World, eight years covering this space. 
I'm John Furrier with Jim Kobielus here inside theCUBE. More after this short break. >> Christian: Thank you. (upbeat electronic music)

Published Date : Sep 27 2017


ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Chris	PERSON	0.99+
HSBC	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Jim	PERSON	0.99+
Christian Rodatus	PERSON	0.99+
Stefan	PERSON	0.99+
IBM	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
60%	QUANTITY	0.99+
2017	DATE	0.99+
Datameer	ORGANIZATION	0.99+
2010	DATE	0.99+
32 projects	QUANTITY	0.99+
Last year	DATE	0.99+
United Kingdom	LOCATION	0.99+
1,000	QUANTITY	0.99+
New York City	LOCATION	0.99+
14%	QUANTITY	0.99+
eight years	QUANTITY	0.99+
fifth year	QUANTITY	0.99+
one	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
Excel	TITLE	0.99+
eighth year	QUANTITY	0.99+
late 2009	DATE	0.99+
early 2010	DATE	0.99+
Mike Olson	PERSON	0.99+
60	QUANTITY	0.99+
27 open source projects	QUANTITY	0.99+
last week	DATE	0.99+
thousands	QUANTITY	0.99+
two things	QUANTITY	0.99+
Kafka	TITLE	0.99+
seven	QUANTITY	0.99+
second trend	QUANTITY	0.99+
Midtown Manhattan	LOCATION	0.99+
yesterday	DATE	0.99+
Christian	PERSON	0.99+
both	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.98+
two open source projects	QUANTITY	0.98+
Gartner	ORGANIZATION	0.98+
two platform technologies	QUANTITY	0.98+
Wikibon	ORGANIZATION	0.98+
Switzerland	LOCATION	0.98+
billions of rows	QUANTITY	0.98+
first	QUANTITY	0.98+
MapReduce	ORGANIZATION	0.98+
2,000 users	QUANTITY	0.98+
Bin Laden	PERSON	0.98+
NYC	LOCATION	0.97+
Strata Data	ORGANIZATION	0.97+
32 data lakes	QUANTITY	0.97+
six	QUANTITY	0.97+
Hadoop	TITLE	0.97+
secondly	QUANTITY	0.96+
next fiscal year	DATE	0.96+
three things	QUANTITY	0.96+
today	DATE	0.95+
four different products	QUANTITY	0.95+
Teradata	ORGANIZATION	0.95+
Christian	ORGANIZATION	0.95+
this morning	DATE	0.95+
TD	ORGANIZATION	0.94+
EDW	ORGANIZATION	0.94+
BigData	EVENT	0.92+

Emma McGrattan, Actian | Big Data NYC 2017

>> Announcer: Live from midtown Manhattan it's theCUBE covering Big Data New York City 2017. Brought to you by Silicon Angle Media and it's ecosystem sponsors. (upbeat techno music) >> Hello, everyone. Welcome back to theCUBE's exclusive coverage of Big Data NYC for all the access. It's our fifth year doing our own event in New York City. The hashtag is BigDataNYC. Also, in conjunction with Strata Hadoop, used to be called Hadoop World, then Strata Hadoop. Now, it's called Strata Data as they try to grope to where the future's going to be. A lot of hype over there. A lot of action. But here as where we do the intimate interviews and the stories. I'm John Furrier, co-host of theCUBE with Emma McGrattan who is the Senior Vice President of Engineering at Actian. Great to have you on. >> Thanks for having me. >> We love having everyone from Ireland cause the accidents great traction. So, I appreciate you coming on. Have a beer later at the pub. New York's got to lot of great Irish pubs. In all seriousness, we've had Actian on before. Mike Hoskins has been on. We had Jeff Veis on yesterday giving us the marketing angle of hybrid data that you guys are doing. What's under the hood? Because Actian has a lot of technology in their portfolio through how you guys had your growth strategy. But now as the world wants to bring it together you're seeing some real critical trends. >> Emma: Right. >> A lot of application development where data's important. Huge amount of security challenges. People are trying to build out and bring security out of IT. And then you've got all this data covering stuff. That's just on the top line. Then you got IOT. So, people are busy. Their plates are full, and data's the center of it. So, what are you guys doing to bring all of Actian together? >> Emma: That's a great question, perfect question for Actian. So, we have in Actian a number of products in the portfolio. And we believe that best fit product. 
So, if you're doing something like graph database, it doesn't make sense to put a Vector in Hadoop solution against that. And we've got the right fit technology for what we're doing. And for IOT we've got an embedded database that's as small as 30 megs. So, I've got PowerPoint files that are bigger than this database. You put it in a device, set it, it can run for 20 years. You never have to touch it. But all that data that's being generated typically you're generating it because you want, at some point, to be able to analyze it. And we've gone in the portfolio and Vector in Hadoop has the ability to take that data from the IOT sources and perform very high-speed analytics on that. So, the products that we have within the portfolio are focused around data integration, so pulling data into an environment where you're going to perform analysis or otherwise operationalize that data, data management. A lot of our customers are just doing CRM, ERP applications on our product platforms. And then the analytics is where I get really excited cause there's so much happening in the analytics world in terms of new types of applications being built, in terms of real time requirements, in terms of security and governance that you're talking about in reference in your question. And we've got a unique solution that can address all of those areas in our Vector in Hadoop products. So, it's interesting that we see the name Hadoop coming out of the show this week because we see that the focus on Hadoop kind of moving to the background and where the real focus is around the data and not so much-- >> And the business value. >> I hate to sound cliché about outcomes but we were joking on theCUBE yesterday and kind of can't coin the term, "Outcomes as a service." Which is kind of a goof on the whole, "It's about the outcomes." Which is a cliché in tech. But that really is the truth. At the end of the day, you've got a business goal. But the role of data now in real time is key. 
You're seeing people want real time. Not real time response with old data, they want the real data. So, people are starting to look at data as a really instrumental part of the development process. Similar with DevOps did with infrastructure as code, people want data to be like code. >> Emma: Exactly. >> And that is a hard >> Architectural challenge. So, if you go into your customer base what do you guys tell them? And I was going to the hybrid cloud as the marketing message. But I have challenged, I'm the CXO. I'm the CDO. I'm the CIO. I'm the CFO, COO, whatever the person making these huge, sweeping operational cost decisions. What's the architecture? Cause that's what people are working on right now. And how do you present that? >> Right. So, we recognize the fact that everybody's got a very distributed environment. And part of the message around hybrid data is that data can be generated pretty much any place. You may be generating data in the cloud with your own custom applications. You may be using salesforce.com or NetSuite or whatever. And you've got your on-premise sources of data generation. And what we provide in Actian is the ability to access all of that data in real time, and make it part of the applications that you're deploying that is going to be able to react in real time to changes. You don't want to be acting on yesterday's data because things have happened, things have moved on. So, the importance of real time is not lost on Actian. And all of these solutions that we bring together enable that real time analysis of what's happening in every part of the environment. So, it's hybrid in terms of the type of data that you're working with. It's hybrid in terms of it could be generated in the cloud, in any cloud or on-premise, and being able to pull all of that together an perform real time analysis is incredibly important to generate value from the data. 
>> Emma, I want to get your thoughts on a comment that I heard last night and then multiple times but the same pattern, they don't get it. "They" could be the venture capitalists as part of the startup. Or the customer has, "Oh, this is the way we do it." There's definitely things that are out there Silo's Legacy things that are-- Still not going away, and we know that. But how do you go into a customer saying look, there's a whole new way of doing things right now. It's not necessarily radical lift and shift or rip and replace. Whatever word you want to use. There's always a word that, you don't like rip and replace, we'll say lift and shift. It's the same thing, right? >> Right. >> You don't want to do a lot of incremental operational wholesale changes. >> Right. >> But you want to do incremental value now. How do you go in and say, "Look, this is the way you want to think about real time in your architecture." Because I don't necessarily want to change my operational mindset for the sake of Salesforce and all these different data sources. How do you guys have that conversation? >> So, Actian is unique in that we have a consumer base that goes back 20, 30 years. I personally will be at Actian 25 years in December. So, we've got customers that are running our I'd like to call them Legacy products, but they're products that powering their business every day of the week. And we've also got incredibly innovative product that we're on the bleeding edge. And what we've done in our recent release of Actian X is do combined bleeding edge technology with this more mature and proven technology. So, at Actian X you've got the OLTP database that was Ingres and now got rebranded because it's got new capabilities. And then we've taken the engine from Actian Vector product, and brought that into Actian X so that you can do in real time analysis of your OLTP data. And we act in real time to changes in the data. 
And it's interesting that you talk about real time because it means different things to different people. So, if you're talking to somebody doing risk analysis, real time is milliseconds. If you're talking to some customers, real time is yesterday's data and that's fine. And what we've done with Actian X is to provide that ability to determine for yourself what real time means to you and to provide a solution that enables you to respond in real time. Now, bringing analytics into what is a more traditional OLTP database, and kind of demonstrating for them some of the new capabilities it enables and opens up other opportunities as far as we can have conversations about maybe backing up that dataset to the cloud. Somebody that may have been risk averse and not looking at cloud all of a sudden is looking at cloud, looking at analytics, and then kind of opening up new opportunities for us. And new opportunities for them cause the data, as they say, is the new oil. >> That's great, great. And you guys have a good customer base to draw from. So, you've got to bring in the shiny new toy but make it work with existing. So, it sounds like you been like an extraction layer that you're building on tech that was very useful and is useful, by decoupling it with new software that adds value. Is it an extraction layer of sorts? >> We don't think of it as an extraction layer but certainly one could think of it that way because it's ... Well, yeah it's-- >> John: It's a product. You basically take the old product and bring new stuff to it. >> Exactly. >> Okay, so I got to ask you about the trend around IOT. Because IOT is one of those things right now that's super hype. And I think it's going to be even more hype. But security has been a big problem and I hear a lot honestly, certainly IOTs on the agenda. Industrial IOT is kind of the low-hanging fruit. They go to that first. But no one wants to be the next Equifax. 
So, there's a lot of security stuff that causes, plus there's other things going on they got to take care of. How do you guys talk about the security equation where you can come in and put in a reliable workable solution and still make the customers feel like they're moving the ball down the field. >> So, that's one of the benefits that we have of being in the industry for as long as we have. We have very deep understanding as to what security requirements are. In terms of providing capabilities within the product to do things like control who can access what data and to what degree. Can they update it? Can they only read it? Providing the ability to encrypt the data. So, for many use cases the data is so sensitive that you'd always want to encrypt it when it's stored. You'd want any traffic coming in and out of the environment to be encrypted. Being able to audit everything that's happening in the environment, who's issuing what queries and from where and to set alarms or something if somebody attempts to access data that they shouldn't be attempting to access. So, taking all of those capabilities together, we're then able to look at things like GDPR. What are the requirements for securing the data? And we've got all the capabilities within the product. And we've got the credibility cause we've been doing this for 30 years, that we can secure these environments. We can conform to the various standards and mandates that are put in place for data security. So, we have a very strong story to tell-- >> John: What is your position on GDPR? Obviously, you've got a super important, I call it the Y2K that actually is real cause you have there compliance issues. There's a lot of, obviously, political things going on but this is a real problem, about to move fast as a solution. What do you guys offer there? >> Equifax was a prime example of why GDPR is incredibly important. 
So, for Actian, and you know, I talked about the capabilities we provide with regard to securing data, and secure access to that data. And when it comes to GDPR, a lot of it is around process. So, what we're doing is guiding our customers and making sure that they have secure processes in place. Putting all of the smarts into the technology, and then having somebody doing an offline backup on a CD that they leave on a seat on the train which has, in the past, been a source of data breaches, is an issue with process and not with technology. So, we're helping with that. And helping in educating-- >> John: Equifax had some BPN issues but also, I mean, I haven't reported on this yet, but I also have confirmed that there were state actors involved, foreign actors penetrating in through their franchise relationships. So, in partnering in an open internet these days you need to understand who the partners are even if they're in the network. >> Absolutely. And that's why this whole idea of providing all of the capabilities required for data security including auditing, who's coming in. So, failed attempts to get into the system should be reported as problems. And that's a capability that we have within the database. >> So, you've been at Actian for 25 years, I did not know. That's cool. Good folks over there. I've been to the office a few times. I'm sure you got a good healthy customer base but for the folks that don't know Actian, what's the pitch from your standpoint? Not the marketing pitch hybrid data, I get that. I mean, what should they know about you guys? What is the problem that you solve? What do you bring to the table? From an engineering perspective, how do you differentiate? >> So, my primary focus is around high-speed analytics. And so, Actian enables the fastest SQL access to data, on Hadoop and off of Hadoop, proven through benchmarks. So, high-speed analytics is incredibly important.
But for Actian, we're unique in having this 30-year history where we understand what it is to run 24/7, mission critical operational databases. So, Actian's known for products like Ingres, like Psql, and being able to analyze data that's operationalized but then also bringing in new data sources. 'Cause that's where things are really going. But people want to choose the best application whether it's in the cloud or on-premise, it doesn't matter. It's the best application for their need. And being able to pull all of that data together, and for operational purposes, and for analytics purposes is incredibly important. And Actian enables all of that. >> And that's where the hybrid is really clever and smart because you got the consumption side and the creation side, and data integration isn't a project, it's real. It just happens. >> Emma: Right. >> So, you want to enable that. I can see that would be a key benefit. Certainly as, whether these decentralized apps get more traction, you're going to start to see more immutable things, transactions happening. Blockchain clearly points to that direction of the market where that's cool. Distributed computing has been around for a while but now decentralized we know how to behave there. So, we're seeing some apps that will probably be rewritten for that. But again, if architected properly that shouldn't be a problem. >> Right, exactly. And we don't want anybody to have to rewrite apps. What we want to be able to do is to provide a platform where the data that you need is available. >> John: Yeah, they're called Dapps for decentralized apps. It's a whole new wave coming, it's not being talked about here at the show. We at SiliconANGLE and Wikibon are, obviously, on those trends as we're riding the big wave. Okay, Em, I want to ask you a final question. Kind of take your Actian hat off, put your Irish techie hat on, and let's get down and dirty on what the main problem in the industry is right now.
If you look back and kind of go to the balcony if you will, look at the stage of the industry, obviously Hadoop is now in the background. It's an element of the bigger picture. We're seeing, we were commenting yesterday that these customers have these tool sheds of all these tools they've bought. They bought a hammer that wants to be a lawnmower, right? It's just like they have their tool platforms are being pitched at them. There's a lot of confusion. What's the main problem that the industry's trying to solve? If you look at it, if you can put the dots together. What is the big problem that needs to be solved, that the industry should be solving? >> So, I think data is every place, right? And there's not a whole lot of discipline around corralling that and putting security around it. Being able to deploy security policies across data regardless of where it's deployed or sourced. So, I think that's probably the biggest challenge is bringing compute to the data and pulling all of that together. And that's the challenge that we're addressing. >> And so, the unification, if you will, people use that word, all unifying data. What does that actually mean? You guys call it hybrid data which means you have some flexibility if you need it. >> Emma: Right. >> All right, cool. Emma, thanks so much for coming on theCUBE. Really appreciate it. Congratulations on your success. And again, you guys got to a good spot. You got a broad portfolio, you're bringing together with hybrid data. Best of luck. We'll keep in touch. Emma McGrattan here, the Senior Vice President of Engineering at Actian here on theCUBE. More live coverage here in New York City from theCUBE's coverage of Big Data NYC after this short break. (upbeat techno music)

Published Date : Sep 27 2017


ENTITIES

Entity	Category	Confidence
Emma McGrattan	PERSON	0.99+
John	PERSON	0.99+
Emma	PERSON	0.99+
20 years	QUANTITY	0.99+
Mike Hoskins	PERSON	0.99+
John Furrier	PERSON	0.99+
Actian	ORGANIZATION	0.99+
Equifax	ORGANIZATION	0.99+
Ireland	LOCATION	0.99+
New York City	LOCATION	0.99+
December	DATE	0.99+
Silicon Angle Media	ORGANIZATION	0.99+
25 years	QUANTITY	0.99+
30 years	QUANTITY	0.99+
yesterday	DATE	0.99+
30 year	QUANTITY	0.99+
20	QUANTITY	0.99+
Jeff Veis	PERSON	0.99+
fifth year	QUANTITY	0.99+
PowerPoint	TITLE	0.99+
New York	LOCATION	0.99+
Actian X	ORGANIZATION	0.99+
30 megs	QUANTITY	0.99+
Actian Vector	ORGANIZATION	0.99+
GDPR	TITLE	0.99+
Ingres	ORGANIZATION	0.99+
this week	DATE	0.99+
Wikibon	ORGANIZATION	0.98+
one	QUANTITY	0.98+
last night	DATE	0.98+
SQL	TITLE	0.97+
theCUBE	ORGANIZATION	0.97+
Strata Hadoop	TITLE	0.97+
Vector	ORGANIZATION	0.95+
Y2K	ORGANIZATION	0.95+
Hadoop	TITLE	0.95+
DevOps	TITLE	0.95+
NYC	LOCATION	0.94+
NetSuite	TITLE	0.92+
Silicon Angle	ORGANIZATION	0.91+
Irish	OTHER	0.9+
2017	DATE	0.89+
2017	EVENT	0.88+
Psql	TITLE	0.86+
Salesforce	ORGANIZATION	0.86+
first	QUANTITY	0.85+
Strata Data	TITLE	0.84+

Prakash Nanduri, Paxata | BigData NYC 2017

>> Announcer: Live from midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat techno music) >> Hey, welcome back, everyone. Here live in New York City, this is theCUBE from SiliconANGLE Media Special. Exclusive coverage of the Big Data World at NYC. We call it Big Data NYC in conjunction also with Strata Hadoop, Strata Data, Hadoop World all going on kind of around the corner from our event here on 37th Street in Manhattan. I'm John Furrier, the co-host of theCUBE with Peter Burris, Head of Research at SiliconANGLE Media, and General Manager of Wikibon Research. And our next guest is one of our famous CUBE alumni, Prakash Nanduri, co-founder and CEO of Paxata, who launched his company here on theCUBE at our first inaugural Big Data NYC event in 2013. Great to see you. >> Great to see you, John. >> John: Great to have you back. You've been on every year since, and it's been the lucky charm. You guys have been doing great. It's not broke, don't fix it, right? And so theCUBE is working with you guys. We love having you on. It's been a pleasure, you as an entrepreneur, launching your company. Really, the entrepreneurial mojo. It's really what it's all about. Getting access to the market, you guys got in there, and you got a position. Give us the update on Paxata. What's happening? >> Awesome, John and Peter. Great to be here again. Every time I come here to New York for Strata I always look forward to our conversations. And every year we have something exciting and new to share with you. So, if you recall in 2013, it was a tiny little show, and it was a tiny little company, and we came in with big plans. And in 2013, I said, "You know, John, we're going to completely disrupt the way business consumers and business analysts turn raw data into information and they do self-service data preparation." That's what we brought to the market in 2013.
Ever since, we have gone on to do something really exciting and new for our customers every year. In '14, we came in with the first Apache Spark-based platform that allowed business analysts to do data preparation at scale interactively. Every year since, last year we did enterprise grade and we talked about how Paxata is going to be delivering our self-service data preparation solution in a highly-scalable enterprise grade deployment world. This year, what's super exciting is in addition to the recent announcements we made on Paxata running natively on the Microsoft Azure HDI Spark system. We are truly now the only information platform that allows business consumers to turn data into information in a multi-cloud hybrid world for our enterprise customers. In the last few years, I came and I talked to you and I told you about work we're doing and what great things are happening. But this year, in addition to the super-exciting announcements with Microsoft and other exciting announcements that you'll be hearing. You are going to hear directly from one of our key anchor customers, Standard Chartered Bank. 150-year-old institution operating in over 46 countries. One of the most storied banks in the world with 87,500 employees. >> John: That's not a start up. >> That's not a start up. (John laughs) >> They probably have a high bar, high bar. They got a lot of data. >> They have lots of data. And they have chosen Paxata as their information fabric. We announced our strategic partnership with them recently and you know that they are going to be speaking on theCUBE this week. And what started as a little experiment, just like our experiment in 2013, has actually mushroomed now into Michael Gorriz, and Shameek Kundu, and the entire leadership of Standard Chartered choosing Paxata as the platform that will democratize information in the bank across their 87,500 employees. We are going in a very exciting way, a very fast way, and now delivering real value to the bank. 
And you can hear all about it on our website-- >> Well, he's coming on theCUBE so we'll drill down on that, but banks are changing. You talk about a transformation. What is a teller? An Internet of Things device. The watch potentially could be a terminal. So, the Internet of Things of people changes the game. Are the ATMs going to go away and become like broadcast points? >> Prakash: And you're absolutely right. And really what it is about is, it doesn't matter if you're a Standard Chartered Bank or if you're a pharma company or if you're the leading healthcare company, what it is is that every one of our customers is really becoming an information-inspired business. And what we are driving our customers to is moving from a world where they're data-driven. I think being data-driven is fine. But what you need to be is information-inspired. And what does that mean? It means that you need to be able to consume data, regardless of format, regardless of source, regardless of where it's coming from, and turn it into information that actually allows you to get insights and decisions. And that's what Paxata does for you. So, this whole notion of being information-inspired, I don't care if you're a bank, if you're a car company, or if you're a healthcare company today, you need to have-- >> Prakash, for the folks watching that might not know our history as you launched on theCUBE in 2013 and have been successful every year since. You guys are really deploying the classic entrepreneurial success formula, be fast, walk the talk, listen to customers, add value. Take a minute quickly just to talk about what you guys do. Just for the folks that don't know you. >> Absolutely, let's just actually give it in the real example of you know, a customer like Standard Chartered. Standard Chartered operates in multiple countries. They have a significant number of lines of businesses.
And whether it's in risk and compliance, whether it is in their marketing department, whether it's in their corporate banking business, what they have to do is, a simple example could be I want to create a customer list to be able to go and run a marketing campaign. And the customer list in a particular region is not something easy for a bank like Standard Chartered to come up with. They need to be able to pull from multiple sources. They need to be able to clean the data. They need to be able to shape the data to get that list. And if you look at what is really important, the people who understand the data are actually not the folks in IT but the folks in business. So, they need to have a tool and a platform that allows them to pull data from multiple sources to be able to massage it, to be able to clean it-- >> John: So, you sell to the business person? >> We sell to the business consumer. The business analyst is our consumer. And the person who supports them is the chief data officer and the person who runs the Paxata platform on their data lake infrastructure. >> So, IT sets the data lake and you guys just let the business guys go to town on the data. >> Prakash: Bingo. >> Okay, what's the problem that you solve? If you can summarize the problem that you solve for the customers, what is it? >> We take data and turn it into information that is clean, that's complete, that's consumable and that's contextual. The hardest problem in every analytical exercise is actually taking data and cleaning it up and getting it ready for analytics. That's what we do. >> It's the prep work. >> It's the prep work. >> As companies gain experience with Big Data, John, what they need to start doing increasingly is move more of the prep work or have more of the prep work flow closer to the analyst. And the reason's actually pretty simple. It's because of that context.
Because the analyst knows more about what they're looking for and is a better evaluator of whether or not they get what they need. Otherwise, you end up in this strange cycle time problem between people in back end that are trying to generate the data that they think they want. And so, by making the whole concept of data preparation simpler, more straightforward, you're able to have the people who actually consume the data and need it do a better job of articulating what they need, how they need it and making it presentable to the work that they're performing. >> Exactly, Peter. What does that say about how roles are starting to merge together? 'Cause you've got to be at the vanguard of seeing how some of these mature organizations are working. What do you think? Are we seeing roles start to become more aligned? >> Yes, I do think. So, first and foremost, I think what's happening is there is no such thing as having just one group that's doing data science and another group consuming. I think what you're going to be going into is the world of data and information isn't all-consuming and that's everybody's role. Everybody has a role in that. And everybody's going to consume. So, if you look at a business analyst that was spending 80% of their time living in Excel or working with self-service BI tools like our partners Tableau and Power BI from Microsoft, others. What you find is these people today are living in a world where either they have to live in coding scripting world hell or they have to rely on IT to get them the real data. So, the role of a business analyst or a subject matter expert, first and foremost, the fact that they work with data and they need information that's a given. There is no business role today where you can't deal with data. >> But it also makes them real valuable, because there aren't a lot of people who are good at dealing with data.
And they're very, very reliant on these people to turn that data into something that is regarded as consumable elsewhere. So, you're trying to make them much more productive. >> Exactly. So, four years ago, when we launched on theCUBE, the whole premise was that in order to be able to really drive towards a world where you can make information and data-driven decisions, you need to ensure that the business analyst community, or what I like to call the business consumer needs to have the power of being able to, A, get access to data, B, make sense of the data, and then turn that data into something that's valuable for her or for him. >> Peter: And others. >> And others, and others. Absolutely. And that's what Paxata is doing. In a collaborative, in a 21st Century world where I don't work in a silo, I work collaboratively. And then the tool, and the platform that helps me do that is actually a 21st Century platform. >> So, John, at the beginning of the session you and Jim were talking about what is going to be one of the themes here at the show. And we observed that it used to be that people were talking about setting up the hardware, setting up the clusters, getting Hadoop to work, and Jim talked about going up the stack. Well, this is one of the indicators that, in fact, people were starting to go up the stack because they're starting to worry more about the data, what it can do, the value of how it's going to be used, and how we distribute more of that work so that we get more people using data that's actually good and useful to the business. >> John: And drives value. >> And drives value. >> Absolutely. And if I may, just put a chronological aspect to this. When we launched the company we said the business analyst needs to be in charge of the data and turning the data into something useful. Then right at that time, the world of create data lakes came in thanks to our partners like Cloudera and Hortonworks, and others, and MapR and others.
In the recent past, the world of moving from on-premise data lakes to hybrid, multicloud data lakes is becoming reality. Our partners at Microsoft, at AWS, and others are having customers come in and build cloud-based data lakes. So, today what you're seeing is on one hand this complete democratization within the business, like at Standard Chartered, where all these business analysts are getting access to data. And on the other hand, from the data infrastructure moving into a hybrid multicloud world. And what you need is a 21st Century information management platform that serves the need of the business and to make that data relevant and information and ready for their consumption. While at the same time we should not forget that enterprises need governance. They need lineage. They need scale. They need to be able to move things around depending on what their business needs are. And that's what Paxata is driving. That's why we're so excited about our partnership with Microsoft, with AWS, with our customer partnerships such as Standard Chartered Bank, rolling this out in an enterprise-- >> This is a democratization that you were referring to with your customers. We see this-- >> Everywhere. >> When you free the data up, good things happen but you don't want to have IT be the constraint, you want to let them enable-- >> Peter: And IT doesn't want to be the constraint. >> They don't. >> This is one of the biggest problems that they have on a daily basis. >> They're happy to let it go free as long as it's in their mind DevOps-like related, this is cool for them. >> Well, they're happy to let it go with policy and security in place. >> Our customers, our most strategic customers, the folks who are running the data lakes, the folks who are managing the data lakes, they are the first ones that say that we want business to be able to access this data, and to be able to go and make use out of this data in the right way for the bank.
And not have us be the impediment, not have us be the roadblock. While at the same time we still need governance. We still need security. We still need all those things that are important for a bank or a large enterprise. That's what Paxata is delivering to the customers. >> John: So, what's next? >> Peter: Oh, I'm sorry. >> So, really quickly. An interesting observation. People talk about data being the new fuel of business. That really doesn't work because, as Bill Schmarzo says, it's not the new fuel of business, it's new sunlight of business. And the reason why is because fuel can only be used once. >> Prakash: That's right. >> The whole point of data is that it can be used a lot, in a lot of different ways, and a lot of different contexts. And so, in many respects what we're really trying to facilitate or if someone who runs a data lake when someone in the business asks them, "Well, how do you create value for the business?" The more people, the more users, the more context that they're serving out of that common data, the more valuable the resource that they're administering. So, they want to see more utilization, more contexts, more data being moved out. But again, governance, security have to be in place. >> You bet, you bet. And using that analogy of data, and I've heard this term about data being the new oil, etc. Well, if data is the oil, information is really the refined fuel or sunlight as we like to call it. >> Peter: Yeah. >> John: Well, you're riffing on semantics, but the point is it's not a one trick pony. Data is part of the development, I wrote a blog post in 1997, I mean 2007 that said data's the new development kit. And it was kind of riffing on this notion of the old days >> Prakash: You bet. >> Here's your development kit, SDK, or whatever was how people did things back then Enter the cloud, >> Prakash: That's right. >> And boom, there it is. The data now is in the process of the refinery the developers wanted. 
The developers want the data libraries. Whatever that means. That's where I see it. And that is the democratization where data is available to be integrated into apps, into feeds, into ... >> Exactly, and so it brings me to our point about what was the exciting, new product innovation announcement we made today about Intelligent Ingest. You want to be able to access data in the enterprise regardless of where it is, regardless of the cloud where it's sitting, regardless of whether it's on-premise, in the cloud. You don't need to as a business worry about whether that is a JSON file or whether that's an XML file or that's a relational file. That's irrelevant. What you want is, do I have the access to the right data? Can I take that data, can I turn it into something valuable and then can I make a decision out of it? I need to do that fast. At the same time, I need to have the governance and security, all of that. That's at the end of the day the objective that our customers are driving towards. >> Prakash, thanks so much for coming on and being a great member of our community. >> Fantastic. >> You're part of our smart network of great people out there and entrepreneurial journey continues. >> Yes. >> Final question. Just observation. As you pinch yourself and you go down the journey, you guys are walking the talk, adding new products. We're global landscape. You're seeing a lot of new stuff happening. Customers are trying to stay focused. A lot of distractions whether security or data or app development. What's your state of the industry? How do you view the current market, from your perspective and also how the customer might see it from their impact? >> Well, the first thing is that I think in the last four years we have seen significant maturity both on the providers of software technology and solutions, and also amongst the customers.
I do think that going forward what is really going to make a difference is one really driving towards business outcomes by leveraging data. We've talked about a lot of this over the last few years. What real business outcomes are you delivering? What we are super excited about is when we see our customers each one of them actually subscribes to Paxata, we're a SaaS company, they subscribe to Paxata not because they're doing the science experiment but because they're trying to deliver real business value. What is that? Whether that is a risk and compliance solution which is going to drive towards real cost savings. Or whether that's a top line benefit because they know what their customer 360 is and how they can go and serve their customers better or how they can improve supply chains or how they can optimize their entire efficiency in the company. I think if you take it from that lens, what is going to be important right now is there's lots of new technologies coming in, and what's important is how is it going to drive towards those top three business drivers that I have today for the next 18 months? >> John: So, that's foundational. >> That's foundational. Those are the building blocks-- >> That's what is happening. Don't jump... If you're a customer, it's great to look at new technologies, etc. There's always innovation projects-- >> R&D, POCs, whatever. Kick the tires. >> But now, if you are really going to talk the talk about saying I'm going to be, call your word, data-driven, information-driven, whatever it is. If you're going to talk the talk, then you better walk the walk by delivering the real kind of tools and capabilities that your business consumers can adopt. And they better adopt that fast. If they're not up and running in 24 hours, something is wrong. >> Peter: Let me ask one question before you close, John.
So, your argument, which I agree with, suggests that one of the big changes in the next 18 months, three years as this whole thing matures and gets more consistent in its application of the value that it generates, we're going to see an explosion in the number of users of these types of tools. >> Prakash: Yes, yes. >> Correct? >> Prakash: Absolutely. >> 2X, 3X, 5X? What do you think? >> I think we're just at the cusp. I think it is going to grow up at least 10X and beyond. >> Peter: In the next two years? >> In the next, I would give that next three to five years. >> Peter: Three to five years? >> Yes. And we're on the journey. We're just at the tip of the high curve taking off. That's what I feel. >> Yeah, and there's going to be a lot more consolidation. You're going to start to see people who are winning. It's becoming clear as the fog lifts. It's a cloud game, a scale game. It's democratization, community-driven. It's open source software. Just solve problems, outcomes. I think outcome is going to be much faster. I think outcomes as a service will be a model that we'll probably be talking about in the future. You know, real time outcomes. Not eight month projects or year projects. >> Certainly, we started writing research about outcome-based management. >> Right. >> Wikibon Research... Prakash, one more thing? >> I also just want to say that in addition to this business outcome thing, I think in the last five years I've seen a lot of shift in our customer's world where the initial excitement about analytics, predictive, AI, machine-learning to get to outcomes. They've all come into a reality that none of that is possible if you're not able to handle, first get a grip on your data, and then be able to turn that data into something meaningful that can be analyzed. So, that is also a major shift. That's why you're seeing the growth we're seeing-- >> John: 'Cause it's really hard. >> Prakash: It's really hard. >> I mean, it's a cultural mindset. You have the personnel.
It's an operational model. I mean this is not like, throw some pixie dust on it and it magically happens. >> That's why I say, before you go into any kind of BI, analytics, AI initiative, stop, think about your information management strategy. Think about how you're going to democratize information. Think about how you're going to get governance. Think about how you're going to enable your business to turn data into information. >> Remember, you can't do AI without IA. You can't do AI without information architecture. >> There you go. That's a great point. >> And I think this all points to why Wikibon's research have all the analysts got it right with true private cloud because people got to take care of their business here to have a foundation for the future. And you can't just jump to the future. There's too much just to come and use a scale, too many cracks in the foundation. You got to do your, take your medicine now. And do the homework and lay down a solid foundation. >> You bet. >> All right, Prakash. Great to have you on theCUBE. Again, congratulations. And again, it's great for us. I totally have a great vibe when I see you. Thinking about how you launched on theCUBE in 2013, and how far you continue to climb. Congratulations. >> Thank you so much, John. Thanks, Peter. That was fantastic. >> All right, live coverage continuing day one of three days. It's going to be a great week here in New York City. Weather's perfect and all the players are in town for Big Data NYC. I'm John Furrier with Peter Burris. Be back with more after this short break. (upbeat techno music).

Published Date : Sep 27 2017


ENTITIES

Entity	Category	Confidence
Peter Burris	PERSON	0.99+
John	PERSON	0.99+
Jim	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
2013	DATE	0.99+
Peter	PERSON	0.99+
Prakash	PERSON	0.99+
AWS	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
Prakash Nanduri	PERSON	0.99+
Bill Schmarzo	PERSON	0.99+
1997	DATE	0.99+
New York	LOCATION	0.99+
Three	QUANTITY	0.99+
80%	QUANTITY	0.99+
Michael Gorriz	PERSON	0.99+
Standard Chartered Bank	ORGANIZATION	0.99+
New York City	LOCATION	0.99+
2007	DATE	0.99+
Hortonworks	ORGANIZATION	0.99+
87,500 employees	QUANTITY	0.99+
Paxata	ORGANIZATION	0.99+
NYC	LOCATION	0.99+
last year	DATE	0.99+
37th Street	LOCATION	0.99+
SAS	ORGANIZATION	0.99+
WikiBon Research	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
Excel	TITLE	0.99+
24 hours	QUANTITY	0.99+
One	QUANTITY	0.99+
this year	DATE	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
This year	DATE	0.99+
21st Century	DATE	0.99+
one	QUANTITY	0.99+
eight month	QUANTITY	0.99+
one question	QUANTITY	0.99+
four years ago	DATE	0.99+
3X	QUANTITY	0.99+
5X	QUANTITY	0.99+
first	QUANTITY	0.99+
three years	QUANTITY	0.99+

Itamar Ankorion, Attunity | BigData NYC 2017

>> Announcer: Live from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsor. >> Okay, welcome back, everyone, to our live special CUBE coverage in New York City in Manhattan, we're here in Hell's Kitchen for theCUBE's exclusive coverage of our Big Data NYC event and Strata Data, which used to be called Strata Hadoop, used to be Hadoop World, but our event, Big Data NYC, is our fifth year where we gather every year to see what's going on in big data world and also produce all of our great research. I'm John Furrier, the co-host of theCUBE, with Peter Burris, head of research. Our next guest, Itamar Ankorion, who's the Chief Marketing Officer at Attunity. Welcome back to theCUBE, good to see you. >> Thank you very much. It's good to be back. >> We've been covering Attunity for many, many years. We've had many conversations, you guys have had great success in big data, so congratulations on that. But the world is changing, and we're seeing data integration, we've been calling this for multiple years, that's not going away, people need to integrate more. But with cloud, there's been a real focus on accelerating the scale component with an emphasis on ease of use, data sovereignty, data governance, so all these things are coming together, the cloud has amplified. What's going on in the big data world, and it's like, listen, get movin' or you're out of business has pretty much been the mandate we've been seeing. A lot of people have been reacting. What's your response at Attunity these days because you have successful piece parts with your product offering? What's the big update for you guys with respect to this big growth area? >> Thank you. First of all, the cloud data lakes have been a major force, changing the data landscape and data management landscape for enterprises. 
For the past few years, I've been working closely with some of the world's leading organizations across different industries as they deploy the first and then the second and third iteration of the data lake and big data architectures. And one of the things, of course, we're all seeing is the move to cloud, whether we're seeing enterprises move completely to the cloud, kind of move the data lakes, that's where they build them, or actually have a hybrid environment where part of the data lake and data warehouse analytics environment is on prem and part of it is in the cloud. The other thing we're seeing is that the enterprises are starting to mix more of the traditional data lake, the cloud as the platform, and streaming technologies as the way to enable all the modern data analytics that they need, and that's what we have been focusing on, enabling them to use data across all these different technologies where and when they need it. >> So, the sum of the parts is worth more if it's integrated together seems to be the positioning, which is great, it's what customers want, make it easier. What is the hard news that you guys have, 'cause you have some big news? Let's get to the news real quick. >> Thank you very much. We did, today, we have announced, we're very excited about it, we have announced a new big release of our data integration platform. Our modern platform brings together Attunity Replicate, Attunity Compose for Hive, and Attunity Enterprise Manager, or AEM. These are products that we've evolved significantly, invested a lot over the last few years to enable organizations to use data, make data available, and available in real time across all these different platforms, and then, turn this data to be ready for analytics, especially in Hive and Hadoop environments on prem and now also in the cloud. Today, we've announced a major release with a lot of enhancements across the entire product line. >> Some people might know you guys for the Replicate piece.
I know that this announcement was 6.0, but as you guys have the other piece part to this, really it's about modernization of kind of old-school techniques. That's really been the driver of your success. What specifically in this announcement makes it, you know, really work well for people who move in real time, they want to have good data access. What's the big aha for the customers out there with Attunity on this announcement? >> That's a great question, thank you. First of all is that we're bringing it all together. As you mentioned, over the past few years, Attunity Replicate has emerged as the choice of many Fortune 100 and other companies who are building modern architectures and moving data across different platforms, to the cloud, to their lakes, and they're doing it in a very efficient way. One of the things we've seen is that they needed the flexibility to adapt as they go through their journey, to adopt different platforms, and what we give them with Replicate was the flexibility to do so. We give them the flexibility, we give them the performance to get the data and efficiency to move only the changes to the data as they happen and to do that in a real-time fashion. Now, that's all great, but once the data gets to the data lake, how do you then turn it into valuable information? That's when we introduced Compose for Hive, which we talked about in our last session a few months ago, which basically takes the next stage in the pipeline, picking up incremental, continuous data that is fed into the data lake and turning it into operational data stores, historical data stores, data stores that are basically ready for analytics. What we've done with this release that we're really excited about is putting all of these together in a more integrated fashion, putting Attunity Enterprise Manager on top of it to help manage larger scale environments so customers can move faster in deploying these solutions.
>> As you think about the role that Attunity's going to play over time, though, it's going to end up being part of a broader solution for how you handle your data. Imagine for a second the patterns that your customers are deploying. What is Attunity typically being deployed with? >> That's a great question. First of all, we're definitely part of a large ecosystem for building the new data architecture, new data management with data integration being more than ever a key part of that bigger ecosystem because as all they actually have today is more islands with more places where the data needs to go, and to your point, more patterns in which the data moves. One of those patterns that we've seen significantly increase in demand and deployment is streaming. Where data used to be batch, now we're all talking about streaming. Kafka has emerged as a very common platform, but not only Kafka. If you're on Amazon Web Services, you're using Kinesis. If you're in Azure, you're using Azure Event Hubs. You have different streaming technologies. That's part of how this has evolved. >> How is that challenge? 'Cause you just bring up a good point. I mean, with the big trend that customers want is they want either the same code basis on prem and that they have the hybrid, which means the gateway, if you will, to the public cloud. They want to have the same code base, or move workloads between different clouds, multi-cloud, it seems to be the Holy Grail, we've identified it. We are taking the position that we think multi-cloud will be the preferred architecture going forward. Not necessarily this year, but it's going to get there. But as a customer, I don't want to have to rebuild employees and get skill development and retraining on Amazon, Azure, Google. I mean, each one has its own different path, you mentioned it. How do you talk to customers about that because they might be like, whoa, I want it, but how do I work in that environment? You guys have a solution for that? 
>> We do, and in fact, one of the things we've seen, to your point, we've seen the adoption of multiple clouds, and even if that adoption is staged, what we're seeing is more and more customers that are actually referring to the term lock-in in respect to the cloud. Do we put all the eggs in one cloud, or do we allow ourselves the flexibility to move around and use different clouds, and also mitigate our risk in that respect? What we've done from that perspective is first of all, when you use the Attunity platform, we take away all the development complexity. In the Attunity platform, it is very easy to set up your data flows, your data pipelines, and it's all common and consistent. Whether you're working on prem, whether you work on Amazon Web Services, on Azure, or on Google or other platforms, it all looks and feels the same. So first of all, you solve the issue of the diversity, but also the complexity, because one of the big things that Attunity has focused on was reducing the complexity, allowing you to configure these data pipelines without development efforts and resources. >> One of the challenges, or one of the things you typically do to take complexity out is you do a better job of design up front. And I know that Attunity's got a tool set that starts to address some of these things. Take us a little bit through how your customers are starting to think in terms of designing flows as opposed to just cobbling together things in a bespoke way. How is that starting to change as customers gain experience with large data sets, the ability, the need to aggregate them, the ability to present them to developers in different ways? >> That's a great point, and again, one of the things we've focused on is to make the process of developing or configuring these different data flows easy and modular. First, in Attunity you can set up different flows in different patterns, and you can then make them available to others for consumption.
Some create the data ingestion, or some create the data ingestion and then create a data transformation with Compose for Hive, and with Attunity Enterprise Manager, we've now also introduced APIs that allow you to create your own microservices, consuming and using the services enabled by the platform, so we provide more flexibility to put all these different solutions together. >> What's the biggest thing that you see from a customer standpoint, from a problem that you solve? If you had to kind of lay it out, you know the classic, hey, what problem do you solve? 'Cause there are many, so take us through the key problem, and then, if there's any secondary issues that you guys can address customers, that seems the way conversation starts. What are key problems that you solve? >> I think one of the major problems that we solve is scale. Our customers that are deploying data lakes are trying to deploy and use data that is coming, not from five or 10 or even 50 data sources, we work at hundreds going on thousands of data sources now. That in itself represents a major challenge to our customers, and we're addressing it by dramatically simplifying and making the process of setting those up very repeatable, very easy, and then providing the management facility because when you have hundreds or thousands, management becomes a bigger issue to operationalize it. We invested a lot in a management facility for those, from a monitoring, control, security, how do you secure it? The data lake is used by many different groups, so how do we allow each group to see and work only on what belongs to that group? That's part it, too. So again, the scale is the major thing there. The other one is real timeliness. We talked about the move to streaming, and a lot of it is in order to enable streaming analytics, real-time analytics. That's only as good as your data, so you need to capture data in real time. 
And that of course has been our claim to fame for a long time, being the leading independent provider of CDC, change data capture technology. What we've done now, and also expanded significantly with the new release, version six, is creating universal database streaming. >> What is that? >> We take databases, all the enterprise databases, and we turn them into live streams. When you think, by the way, about the most common way that customers have used to bring data into the lake from a database, it was Sqoop. And Sqoop is a great, easy software to use from an open source perspective, but it's scripting and batch. So, you're building your new modern architecture with tools that are effectively scripting and batch. What we do with CDC is we enable you to take a database, and instead of the database being something you come to periodically to read it, we actually turn it into a live feed, so as the data changes in the database, we stream it, we make it available across all these different platforms. >> Changes the definition of what live streaming is. We're live streaming theCUBE, we're data. We're data streaming, and you get great data. So, here's the question for you. This is a good topic, I love this topic. Pete and I talk about this all the time, and it's been addressed in the big data world, but it's kind of, you can see the pattern going mainstream in society globally, geopolitically and also in society. Batch processing and data in motion are real time. Streaming brings up this use case to the end customer, which is this is the way they've done it before, certainly store things in data lakes, that's not going to go away, you're going to store stuff, but the real gain is in motion. >> Itamar: Correct. >> How do you describe that to a customer when you go out and say, hey, you know, you've been living in a batch world, but wake up to the real world called real time. How do you get them to align with it?
Some people get it right away, I see that, some people don't. How do you talk about that because that seems to be a real cultural thing going on right now, or operational readiness from the customer standpoint? Can you just talk through your feeling on that? >> First of all, this often gets lost in translation, and we see quite a few companies and even IT departments that, when they refer to real time, or their business tells them we need real time, what they understand from it is when you ask for the data, the response will be immediate. You get real time access to the data, but the data is from last week. So, we get real time access, but for last week's data. And what we try to do is to basically say, wait a second, when you say real time, what does real time mean? And we start to understand what is the meaning of using last week's data, or yesterday's data, over the real time data, and that makes a big difference. We actually see that today the access, the availability, the ability to act on the real time data, that's the frontier of competitive differentiation. That's what makes a customer experience better, that's what makes the business more operationally efficient than the competition. >> It's the data, not so much the process of what they used to do. Their version of real time is I responded to you pretty quickly. >> Exactly, the other thing that's interesting is that we see, again, change data capture becoming a critical component of the modern data architecture. Traditionally, we used to talk about different types of tools and technology, now CDC itself is becoming a critical part of it, and the reason is that it serves and it answers a lot of fundamental needs that are now becoming critical. One is the need for real-time data. The other one is efficiency.
If you're moving to the cloud, and we talked about this earlier, if your data lake is going to be in the cloud, there's no way you're going to reload all your data because the bandwidth is going to get in the way. So, you have to move only the delta. You need the ability to capture and move only the delta, so CDC becomes fundamental both in enabling the real time as well as the efficient, low-impact data integration. >> You guys have a lot of partners, technology partners, global SIs, resellers, a bunch of different partnership levels. The question I have for you, love to get your reaction and share your insight into is, okay, as the relationship to the customer who has the problem, what's in it for me? I want to move my business forward, I want to do digital business, I need to get up my real-time data as it's happening. Whether it's near real time or real time, that's evolution, but ultimately, they have to move their developers down a certain path. They'll usually hire a partner. The relationship between partners and you, the supplier to the customer, has changed recently. >> That's correct. >> How is that evolving? >> First of all, it's evolving in several ways. We've invested on our part to make sure that we're building Attunity as a leading vendor in the ecosystem of the system integration consulting companies. We work with pretty much all the major global system integrators as well as regional ones, boutique ones, that focus on the emerging technologies as well as the modern analytic-type platforms. We work a lot with plenty of them on major corporate data center-level migrations to the cloud. So again, the motivations are different, but we invest-- >> More specialized, are you seeing more specialty, what's the trend? >> We've been a technology partner of choice to both Amazon and Microsoft for enabling, facilitating the data migration to the cloud.
They of course have their select or preferred group of partners they work with, so we all come together to create these solutions. >> Itamar, what are the goals for Attunity as we wrap up here? I give you the last word, as you guys have this big announcement, you're bringing it all together. Integrating is key, it's always been your ethos in the company. Where is this next level, what's the next milestone for you guys? What do you guys see going forward? >> First of all, we're going to continue to modernize. We're really excited about the new announcement we did today, Replicate six, AEM six, a new version of Compose for Hive that now also supports more data lakes, Hortonworks, Cloudera, EMR, and a key point for us was expanding AEM to also enable analytics on the data we generate as data flows through it. The whole point is modernizing data integration, providing more intelligence in the process, reducing the complexity, and facilitating the automation end-to-end. We're going to continue to solve, >> Automation big, big time. >> Automation is a big thing for us, and the point is, you need to scale. In order to scale, we want to generate things for you so you don't have to develop for every piece. We automate the automation, okay. The whole point is to deliver the solution faster, and the way we're going to do it is to continue to enhance each one of the products in its own space, if it's replication across systems, Compose for Hive for transformations and pipeline automation, and AEM for management, but also to create integration between them. Again, for us it's to create a platform that for our customers they get more than the sum of the parts, they get the unique capabilities that we bring together in this platform. >> Itamar, thanks for coming onto theCUBE, appreciate it, congratulations to Attunity. And you guys bringing it all together, congratulations. >> Thank you very much. >> This is theCUBE live coverage, bringing it down here to New York City, Manhattan.
I'm John Furrier, Peter Burris. Be right back with more after this short break. (upbeat electronic music)

Published Date : Sep 27 2017


ENTITIES

Entity	Category	Confidence
Microsoft	ORGANIZATION	0.99+
Itamar Ankorion	PERSON	0.99+
Peter Burris	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
hundreds	QUANTITY	0.99+
John Furrier	PERSON	0.99+
five	QUANTITY	0.99+
last week	DATE	0.99+
New York City	LOCATION	0.99+
Itamar	PERSON	0.99+
second	QUANTITY	0.99+
CDC	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
first	QUANTITY	0.99+
Today	DATE	0.99+
Pete	PERSON	0.99+
50 data sources	QUANTITY	0.99+
10	QUANTITY	0.99+
Itamar Ankorian	PERSON	0.99+
two	QUANTITY	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
Amazon Web Services	ORGANIZATION	0.99+
each group	QUANTITY	0.99+
yesterday	DATE	0.99+
fifth year	QUANTITY	0.99+
One	QUANTITY	0.99+
today	DATE	0.99+
First	QUANTITY	0.99+
Attunity Replicate	ORGANIZATION	0.99+
Manhattan	LOCATION	0.99+
one	QUANTITY	0.98+
Midtown Manhattan	LOCATION	0.98+
NYC	LOCATION	0.98+
Attunity	ORGANIZATION	0.97+
Aldermore	ORGANIZATION	0.97+
both	QUANTITY	0.97+
one cloud	QUANTITY	0.97+
this year	DATE	0.97+
EMR	ORGANIZATION	0.96+
Big Data	EVENT	0.96+
Kafka	TITLE	0.95+
each one	QUANTITY	0.95+
Scaldera	ORGANIZATION	0.95+
thousands	QUANTITY	0.94+
Azure	ORGANIZATION	0.94+
Strata Hadoop	EVENT	0.94+
New York City, Manhattan	LOCATION	0.94+
6.0	QUANTITY	0.93+
theCUBE	ORGANIZATION	0.93+
Azure Event Hubs	TITLE	0.91+
2017	EVENT	0.91+
a second	QUANTITY	0.91+
Hive	TITLE	0.9+
Fortune 100	ORGANIZATION	0.9+
CUBE	ORGANIZATION	0.9+
few month ago	DATE	0.88+
Attunity Enterprise Manager	TITLE	0.83+
thousands of data sources	QUANTITY	0.83+
2017	DATE	0.82+
AEM	TITLE	0.8+
third iteration	QUANTITY	0.79+
version six	QUANTITY	0.78+

Tendü Yogurtçu | BigData SV 2017

>> Announcer: Live from San Jose, California. It's The Cube, covering Big Data Silicon Valley 2017. (upbeat electronic music) >> California, Silicon Valley, at the heart of the big data world, this is The Cube's coverage of Big Data Silicon Valley in conjunction with Strata Hadoop, well of course we've been here for multiple years, covering Hadoop World for now our eighth year, now that's Strata Hadoop but we do our own event, Big Data SV in New York City and Silicon Valley, SV NYC. I'm John Furrier, my cohost George Gilbert, analyst at Wikibon. Our next guest is Tendü Yogurtçu with Syncsort, general manager of the big data, did I get that right? >> Yes, you got it right. It's always a pleasure to be at The Cube. >> (laughs) I love your name. That's so hard for me to get, but I think I was close enough there. Welcome back. >> Thank you. >> Great to see you. You know, one of the things I'm excited about with Syncsort is we've been following you guys, we talk to you guys every year, and it just seems to be that every year, more and more announcements happen. You guys are unstoppable. You're like what Amazon does, just more and more announcements, but the theme seems to be integration. Give us the latest update. You had an update, you bought Trillium, you got a hit deal with Hortonworks, you got integrated with Spark, you got big news here, what's the news here this year? >> Sure. Thank you for having me. Yes, it's very exciting times at Syncsort and I've probably say that every time I appear because every time it's more exciting than the previous, which is great. We bought Trillium Software and Trillium Software has been leading data quality over a decade in many of the enterprises. It's very complimentary to our data integration, data management portfolio because we are helping our customers to access all of their enterprise data, not just the new emerging sources in the connected devices and mobile and streaming. 
Also leveraging reference data, mainframe legacy systems and the legacy enterprise data warehouse. While we are doing that, accessing data, data lake is now actually, in some cases, turning into data swamp. That was a term Dave Vellante used a couple of years back in one of the crowd chats and it's becoming real. So, data-- >> Real being the data swamps, data lakes are turning into swamps because they're not being leveraged properly? >> Exactly, exactly. Because it's about also having access to right data, and data quality is very complementary because Trillium has delivered trusted, right data to enterprise customers in the traditional environments, so now we are looking forward to bring that enterprise trust of the data quality into data lake. In terms of the data integration, data integration has been always very critical to any organization. It's even more critical now that the data is shifting gravity and the amount of data organizations have. What we have been delivering in very large enterprise production environments for the last three years, we are hearing our competitors making announcements in those areas very recently, which is a validation because we are already running in very large production environments. We are offering value by saying "Create your applications for integrating your data," whether it's in the cloud or originating on the cloud or originating on the mainframes, whether it's on the legacy data warehouse, you can deploy the same exact application without any recompilations, without any changes on your standalone Windows laptop or in Hadoop MapReduce, or Spark in the cloud. So this design once and deploy anywhere is becoming more and more critical with data, it's originating in many different places and cloud is definitely one of them. Our data warehouse optimization solution with Hortonworks and AtScale, it's a special package to accelerate this adoption.
It's basically helping organizations to offload the workload from the existing Teradata or Netezza data warehouse and deploying in Hadoop. We provide a single button to automatically map the metadata, create the metadata in Hive or on Hadoop and also make the data accessible in the new environment and AtScale provides fast BI on top of that. >> Wow, that's amazing. I want to ask you a question, because this is a theme, so I just did a tweetup just now while you were talking saying "the theme this year is cleaning up the data lakes, or data swamps, AKA data lakes. The other theme is integration. Can you just lay out your premise on how enterprises should be looking at integration now because it's the multi-vendor world, it's the multi-cloud world, multi-data type and source with metadata world. How do you advise customers that have the plethora of action coming at them. IOT, you've got cloud, you've got big data, I've got Hadoop here, I got Spark over here, what's the integration formula? >> First thing is identify your business use cases. What's your business's challenge, what's your business goals, and the challenge, because that should be the real driver. We assist in some organizations, they start with the intention "we would like to create a data lake" without having that very clear understanding, what is it that I'm trying to solve with this data lake? Data as a service is really becoming a theme across multiple organizations, whether it's on the enterprise side or on some of the online retail organizations, for example. As part of that data as a service, organizations really need to adopt tools that are going to enable them to take advantage of the technology stack. The technology stack is evolving very rapidly. The skill sets are rare, and skill sets are rare because you need to be kind of making adjustments. 
Am I hiring Ph.D students who can program Scala in the most optimized way, or should I hire Java developers, or should I hire Python developers, the names of the tools in the stack, Spark one versus Spark two APIs, change. It's really evolving very rapidly. >> It's hard to find Scala developers, I mean, you go outside Silicon Valley. >> Exactly. So as an organization, our advice is that you really need to find tools that are going to fit those business use cases and provide a single software environment, that data integration might be happening on premise now, with some of the legacy enterprise data warehouse, and it might happen in a hybrid, on premise and cloud environment in the near future and perhaps completely in the cloud. >> So standard tools, tools that have some standard software behind it, so you don't get stuck in the personnel hiring problem. Some unique domain expertise that's hard to hire. >> Yes, skill set is one problem, the second problem is the fact that the applications need to be recompiled because the stack is evolving and the APIs are not compatible with the previous version, so that's the maintenance cost to keep up with things, to be able to catch up with the new versions of the stack, that's another area that the tools really help, because you want to be able to develop the application and deploy it anywhere in any compute platform. >> So Tendü, if I hear you properly, what you're saying is integration sounds great on paper, it's important, but there's some hidden costs there, and that is the skill set and then there's the stack recompiling, I'm making sure. Okay, that's awesome. >> The tools help with that. >> Take a step back and zoom out and talk about Syncsort's positioning, because you guys have been changing with the stacks as well, I mean you guys have been doing very well with the announcements, you've been just coming on the market all the time. What is the current value proposition for Syncsort today?
>> The current value proposition is really we help organizations create the next generation modern data architecture by accessing and liberating all enterprise data and delivering that data at the right time and with the right quality. It's liberate, integrate, with integrity. That's our value proposition. How do we do that? We provide that single software environment. You can have batch legacy data and streaming data sources integrated in the same exact environment and it enables you to adapt to Spark 2 or Flink or whichever compute framework is going to help you. That has been our value proposition and it is proven in many production deployments. >> What's interesting too is the way you guys have approached the market. You've locked down the legacy, so you have, we talk about the mainframe and well beyond that now, you guys have and understand the legacy, so you kind of lock that down, protect it, make it secure, it's security-wise, but you do that too, but making sure it works because it's still data there, because legacy systems are really critical in the hybrid. >> Mainframe expertise and heritage that we have is a critical part of our offering. We will continue to focus on innovation on the mainframe side as well as on the distributed. One of the announcements that we made since our last conversation was we have a partnership with Compuware and we now bring in more data types about application failures, Abend-AID data, to Splunk for operational intelligence. We will continue to also support more delivery types, we have batch delivery, we have streaming delivery, and now replication into Hadoop has been a challenge so our focus is now replication from DB2 on mainframe and VSAM on mainframe to Hadoop environments. That's what we will continue to focus on, mainframe, because we have heritage there and it's also part of big enterprise data lake.
You cannot make sense of the customer data that you are getting from mobile if you don't reference the critical data sets that are on the mainframe. With the Trillium acquisition, it's very exciting because now we are at a kind of pivotal point in the market, we can bring that data validation, cleansing, and matching superior capabilities we have to the big data environments. One of the things-- >> So when you get in low latency, you guys do the whole low latency thing too? You bring it in fast? >> Yes, we bring it, that's our current value proposition and as we are accessing this data and integrating this part of the data lake, now we have capabilities with Trillium that we can profile that data, get statistics and start using machine learning to automate the data steward's job. Data stewards are still spending 75% of their time trying to clean the data. So if we can-- >> Lot of manual work labor there, and modeling too, by the way, the modeling and just the cleaning, cleaning and modeling kind of go hand in hand. >> Exactly. If we can automate any of these steps to drive the business rules automatically and provide right data on the data lake, that would be very valuable. This is what we are hearing from our customers as well. >> We've heard probably five years about the data lake as the center of gravity of big data, but we're hearing at least a bifurcation, maybe more, where now we want to take that data and apply it, operationalize it in making decisions with machine learning, predictive analytics, but at the same time we're trying to square this strange circle of data, the data lake where you didn't say up front what you wanted it to look like but now we want ever richer metadata to make sense out of it, a layer that you're putting on it, the data prep layer, and others are trying to put different metadata on top of it. What do you see that metadata layer looking like over the next three to five years? 
>> The governance is a very key topic and for organizations who are ahead of the game in big data and who already established that data lake, data governance and even analytics governance become important. What we are delivering here with Trillium, we will have generally available by end of Q1. We are basically bringing business rules to the data. Instead of bringing data to business rules, we are taking the business rules and deploying them where the data exists. That will be key because of the data gravity you mentioned because the data might be in the Hadoop environment, it might be in an, like I said, enterprise data warehouse, and it might be originating in the cloud, and you don't want to move the data to the business rules. You want to move the business rules to where the data exists. Cloud is an area where we see more and more of our customers moving forward. The two main use cases around our integration are, one, because the data is originating in cloud, and the second one is archiving data to cloud, and we actually announced tighter integration with Cloudera Director earlier this week for this event, and we have been in cloud deployments and we actually have an offering on Elastic MapReduce already and on EC2 for a couple of years now, and also on Google Cloud Storage, but this announcement is primarily making deployments even easier by leveraging Cloudera Director's elasticity for increasing and reducing the deployment. Now our customers' integration jobs will also take advantage of that elasticity. >> Tendü, it's great to have you on The Cube because you have an engineering mind but you're also now general manager of the business, and your business is changing.
You're in the center of the action, so I want to get your expertise and insight into the enterprise readiness concept and we saw last week at Google Cloud 2017, you know, Google going down the path of being enterprise ready, or taking steps, I don't think they're fully ready, but they're certainly serious about the cloud on the enterprise, and that's clear from Diane Greene, who knows the enterprise. It sparked the conversation last week, around what does enterprise readiness mean for cloud players, because there's so many details in between the lines, if you will, of what products are, that integration, certification, SLAs. What's your take on the notion of cloud readiness vis-à-vis Google and others that are bringing cloud compute, a lot of resources, with an IoT market that's now booming, big data evolving very, very fast, lot of realtime, lot of analytics, lot of innovation happening? What's the enterprise picture look like from a readiness standpoint? How do these guys get ready? >> From a big picture, for enterprise there are a couple of things that cannot be an afterthought. Security, metadata lineage as part of data governance, and being able to have flexibility in the architecture, that they will not be kind of recreating the jobs that they might have all the way to deployed and on premise environments, right? To be able to have the same application running from on premise to cloud will be critical because it gives flexibility for adaptation in the enterprise. Enterprise may have some MapReduce jobs running on premise with the Spark jobs on cloud because they are really doing some predictive analytics, graph analytics on those, they want to be able to kind of have that flexible architecture where we hear this concept of a hybrid environment. You don't want to be deploying a completely different product in the cloud and redo your jobs.
That flexibility of architecture, flexibility-- >> So having different code bases in the cloud versus on prem requires two jobs to do the same thing. >> Two jobs for maintaining, two jobs for standardizing, and two different skill sets of people potentially. So security, governance, and being able to access easily and have applications move in between environments will be very critical. >> So seamless integration between clouds and on prem first, and then potentially multi-cloud. That's table stakes in your mind. >> They are absolutely table stakes. A lot of vendors are trying to focus on that, definitely Hadoop vendors are also focusing on that. Also, one of the things, like when people talk about governance, the requirements are changing. We have been talking about single view and customer 360 for a while now, right? Do we have it right yet? The enrichment is becoming a key. With Trillium we made the recent announcement, the precise enriching, it's not just the address that you want to deliver and make sure that address should be correct, it's also the email address, and the phone number, is it mobile number, is it landline? It's enriched data sets that we have to be really dealing with, and there's a lot of opportunity, and we are really excited because data quality, discovery and integration are coming together and we have a good-- >> Well Tendü, thank you for joining us, and congratulations as Syncsort broadens their scope to being a modern data platform solution provider for companies, congratulations. >> Thank you. >> Thanks for coming. >> Thank you for having me. >> This is The Cube here live in Silicon Valley and San Jose, I'm John Furrier, George Gilbert, you're watching our coverage of Big Data Silicon Valley in conjunction with Strata Hadoop. This is SiliconANGLE's The Cube, we'll be right back with more live coverage. We've got two days of wall to wall coverage with experts and pros talking about big data, the transformations here inside The Cube.
We'll be right back. (upbeat electronic music)

Published Date : Mar 14 2017

ENTITIES

Entity	Category	Confidence
George Gilbert	PERSON	0.99+
John Furrier	PERSON	0.99+
two jobs	QUANTITY	0.99+
Two jobs	QUANTITY	0.99+
Dave Vellante	PERSON	0.99+
75%	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
New York City	LOCATION	0.99+
Silicon Valley	LOCATION	0.99+
Diane Greene	PERSON	0.99+
San Jose, California	LOCATION	0.99+
Google	ORGANIZATION	0.99+
Scala	TITLE	0.99+
Syncsort	ORGANIZATION	0.99+
San Jose	LOCATION	0.99+
second problem	QUANTITY	0.99+
last week	DATE	0.99+
Compuware	ORGANIZATION	0.99+
two days	QUANTITY	0.99+
Spark 2	TITLE	0.99+
one	QUANTITY	0.99+
one problem	QUANTITY	0.99+
Vizaviz	ORGANIZATION	0.99+
Tendü Yogurtçu	PERSON	0.99+
Spark	TITLE	0.99+
eighth year	QUANTITY	0.99+
One	QUANTITY	0.99+
five years	QUANTITY	0.99+
Two main use cases	QUANTITY	0.98+
Trillium	ORGANIZATION	0.98+
Python	TITLE	0.98+
Netezza	ORGANIZATION	0.98+
Trillium Software	ORGANIZATION	0.98+
this year	DATE	0.98+
Wikibon	ORGANIZATION	0.97+
Hortonworks	ORGANIZATION	0.97+
Hadoop	TITLE	0.97+
earlier this week	DATE	0.96+
today	DATE	0.96+
Teradata	ORGANIZATION	0.95+
Big Data Silicon Valley 2017	EVENT	0.94+
First thing	QUANTITY	0.94+
single view	QUANTITY	0.94+
big data	ORGANIZATION	0.92+
Hive	TITLE	0.92+
Java	TITLE	0.92+
The Cube	ORGANIZATION	0.92+
single button	QUANTITY	0.91+
AtScale	ORGANIZATION	0.91+
end of Q1	DATE	0.9+
single software	QUANTITY	0.9+
second one	QUANTITY	0.89+
first	QUANTITY	0.89+
California,	LOCATION	0.89+
Flink	TITLE	0.88+
Big Data	TITLE	0.88+
two different skill	QUANTITY	0.87+
Silicon Valley,	LOCATION	0.84+
360	QUANTITY	0.83+
three	QUANTITY	0.82+
last three years	DATE	0.8+
Valley	TITLE	0.79+
Google Cloud 2017	EVENT	0.79+
Windows	TITLE	0.78+
prem	ORGANIZATION	0.76+
couple of years back	DATE	0.76+
NYC	LOCATION	0.75+
two APIs	QUANTITY	0.75+

Amit Walia | BigData SV 2017

>> Announcer: Live from San Jose, California, it's the Cube, covering Big Data Silicon Valley 2017. (upbeat music) >> Hello and welcome to the Cube's special coverage of Big Data SV, Big Data in Silicon Valley in conjunction with Strata + Hadoop. I'm John Furrier with George Gilbert, with Wikibon, and Peter Burris as well. We'll be doing interviews all day today and tomorrow, here in Silicon Valley in San Jose. Our next guest is Amit Walia who's the Executive Vice President and Chief Product Officer of Informatica. Kicking off day one of our coverage. Great to see you. Thanks for joining us on our kick off. >> Good to be here with you, John. >> So obviously big data, this is like the eighth year of us covering what was once Hadoop World, now it's Strata + Hadoop, Big Data SV. We also do Big Data NYC with the Cube and it's been an interesting transformation over the past eight years. This year has been really really hot with you're starting to see Big Data starting to get a clear line of sight of where it's going. So I want to get your thoughts, Amit, on where the view of the marketplace is from your standpoint. Obviously Informatica's got a big place in the enterprise. And the real trends on how the enterprises are taking analytics and specifically with the cloud. You got the AI looming, all buzzed up on AI. That really seized, people had to get their arms around that. And you see IoT. Intel announced an acquisition, $15 billion for autonomous vehicles, which is essentially data. What are your views? >> Amit: Well I think it's a great question. 10 years have happened since Hadoop started right? I think what has happened as we see is that today what enterprises are trying to encapsulate is what they call digital transformation. What does it mean? I mean think about it, digital transformation for enterprises, it means three unique things.
They're transforming their business models to serve their customers better, they're transforming their operational models for their own execution internally, if I'm a manufacturing or an execution-oriented company. The third one is basically making sure that their offerings are also tailored to their customers. And in that context, if you think about it, it's all a data-driven world. Because it's data that helps customers be more insightful, be more actionable, and be a lot more prepared for the future. And that covers the things that you said. Look, that's where Hadoop came into play with big data. But today the three things that organizations are catered around big data is just a lot of data right? How do I bring actionable insights out of it? So in that context, ML and AI are going to play a meaningful role. Because to me as you talk about IoT, IoT is the big game changer of big data becoming big or huge data if I may for a minute. So machine learning, AI, self-service analytics is a part of that, and the third one would be big data and Hadoop going to cloud. That's going to be very fast. >> John: And so the enterprises now are also transforming, so this digital transformation, as you point out, is absolutely real, it's happening. And you start to see a lot more focus on the business models of companies where it's not just analytics as an IT function, it's been talked about for a while, but now it's really more relevant because you're starting to see impactful applications. >> Exactly. >> So with cloud and (chuckles) the new IoT stuff you start to say okay apps matter. And so the data becomes super important. How is that changing the enterprises' readiness in terms of how they're consuming cloud and data and what not? What's your view on that? Because you guys are deep in this. >> Amit: Yep. >> What's the enterprises' orientation these days? >> So slight nuance to that, as an answer.
I think what organizations have realized is that today two things happened that never happened in the last 20 years. Massive fragmentation of the persistence layer, you see Hadoop itself fragmented the whole database layer. And a massive fragmentation of the app layer. So there are 3,000 enterprise size apps today. So just think about it, you're not restricted to one app. So what customers and enterprises are realizing is that, the data layer is where you need to organize yourself. So you need to own the data layer, you cannot just be in the app layer and the database layer because you got to be understanding your data. Because you could be anywhere and everywhere. And the best example I give in the world of cloud is, you don't own anything, you rent it. So what do you own? You own the darn data. So in that context, enterprise readiness as you came to, becomes very important. So understanding and owning your data is the critical secret sauce. And that's where companies are getting disrupted. So the new guys are leveraging data, which by the way the legacy companies had, but they couldn't figure it out. >> What is that? This is important. I want to just double-click on that. Because you mentioned the data layer, what's the playbook? Because that's like the number one question that I get. >> Mm-hmm. >> On Cube interviews or off camera is that okay, I want to have a data strategy. Now that's empty in its statement, but what is the playbook? I mean, is it architecture? Because the data is the strategic advantage. >> Amit: Yes. >> What are they doing? What's the architecture? What are some of the things that enterprises do? Now obviously they care about service level agreements and having potentially multicloud, for instance, as a key thing. But what is that playbook for this data layer? >> That's a very good question, sir. Enterprise readiness has a couple of dimensions. One you said is that there will be hybrid doesn't mean a ground cloud multicloud. 
I mean you're going to be in multi SaaS apps, multi platform apps, multi databases in the cloud. So there is a hybrid world over there. Second is that organizations need to figure out a data platform of their own. Because ultimately what they care for is that, do I have a full view of my customer? Do I have a full view of the products that I'm selling and how they are servicing my customers? That can only happen if you have what I call a meta-data driven data platform. Third one is, boy oh boy, you talked about self-service analytics, you need to know answers today. Having analytics be more self-serving for the business user, not necessarily the IT user, and then leveraging AI to make all these things a lot more powerful. Otherwise, you're going to be spending, what? Hours and hours doing statistical analysis, and you won't be able to get to it given the scale and size of data models. And SLAs will play a big role in the world of cloud. >> Just to follow up on that, so it sounds like you've got the self-service analytics to help essentially explore and visualize. >> Amit: Mm-hmm. >> You've got the data governance and cataloging and lineage to make sure it is high quality and navigable, and then you want to operationalize it once you've built the models. But there's this tension between I want what made the data lake great, which was just dump it all in there so we have this one central place, but all the governance stuff on top of that is sort of just well, we got to organize it anyway. >> Yeah. >> How do you resolve that tension? >> That is a very good question. And that's what enterprises kind of woke up to. So a good example I'll give you, everybody wanted to make a data lake. I mean if you remember two years ago, 80% of the data lakes fell apart and the reason was the fact that you just said, that people made the data lake a data swamp if I may. Just dump a lot of data into my Hadoop cluster, and life will be great.
But the thing is that, and what customers of large enterprises realized is they became system integrators of their own. I got to bring data, catalog it, prepare it, surface it. So the belief of customers now is that, I need a place to go where basically it can easily bring in all the data, meta-data driven catalog, so I can use AI and ML to surface that data. So it's very easy at the preparation layer for my analysts to go around and play with data and then I can visualize anything. But it's all integrated out of the box, then each layer, each component being self-integrated, then it falls apart very quickly when you want to, to your question, at an enterprise level operationalize it. Large enterprises care about two things. Is it operationalizable? And is it scalable? That's where this could fall apart. And that's what our belief is. And that's where governance happens behind the scenes. You're not doing anything. Security of your data, governance of their data is driven through the catalog. You don't even feel it. It's there. >> I never liked the data lakes term. Dave Vellante knows I've always been kind of against, even from day one, 'cause data's more fluid, I call it a data ocean, but to your point, I want to get on that point because I think data lakes is one dimension, right? >> Yeah. >> And we talked about this at Informatica World, last year I think. And this year it's May 15th. >> Yes. >> I think your event is coming up, but you guys introduced meta-data intelligence. >> Yep. >> So there was, the old model was throw it centralized, do some data governance, data management, fence it out, call, make some queries, get some reports. I'm over simplifying but it was like, it was like a side function. You're getting at now is making that data valuable. >> Amit: Yep. >> So if it's in a lake or it's stored, you never know when the data's going to be relevant, so you have to have it addressable. Could you just talk about where this meta-data intelligence is going? 
Because you mentioned machine learning and AI. 'Cause this seems to be what everyone is talking about. In real time, how do I make the data really valuable when I need it? And what's the secret sauce that you guys have, specifically, to make that happen? >> So that, to contextualize that question, think about it. So if you. What you don't want to do is keep making everything manual. Our belief is that the intelligence around data has to be at the meta-data level, right? Across the enterprise, which is why, when we invested in the catalog, I used the words, "It's the Google of data for the enterprise." No place in an enterprise you can go search for all your data, and given the fast, rapid-changing sources of data, think about IoT, as you talked about, John. Or think about your customer data, for you and me may come from a new source tomorrow. Do you want the analyst to figure out where the data is coming from? Or the machine learning or AI to contextualize and tell you, you know what, I just discovered a great new source for where John is going to go shop. Do you want to put that as a part of analytics to give him an offer? That's where the organizing principle for data sits. The catalog and all the meta-data, which is where ML and AI will converge to give the analyst self-discovery of data sets, recommendations like in an Amazon environment, recommendations like Facebook, find other people or other common data that's like a Facebook or a LinkedIn, that is where everything is going, and that's why we are putting all our efforts on AI. >> So you're saying, you want to abstract away the complexity of where the data sits? So that the analyst or app can interface with that? >> That's exactly right. Because to me, those are the areas that are changing so rapidly, let that be. You can pick whatever data sets based on what you want, you can pick whichever app you want to use, wherever you want to go, or wherever your business wants to go.
You can pick whichever analytical tool you like, but you want to be able to take all of those tools but be able to figure out what data is there, and that should change all the time. >> I'm trying to ask you a lot while you're here. What's going to be the theme this year at Informatica World? How do you take it to the next level? Can you just give us a teaser of what we might expect this year? 'Cause this seems to be the hottest trend. >> This is, so first, at Informatica World this year, we will be unveiling our whole new strategy, branding, and messaging, there's a whole amount of push on that one. But the two things that will be focused a lot on is, one is around that intelligent data platform. Which is basically what I'm talking about. The organizing principle of every enterprise for the next decade, and within that, where AI is going to play a meaningful role for people to spring forward, discover things, self-service, and be able to create sense from this mountains of data that's going to sit around us. But we won't even know what to do. >> All right, so what do you guys have in the product, just want to drill into this dynamic you just mentioned, which is new data sources. With IoT, this is going to completely make it more complex. You never know what data's going to be coming off the cars, the wearables, the smart cities. You have all these new killer use-cases that are going to be transformational. How do you guys handle that, and what's the secret sauce of? 'Cause that seems to be the big challenge, okay, I'm used to dealing with data, its structure, whether it's schemas, now we got unstructured. So okay, now I got new data coming in very fast, I don't even know when or where it's going to come in, so I have to be ready for these new data. What is the Informatica solution there? 
>> So in terms of taking data from any source, that's never been a challenge for us, because Informatica, one of the bread and butter for us is that we connect and bring data from any potential source on the planet, that's what we do. >> John: And you automate that? >> We automate that process, so any potential new source of data, whether it's IoT, unstructured, semi-structured, log, we connect to that. What I think the key is, where we are heavily invested, once you've brought all that. By the way, you can use Kafka queues for that, you can use back-streaming, all of that stuff you could do. Question is, how do you make sense out of it? I can get all the data, dump it in a Kafka queue, and then I take it to do some processing on Spark. But the intelligence is where all the Informatica secret sauce is, right? The meta-data, the transformations, that's what we are invested in, but in terms of connecting anything to everything? That we do for a living, we have done that for one quarter of a century, and we keep doing it. >> I mean, I love having a chat with you, Amit, you're a product guy, and we love product guys, 'cause they can give us a little teaser on the roadmap, but I got to ask you the question, with all this automation, you know, the big buzz out in the world is, "Oh machine learning and AI is replacing jobs." So where is the shift going to be, because you can almost connect the dots and say, "Okay, you're going to put some people out of work, "some developer, some automation, "maybe the systems management layer or wherever." Where are those jobs shifting to? Because you could almost say, "Okay, if you're going to abstract away and automate, "who loses their job?" Who gets shifted and what are those new opportunities, because you could almost say that if you automate in, that should create a new developer class. So one gets replaced, one gets created possibly. Your thoughts on this personnel transformation?
>> Yeah, I think, I think what we see is that value creation will change. So the jobs will go to the new value. New areas where value is created. A great example of that is, look at developers today, right. Absolutely, I think they did a terrific job in making sure that the Hadoop ecosystem got legitimized, right? But in my opinion, where enterprise scalability comes, enterprises don't want lots of different things to be integrated and just plumbed together. They want things to work out of the box, which is why, you know, software works for them. But what happens is that they want that development community to go work on what I call value-added areas of the stack. So think about it, in connected car, they're working with lots of customers on the connected car issue, right? They don't want developers to work on the plumbing. They want us to kind of give that out of the box, because SLA is operational scale, and enterprise scalability matters, but in terms of the top-layer analytics, to make sure we can make sense out of it, that's what they're, that's where they want innovation. So what you will see is that, I don't think the jobs will go in vapor, but I do think the jobs will get migrated to a different part of the stack, which today it has not been, but that's, you know, we live in Silicon Valley, that's a natural evolution we see, so I think that will happen. In general in the larger industry, again I'd say, look, driverless cars, I don't think they've driven away jobs. What they've done is created a new class of people who work. So I do think that will be a big change. >> Yeah there's a fallacy there. I mean with the ATM argument was ATM's are going to replace tellers, yet more branches opened up. >> That's exactly it. >> So therefore creating new jobs. 
I want to get to the quick question, I know George has a question, but I want to get on the cost of ownership, because one of the things that's been criticized in some of these emerging areas, like Hadoop and Open Stack, for instance, just to pick two random examples. It's great, looks good, you know, all peace and love. An industry's being created, legitimized, but the cost of ownership has been critical to get that done, it's been expensive, talent, to find talent and deploying it was hard. We heard that on the Cube many times. How does the cost of ownership equation change? As you go after these more value, as developers and businesses go after these more value-creating activities in the Stack? >> See look, I always say, there is no free lunch. Nothing is free. And customers realize that, that open source, if you completely wanted to, to your point, as enterprises wanted to completely scale out and create an end-to-end operational infrastructure, open source ends up being pretty expensive. For all the reasons, right, because you throw in a lot of developers, and it's not necessarily scalable, so what we're seeing right now is that enterprises, as they have figured that this works for me, but when they want to go scale it out, they want to go back to what I call a software provider, who has the scale, who has the supportability, who also has the ability to react to changes and also for them to make sure that they get the comfort that it will work. So to me, that's where they find it cheaper. Just building it, experimenting with that, it's cheaper here, but scaling it out is cheaper with a software provider, so we see a lot of our customers when we start a little bit experimenting to developers, downloading something, works great, but would I really want to take it across Nordstrom or a JP Morgan or a Morgan Stanley. I need security, I need scalability, I need somebody to call to, at that point on those equations become very important. 
>> And that's where the out of box experience comes in, where you have the automation, that kind of. >> Exactly. >> Does that ease up some of the cost of ownership? >> Exactly, and the talent is a big issue, right? See we live in Silicon Valley, so we. By the way, in Silicon Valley hiring talent is hard. Just think about it, if you go to Kansas City, hiring a Scala developer, that's a rare breed. So just, when I go around the globe and talk to customers, they don't see that talent at all that we here just somehow take for granted. They don't, so it's hard for them to kind of put their energy behind it. >> Let me ask. More on the meta-data layer. There's an analogy that's come up from the IIoT world where they're building these digital twins, and it's not just GE. IBM's talking about it, and actually, we've seen more and more vendors where the digital twin is this, it's a digital representation now of some physical object. But you could think of it as meta-data, you know, for a physical object, and it gets richer over time. So my question is, meta-data in the old data warehouse world, was we want one representation of the customer. But now it's, there's a customer representation for a prospect, and one for an account, and one for, you know, in warranty, and one for field service. Is that, how does that change what you offer? >> That's a very very good question. Because that's where the meta-data becomes so much more important because its manifestation is changing. I'll give you a great example, take Transamerica, Transamerica is a customer of ours leveraging big data at scale, and what they're doing is that, to your question, they have existing customers who have insurance through them. But they're looking for white space analysis, who could be potential opportunities? Two distinct ones, and within that, they're looking at relationships. I know you, John, you have Transamerica, could you be an influencer with me? Or within your family, extended family.
I'm a friend, but what about a family member that you've declared out there on social media? So they are doing all that stuff in the context of a data lake. How are they doing it? So in that context, think about that complexity of the job, pumping data into a lake won't solve it for them, but that's a necessary first step. The second step is where all of that meta-data through ML and AI, starts giving them that relationship graph. To say, you know what, John in itself has this white space opportunity for you, but John is related to me in one way, him and me are connected on Facebook. John's related to you a little bit more differently, he has a stronger bond with you, and within his family, he has different strong bonds. So that's John's relationship graph. Leverage him, if he has been a good customer of yours. All of that stuff is now at the meta-data level, not just the monolithic meta-data, relationship graph. His relationship graph of what he has bought from you, so that you can just see that discovery becomes a very important element. Do you want to do that in different places? You want to do that in one place. I may be in a cloud environment, I may be on prem, so that's where when I say that meta-data becomes the organizing principle, that's where it becomes real. >> Just a quick follow-up on that, then. It doesn't seem obvious that every end customer of yours, not the consumer but the buyer of the software, would have enough data to start building that graph. >> I don't think, to me, what happened was, the word big data, I thought got massively abused. A lot of Hadoop customers are not necessarily big data customers. I know a lot of banking customers, enterprise banking, whose data volumes will surprise you, but they're using Hadoop. What they want is intelligence. That's why I keep saying that the meta-data part, they are more interested in a deeper understanding of the data. A great example is, if John. I had a customer, who basically had a big bank.
Rich net worth customer. In their will, the daughter was listed. When the daughter went to school, by the way, went to the bank branch in that city, she had no idea, she walked up, she basically wanted to open an account. Three more friends in the line. Manager comes out because at that point, the teller said, "This is somebody you should take special care of." Boom, she goes in a special cabin, the other friends are standing in a line. Think of the customer service perception, you just created a new millennia right? That's important. >> Well this brings up the interesting comment. The whole graph thing, we love, but this brings back the neural network trend. Which is a concept that's been around for a long long time, but now it's front and center. I remember talking to Diane Green who runs Google Cloud, she was saying that you couldn't hire neural network, they couldn't get jobs 15 years ago. Now you can't hire enough of them. So that brings up the ML conversation. So, I want to take that to a question and ask about the data lake, 'cause you guys have announced a new cloud data lake. >> Yes. >> So it sounds like, from what you're saying, is you're going beyond the data lake. So talk about what that is. Because data lake, people get, you throw stuff into a lake. And hopefully it doesn't become a swamp. How are you guys going beyond just the basic concept of a data lake with your new cloud data lake? >> Yeah, so, data lake. If you remember last year, actually at Strata San Jose we chatted, and we had announced the data lake because we realized customers, to your point John, as you said, were struggling on how to even build a data lake, and they were all over the place, and they were failing. And we announced the first data lake there, and then in Strata New York, basically we brought the meta-data ML part to the data lake. And then obviously right now we're taking it to the cloud, and what we see in the world of data lake is that customers ask for three things. 
First, they want the prebuilt integrated solution. Data can come in, but I want the intelligence of meta-data and I want data preparation baked in. I don't want to have three different tools that I will go around, so out of the box. But we also saw, as they become successful with our customers, they want to scale up, scale down. Cloud is just a great place to go. You can basically put a data lake out there, by the way in the context of data, a lot of new data sources are in the cloud, so it's easy for them to scale in and out in the cloud, experiment there and all that stuff. Also you know Amazon, we supported Amazon Kinesis, all of these new sources and technologies in the world of cloud are allowing experimentation in the data lake, so that allowed our customers to basically get ahead of the curve very quickly. So in some ways, cloud allowed customers to do things a lot faster, better, and cheaper. So that's what we basically put in the hands of our customers. Now that they are feeling comfortable, they can do a secured and governed data lake without feeling that it's still not self-served. They want to put it in the cloud and be a lot more faster and cheaper about it. >> John: And more analytics on it. >> More analytics. And now, because our ML, our AI, the meta-data part, connects cloud, ground, everything. So they have an organizing principle, whatever they put wherever, they can still get intelligence out of it. >> Amit, we got to break, but I want to get one final comment for you to kind of end the segment, and it's been fun watching you guys work over the past couple years. And I want to get your perspective because the product decisions always have kind of a time table to them, it's not like you made this up last night because it's trendy, but you guys have made some good product choices. It seems like the wind's at your back right now at Informatica. What, specifically, are bets that you guys made a couple years ago that are now bearing fruit? 
Can you just take a minute to end the segment, share some of those product bets. Because it's not always that obvious to make those product bets years earlier, seems to be a tail wind for you. You agree, and can you share some of those bets? >> I think you said it rightly, product bets are hard, right? Because you got to see three, four years ahead. The one big bet that we made is that we saw, as I said to you, the decoupling of the data layer. So we realized that, look, the app layer's getting fragmented. The cloud platforms are getting fragmented. Databases are getting fragmented. That that whole old monolithic architecture is getting fundamentally blown up, and the customers will be in a multi, multi, multi spread out hybrid world. Data is the organizing principle, so three years ago, we bet on the intelligent data platform. And we said that the intelligent data platform will be intelligent because of the meta-data driven layer, and at that point, AI was nowhere in sight. We put ML in that picture, and obviously, AI has moved, so the bet on the data platform. Second bet that, in that data platform, it'll all be AI, ML driven meta-data intelligence. And the third one is, we bet big on cloud. Big data we had already bet big on, by the way. >> John: You were already there. >> We knew the cloud. Big data will move to the cloud far more rapidly than the old technology moved to the cloud. So we saw that coming. We saw the (mumbles) wave coming. We worked so closely with AWS and the Azure team. With Google now, as well. So we saw three things, and that's what we bet. And you can see the rich offerings we have, the rich partnerships we have, and the rich customers that are live in those platforms. >> And the market's right on your doorstep. I mean, AI is hot, ML, you're seeing all this stuff converge with IoT. >> So those were, I think, forward-looking bets that paid out for us. 
(chuckles) And but there's so much more to do, and so much more upside for all of us right now. >> A lot more work to do. Amit, thank you for coming on, sharing your insight. Again, you guys got in good pole position in the market, and again it's right on your doorstep, so congratulations. This is the Cube, I'm John Furrier with George Gilbert. With more coverage in Silicon Valley for Big Data SV and Strata + Hadoop after this short break.

Published Date : Mar 14 2017

ENTITIES

Entity	Category	Confidence
John	PERSON	0.99+
Amit Walia	PERSON	0.99+
Diane Green	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Mickey Bonn	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Peter Burns	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Transamerica	ORGANIZATION	0.99+
George Gilbert	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
George	PERSON	0.99+
Google	ORGANIZATION	0.99+
$15 billion	QUANTITY	0.99+
Amit	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Second	QUANTITY	0.99+
Nordstrom	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
May 15th	DATE	0.99+
Kansas City	LOCATION	0.99+
last year	DATE	0.99+
second step	QUANTITY	0.99+
Informatica	ORGANIZATION	0.99+
JP Morgan	ORGANIZATION	0.99+
Morgan Stanley	ORGANIZATION	0.99+
first step	QUANTITY	0.99+
third one	QUANTITY	0.99+
John Furrier	PERSON	0.99+
LinkedIn	ORGANIZATION	0.99+
each component	QUANTITY	0.99+
two things	QUANTITY	0.99+
San Jose, California	LOCATION	0.99+
First	QUANTITY	0.99+
each layer	QUANTITY	0.99+
Facebook	ORGANIZATION	0.99+
one	QUANTITY	0.99+
Intel	ORGANIZATION	0.99+
10 years	QUANTITY	0.99+
tomorrow	DATE	0.99+
today	DATE	0.99+
eighth year	QUANTITY	0.99+
first	QUANTITY	0.99+
GE	ORGANIZATION	0.99+
three years ago	DATE	0.99+
3,000 enterprise	QUANTITY	0.98+
Big Data SV	ORGANIZATION	0.98+
this year	DATE	0.98+
next decade	DATE	0.98+
two years ago	DATE	0.98+
three	QUANTITY	0.97+

Ritika Gunnar & David Richards - #BigDataSV 2016 - #theCUBE

>> Narrator: From San Jose, in the heart of Silicon Valley, it's The Cube, covering Big Data SV 2016. Now your hosts, John Furrier and Peter Burris. >> Okay, welcome back everyone. We are here live in Silicon Valley for Big Data Week, Big Data SV Strata Hadoop. This is The Cube, SiliconANGLE's flagship program. We go out to the events and extract the signals from the noise. I'm John Furrier, my co-host is Peter Burris. Our next guest is Ritika Gunnar, VP of Data and Analytics at IBM and David Richards is the CEO of WANdisco. Welcome to The Cube, welcome back. >> Thank you. >> It's a pleasure to be here. >> So, okay, IBM and WANdisco, why are you guys here? What are you guys talking about? Obviously, partnership. What's the story? >> So, you know what WANdisco does, right? Data replication, active-active replication of data. For the past twelve months, we've been realigning our products to a market that we could see rapidly evolving. So if you had asked me twelve months ago what we did, we were talking about replicating just Hadoop, but we think the market is going to be a lot more than that. I think Mike Olson famously said that this Hadoop was going to disappear and he was kind of right because the ecosystem is evolving to be a much greater stack that involves applications, cloud, completely heterogeneous storage environment, and as that happens the partnerships that we would need have to move on from just being, you know, the sort of Hadoop-specific distribution vendors to actually something that can deliver a complete solution to the marketplace. And very clearly, IBM has a massive advantage in the number of people, the services, ecosystem, infrastructure, in order to deliver a complete solution to customers, so that's really why we're here. >> If you could talk about the stack comment, because this is something that we're seeing. Mike Olson's kind of being political when he says make it invisible, but the reality is there is more to big data than Hadoop. 
There's a lot of other stuff going on. Call it stack, call it ecosystem. A lot of great things are growing, we just had Gaurav on from SnapLogic said, "everyone's winning." I mean, I just love that's totally true, but it's not just Hadoop. >> It's about Alldata and it's about all insight on that data. So when you think about Alldata, Alldata is a very powerful thing. If you look at what clients have been trying to do thus far, they've actually been confined to the data that may be in their operational systems. With the advent of Hadoop, they're starting to bring in some structured and unstructured data, but with the advent of IOT systems, systems of engagement, systems of records and trying to make sense of all of that, Alldata is a pretty powerful thing. When I think of Alldata, I think of three things. I think of data that is not only on premises, which is where a lot of data resides today, but data that's in the cloud, where data is being generated today and where a majority of the growth is. When I think of Alldata, I think of structured data, that is in your traditional operational systems, unstructured and semi-structured data from IOT systems et cetera, and when I think of Alldata, I think of not just data that's on premises for a lot of our clients, but actually external data. Data where we can correlate data with, for example, an acquisition that we just did within IBM with The Weather Company or augmenting with partnerships like Twitter, et cetera, to be able to extract insight from not just the data that resides within the walls of your organization, but external data as well. >> The old expression is if you want to go fast, do it alone, if you want to go deeper and broader and more comprehensive, do it as a team. >> That's right. >> That expression can be applied to data. And you look at The Weather data, you think, hmmm, that's an outlier type acquisition, but when you think about the diversity of data, that becomes a really big deal. 
And the question I want to ask you guys is, and Ritika, we'll start with you, there's always a few pressure points we've seen in big data. When that pressure is relieved, you've seen growth, and one was big data analytics kind of stalled a little bit, the winds kind of shifted, eye of the storm, whatever you want to call it, then cloud comes in. Cloud is kind of enabling that to go faster. Now, a new pressure point that we're seeing is go faster with digital transformation. So Alldata kind of brings us to all digital. And I know IBM is all about digitizing everything and that's kind of the vision. So you now have the pressure of I want all digital, I need data driven at the center of it, and I've got the cloud resource, so kind of the perfect storm. What's your thoughts on that? Do you see that similar picture? And then does that put the pressure on, say, WANdisco, say hey, I need replication, so now you're under the hood? Is that kind of where this is coming together? >> Absolutely. When I think about it, it's about giving trusted data and insights to everyone within the organization, at the speed in which they need it. So when you think about that last comment of, "At the speed in which they need it," that is the pressure point of what it means to have a digitally transformed business. That means being able to make insights and decisions immediately and when we look at what our objective is from an IBM perspective, it's to be able to enable our clients to be able to generate those immediate insights, to be able to transform their business models and to be able to provide the tooling and the skills necessary, whether we have it organically, inorganically, or through partnerships, like with WANdisco to be able to do that. And so with WANdisco, we believe we really wanted to be able to activate where that data resides. 
When I talk about Alldata and activation of that data, WANdisco provided to us complementary capabilities to be able to activate that data where it resides with a lot of the capabilities that they're providing through their fusion. So, being able to have and enable our end-users to have that digitally infused set of reactive type of applications is absolutely something... >> It's like David, we talk about, and maybe I'm oversimplifying your value proposition, but I always look at WANdisco as kind of the five nines of data, right? You guys make stuff work, and that's the theme here this year, people just want it to work, right? They don't want to have it down, right? >> Yeah, we're seeing, certainly, an uptick in understanding about what high availability, what continuous availability means in the context of Hadoop, and I'm sure we'll be announcing some pretty big deals moving forward. But we've only just got going with IBM. I would, the market should expect a number of announcements moving forward as we get going with this, but here's the very interesting question associated with cloud. And just to give you a couple of quick examples, we are seeing an increasing number of Global 1,000 companies, Fortune 100 companies move to cloud. And that's really important. If you would have asked me 12 months ago, how is the market going to shape up, I'd have said, well, most CIO's want to move to cloud. It's already happening. So, FINRA, the major financial regulator in the United States is moving to cloud, publicly announced it. The FCA in the UK publicly announced they are moving 100% to cloud. So this creates kind of a microcosm of a problem that we solve, which is how do you move transactional data from on-premise to cloud and create a sort of hybrid environment. Because with the migration, you have to build a hybrid cloud in order to do that anyway. So, if it's just archive systems, you can package it on a disk drive and post it, right? 
If we're talking about transactional data, i.e., stuff that you want to use, so for example, a big travel company can't stop booking flights while they move their data into the cloud, right? They would take six months to move petabyte-scale data into cloud. We solve that problem. We enable companies to move transactional data from on-premise into cloud, without any interruption to services. >> So not six months? >> No, not six months. >> Six hours? >> And you can keep on using the data while it is in transit. So we've been looking for a really simplistic problem, right, to explain this really complex algorithm that we've got that, you know, does this active-active replication stuff. That's it, right? It's so simple, and nobody else can do it. >> So no downtime, no disruption to their business? >> No, and you can use the cloud or you can use the on-prem applications while the data is in transit. >> So when you say all cloud, now we're on a theme, Alldata, all digital, all cloud, there's a nuance there because, and we had Gaurav from SnapLogic talk about it, there's always going to be an on-prem component. I mean, we're probably not going to see 100% of everyone move to the cloud, public cloud, but by cloud you mean hybrid cloud essentially, with some on-prem component. I'm sure you guys see that with Bluemix as well, that you've got some dabbling in the public cloud, but ultimately, it's one resource pool. That's essentially what you're saying. >> Yeah, exactly. >> And I think it's really important. One of the things that's very attractive about the WANdisco solution is that it does provide that hybridness from on-premises to cloud, and that ability to activate the data where it resides, and to do that in a heterogeneous fashion. Architectures are very different in the cloud than they are on premises.
When you look at it, your data lake may be as simple as a Swift object store or S3, and you may be using elements of Hadoop in there, but the architectures are changing. So the notion of being able to handle hybrid solutions, both on-premises and cloud, with the heterogeneous capability, in a non-invasive way that provides continuous data, is something that is not easily achieved, but it's something that every enterprise needs to take into account. >> So Ritika, talk about the why of the WANdisco partnership, and specifically, what are some of the conversations you have with customers? Because obviously there's, it sounds like, the need to go faster and have some of this active-active replication and kind of five nines, if you will, of making stuff not go down, or non-disruptive operations, or whatever the buzzword is. But what's the motivation from your standpoint? Because IBM is very customer-centric. What are some of the conversations, and then how does WANdisco fit into those conversations? >> So when you look at the top three use cases that most clients have, even for Hadoop environments or just what's going on in the market today, the top three use cases are, you know, can I build a logical data warehouse? Can I build areas for discovery or analytical discovery? Can I build areas to be able to have data archiving? And for those top three solutions in a hybrid, heterogeneous environment, you need to be able to have active-active access to the data where that data resides. And therefore, we believe, from an IBM perspective, that we want to be able to provide the best of breed regardless of where that resides. And so we believe that WANdisco has those capabilities that are very complementary to what we need for that broader skills and tooling ecosystem, and hence why we have formed this partnership.
>> Unbelievably, in the market, and it feels like the Hadoop market's just got going, we're also seeing migrations from distributions like Cloudera into cloud. So you know, those sort of lab environments, the small clusters that were being set up. I know this is slightly controversial, and I'll probably get darts thrown at me by Mike Olson, but we are seeing pretty large-scale migration from those sort of labs that were set up initially. And as they progress, and as it becomes mission-critical, they're going to go to companies like IBM, really, aren't they, in order to scale up their infrastructure? They're going to move the data into cloud to get hyperscale for some of these cases that Ritika was just talking about, so we are seeing a lot of those migrations. >> So basically, with Hadoop, there are some siloed deployments of POCs that need to be integrated in. Is that what you're referring to? I mean, why would someone do that? They would say okay, probably integration costs, probably other solutions, data. >> If you do a roll-your-own approach, where you go and get some open-source software, you've got to go and buy servers, you've got to go and train staff. We've just seen one of our customers, a big bank, two years later get servers. Two years to get servers, to get server infrastructure. That's a pretty big barrier, a practical barrier to entry. Versus, you know, I can throw something up in Bluemix in 30 minutes. >> David, you bring up a good point, and I want to just expand on that because you have a unique history. We know each other, we go way back. You were on The Cube when, I think, we first started seven years ago at Hadoop World. You've seen the evolution and heck, you had your own distribution at one point. So you know, you've successfully navigated the waters of this ecosystem, and you had great IP, and then you kind of found your swim lanes and you guys are doing great, but I want to get your perspective on this because you mentioned Cloudera.
You've seen how it's evolving as it goes mainstream. As, you know, Peter says, "The big guys are coming in and with power." I mean, IBM's got a huge Spark investment and it's not just, you know, lip service, they're actually donating a ton of code and actually building stuff, so you've got an evolutionary change happening within the industry. What's your take on the upstarts like Cloudera and Hortonworks and the distro game? Because that now becomes an interesting dynamic, because it has to integrate well. >> I think there will always be a market for the distribution of open-source software. As that layer in the stack, you know, certainly Cloudera, Hortonworks, et cetera, are doing a pretty decent job of providing a distribution. The Hadoop marketplace, and Ritika laid this on pretty thick as well, is not Hadoop. Hadoop is a component of it, but in cloud we talk about object store technology, we talk about Swift, we talk about S3. We talk about Spark, which can be run stand-alone, you don't necessarily need Hadoop underneath it. So the marketplace is being stretched to such a point that if you were to look at the percentage of the revenue that's generated from Hadoop, it's probably less than one percent. I talked 12 months ago with you about the whale season, the whales are coming. >> Yeah, they're here. >> And they're here right now, I mean... >> (laughs) They're mating out in the water, deals are getting done. >> I'm not going to deal with that visual right now, but you're quite right. And I love the Peter Drucker quote which is, "Strategy is a commodity, execution is an art." We're now moving into the execution phase. You need a big company in order to do that. You can't be a five hundred or a thousand person... >> Is Cloudera holding onto dogma with Hadoop or do they realize that the ecosystem is building around them? >> I think they do because they're focused on the application layer, but there's a lot of competition in the application layer.
There's a little company called IBM, there's a little company called Microsoft and the little company called Amazon that are kind of focused on that as well, so that's a pretty competitive environment and your ability to execute is really determined by the size of the organization to be quite frank. >> Awesome, well, so we have Hadoop Summit coming up in Dublin. We're going to be in Ireland next month for Hadoop Summit with more and more coverage there. Guys, thanks for the insight. Congratulations on the relationship and again, WANdisco, we know you guys and know what you guys have done. This seems like a prime time for you right now. And IBM, we just covered you guys at InterConnect. Great event. Love The Weather Company data, as a weather geek, but also the Apple announcement was really significant. Having Apple up on stage with IBM, I think that is really, really compelling. And that was just not a Barney deal, that was real. And the fact that Apple was on stage was a real testament to the direction you guys are going, so congratulations. This is The Cube, bringing you all the action, here live in Silicon Valley here for Big Data Week, BigData SV, and Strata Hadoop. We'll be right back with more after this short break.

Published Date : Mar 30 2016

ENTITIES

Entity	Category	Confidence
Michiel	PERSON	0.99+
Anna	PERSON	0.99+
David	PERSON	0.99+
Bryan	PERSON	0.99+
John	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Michael	PERSON	0.99+
Chris	PERSON	0.99+
NEC	ORGANIZATION	0.99+
Ericsson	ORGANIZATION	0.99+
Kevin	PERSON	0.99+
Dave Frampton	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Kerim Akgonul	PERSON	0.99+
Dave Nicholson	PERSON	0.99+
Jared	PERSON	0.99+
Steve Wood	PERSON	0.99+
Peter	PERSON	0.99+
Lisa Martin	PERSON	0.99+
NECJ	ORGANIZATION	0.99+
Lisa Martin	PERSON	0.99+
Mike Olson	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Michiel Bakker	PERSON	0.99+
FCA	ORGANIZATION	0.99+
NASA	ORGANIZATION	0.99+
Nokia	ORGANIZATION	0.99+
Lee Caswell	PERSON	0.99+
ECECT	ORGANIZATION	0.99+
Peter Burris	PERSON	0.99+
OTEL	ORGANIZATION	0.99+
David Floyer	PERSON	0.99+
Bryan Pijanowski	PERSON	0.99+
Rich Lane	PERSON	0.99+
Kerim	PERSON	0.99+
Kevin Bogusz	PERSON	0.99+
Jeff Frick	PERSON	0.99+
Jared Woodrey	PERSON	0.99+
Lincolnshire	LOCATION	0.99+
Keith	PERSON	0.99+
Dave Nicholson	PERSON	0.99+
Chuck	PERSON	0.99+
Jeff	PERSON	0.99+
National Health Services	ORGANIZATION	0.99+
Keith Townsend	PERSON	0.99+
WANdisco	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
March	DATE	0.99+
Nutanix	ORGANIZATION	0.99+
San Francisco	LOCATION	0.99+
Ireland	LOCATION	0.99+
Dave Vellante	PERSON	0.99+
Michael Dell	PERSON	0.99+
Rajagopal	PERSON	0.99+
Dave Allante	PERSON	0.99+
Europe	LOCATION	0.99+
March of 2012	DATE	0.99+
Anna Gleiss	PERSON	0.99+
Samsung	ORGANIZATION	0.99+
Ritika Gunnar	PERSON	0.99+
Mandy Dhaliwal	PERSON	0.99+

Jack Norris | Strata-Hadoop World 2012

>> Okay. We're back here, live in New York City for Big Data Week. This is SiliconANGLE.tv's exclusive coverage of Strata plus Hadoop World, the big event of Big Data Week. We just wrote a blog post on siliconangle.com calling this the South by Southwest for data geeks, and it's my prediction that this is going to turn into quite the geek fest. Obviously the crowd here is enormous, it's packed, and it's an amazing event, and we're excited. This is siliconangle.com. I'm the founder, John Furrier. I'm joined by co-host Dave Vellante >> of Wikibon.org, where people go for free research and peers collaborate to solve problems. And we're here with Jack Norris, who's the vice president of marketing at MapR, a company that we've been tracking for quite some time. Jack, welcome back to theCUBE. >> Thank you, Dave. >> I'm going to hand it to you. You know, we met quite a while ago now. It was well over a year ago, and we were pushing at you guys and saying, "Well, you know, open source," and you said, "Look, we're solving problems for customers. We've got the right model, we think. This is our strategy. We're sticking to it. Watch what happens." And like I said, I have to hand it to you. You guys really have some great traction in the market and you're doing what you said. So congratulations on that. I know you've got a lot more work to do, but >> Yeah, and actually the topic of openness is pretty interesting. If you look at the different options out there, all of them are combining open source with some proprietary. Now, in the case of some distributions it's very small, like a proprietary ODBC driver. But I think it shows that any solution that combines them in a way that makes it more open is important. So what we've done is make innovations, but where we've made those innovations we've opened them up and provided APIs.
It's things like NFS for standard access, like REST, like ODBC drivers, et cetera. >> So it's a spectrum. Actually, we were at Oracle OpenWorld a few weeks ago, and you listen to Larry Ellison talk about the Oracle public cloud, and he makes actually a very strong case that it's open. You can move data, it's all Java, so it's all about standards. Yeah. And it's coming from the opposite end, but it was really all about the business value. That's what the bottom line is. So we had your CEO, John Schroeder, on yesterday. John and I both were very impressed with essentially what he described as your philosophy of, we have customers when we announce a product. And, you know, that's impressive. >> He was also giving some good feedback to the startup entrepreneurs out there; obviously there's a lot of action going on in the startup community. And he basically said the same thing: get customers. Yeah. And that's it, that's all. Use your tech, but don't be so locked into the tech. Get the customers, understand their needs, and then deliver that. So you guys have done great. I want to talk about the show here, because you guys have a big booth and a big presence here at the show. What are you guys learning? How's the positioning, how's the news hitting? Give us a quick update. >> So, a lot of news. It first started on Tuesday, when we announced the M7 edition. And I brought a demo here for you all, because the big thing about M7 is what we don't have. So we're not demoing region servers, we're not demoing compactions, we're not demoing a lot of manual administrative tasks. What that really means is that we took this stack, and if you look at HBase, HBase today has about half of Hadoop users adopting it.
So there's a lot of momentum in the market, and it's used for everything from real-time analytics to kind of lightweight OLTP processing. But it's an infrastructure that sits on top of a JVM that stores its data in the Hadoop Distributed File System, which sits on a JVM that stores its data in a Linux file system that writes to disk. And so a lot of the complexity is that stack. As an administrator, you have to worry about how data gets, you know, kind of basically written across that. You've got region servers to keep up. When you're doing writes, you have things called compactions, which increase response time. So it's a complex environment, and we've spent quite a bit of time collapsing that infrastructure. With the M7 edition, you've got files and tables together in the same layer, writing directly to disk. So there are no region servers, there are no compactions to deal with, there's no pre-splitting of tables and trying to do manual merges. It just makes it much, much simpler. >> Let's talk about some of your customers in terms of the profile of these guys. I'm assuming, and correct me if I'm wrong, that you're not selling to the tire kickers. You're selling to the guys who actually have some experience with Hadoop and have run into some of the limitations, and you come in and say, "Hey, we can solve some of those problems." Is that right? Can you talk about that a little bit? >> That's a fair characterization. I think part of it is, when you're in the evaluation process and you first hear about Hadoop, it's kind of like the Gartner hype curve, right? This stuff does everything. And of course you've got data protection, because you've got things replicated across the cluster. And of course you've got scalability, because you can just add nodes and so forth.
Well, once you start using it, you realize that yes, I've got data replicated across the cluster, but if I accidentally delete something, or if I've got some corruption, that's replicated across the cluster too. So things like snapshots are really important, so you can return to what it was five minutes before. Performance, where you can get the most out of your hardware. Ease of administration, where I can cut this up into logical volumes and have policies at that whole level instead of at an individual file. So there's a bunch of features that really resonate with users after they've had some experience, and those tend to be our kind of key customers. There's another phase, too, which is when you're testing Hadoop, you're looking at what's possible with this platform, what type of analytics can I do. When you go into production, now all of a sudden you're looking at: how does this fit in with my SLAs? How does this fit in with my data protection policies? How do I integrate with my different data sources? And can I leverage existing code? We had one customer, a large systems integrator for the federal government. They have a million lines of code that they were told to rewrite to run with other distributions, and that they could use just out of the box with MapR. >> So let's talk about some of those customers. Can you name some names and get-- >> Sure. So actually, we had a keynote today, and we had this beautiful customer video that they had to cut because of time. It's running in our booth and it's streaming on our website, and I think we've actually got some of it in the bumper here that we kind of inserted. But I want to shout out to those, because they ended up on the cutting room floor. >> We're running it here. >> Yeah.
So one was Rubicon Project, and they're an interesting company. They're a real-time advertising platform and auction network. They recently passed Google in terms of number one ad reach, as measured by comScore, and got a lot of press on that. I particularly liked the headline that mentioned those three companies, because it was measured by comScore, and comScore's a customer too, a MapR customer, and Google's a key partner. And yesterday we announced a world record for the Hadoop TeraSort, running on Google. So M7 for Rubicon allows them to address and replace different point solutions that were running alongside of Hadoop. It potentially simplifies their architecture, because now they have more things done with a single platform. It increases performance and simplifies administration. Another customer is Ancestry.com. You know, maybe you've seen their ads or heard some of their radio spots. They do a tremendous amount of data processing to help family services and genealogy, and to figure out family backgrounds. One of the things they do is DNA testing, so for an internet service to do that advanced technology is pretty impressive. You send them, it's $99, I believe, and they'll send you a DNA kit. You spit in the tube, you send it back, and then they process that and match and give you insights into your family background. So for them, simplifying HBase meant additional performance, so they could do matches faster, and really simplified administration. So, in Melinda Graham's words, it's simpler because they're just not there, those components. >> Jack, I want to ask you about enterprise-grade Hadoop, and then Ted Dunning, because he was mentioned by Tim Estes in his keynote speech. So you have some rock stars in the company.
Also in the management team. We had your CEO on, we've interviewed M.C. Srivas at Google I/O, and we were on a panel together, so I've gotten to know your team. Solid team. So let's talk about Ted in a minute, but I want to ask you about the enterprise-grade Hadoop conversation. What does that mean now? I mean, obviously you guys were very successful at first. Again, we were skeptics at first, but now your traction and your performance have proven there is a market for that kind of platform. What does that mean now, at this event today, as this is evolving, as the Hadoop ecosystem is not just Hadoop anymore? It's other things. >> Yeah, there are three dimensions to enterprise grade. The first is ease of use, and ease of use from an administrator standpoint: how easily does it integrate into an existing environment? How easily does it fit into my IT policies? Do you run in a lights-out data center? Does the Hadoop distribution fit into that? So that's one whole dimension. A key to that is complete NFS support, so it functions like standard storage. A second dimension is dependability, reliability. So it's not just, do you have a checkbox HA feature. It's: do you have automated stateful failover? Do you have self-healing? Can you handle multiple failures and automated recovery? So in a lights-out data center, can you actually go there once a week and then just replace drives? A great example of that is one of our customers had a test cluster with MapR. It was a POC. They went on and did other things. They had a power failure, they came back a week later, and the cluster was up and running, and they hadn't done any manual tasks there.
And they were just blown away. The recovery process for the other distributions is a long laundry list of-- >> So I've got to ask you this, the third-- >> What's the third one? The third one is performance, and performance is, you know, kind of raw speed. It's also, how do you leverage the infrastructure? Can you take advantage of the network infrastructure, multiple NICs? Can you take advantage of heterogeneous hardware? Can you mix and match for different workloads? And it's really about sharing a cluster for different use cases and different users, and there's a lot of features there. It's not just raw-- >> The existing IT infrastructure policies, that whole, what happens when something goes wrong, can you automate that? And then-- >> And it's easy, dependable, fast. It's the same thing with M7: making HBase easy, dependable, fast. >> So the talk of the show right now, you had the keynote this morning, is that MapR marketing has dropped the big data term and is going with Datacosm. Is that true? So Joe Hellerstein just had a tweet. Joe's a famous Cal Berkeley computer science professor, now CEO of a startup, Trifacta, and he's had a couple of epic tweets this week. So shout out to Joe Hellerstein. But Joe Hellerstein's tweet says, "MapR marketing has decided to drop the term big data and go with Datacosm," with a shout out to George Gilder. So it's kind of like middle-intellectual kind of humor. So what's your response to that? Is it true? What's happening? You're the VP of marketing. >> Well, if you look at the big data term, I think there's a lot of big data washing going on, where architectures that have been out there for 30 years are, you know, all about big data. So I think there's the need for a more descriptive term.
The purpose of Datacosm was not to try to coin something or try to change the big data label. It was just to get people to take a step back and think, and to realize that we are in a massive paradigm shift. And with a shout out to George Gilder, acknowledging, you know, he recognized what the impact of making compute available meant, and he recognized with Telecosm what bandwidth would mean. And if you look at the combination, we've got all this compute efficiency and bandwidth, and now Datacosm is basically taking those resources and unleashing them and changing the way we do things. And I think one of the ways to look at that is the new things that will be possible. There's been a lot of focus on SQL interfaces on top of Hadoop, which are important. But I think some of the more interesting use cases are taking this machine-generated data that's being produced very, very rapidly and having automated operational analytics that can respond in a very fast time to change how you do business: how you're communicating with customers, how you're responding to different risk factors in the environment, for fraud, et cetera, or just improving your response time to kind of cost events. >> Earlier it was called actionable insight. Then he said, assigning intent, you'd be able to respond. It's interesting that you talk about George Gilder, because we like to kind of riff and get into the abstract concepts, but he also was very big in supply-side economics. And so if you look at the business value conversation, one of the things we pointed out yesterday and in this morning's opening review was the top conversations: insight and analytics as a killer app. Right now the app market has not developed.
And that's why we like companies like Continuuity. And what you guys are doing under the hood is being worked on right now at many levels, performance, those three things. But analytics is a no-brainer, insight, and the other one's business value. So when you look at that kind of Datacosm, I can see where you're going with that. And that's kind of what people want, because it's not so much, like, "I'm Republican because he's Republican." George Gilder, he bought The American Spectator, everyone knows that, so obviously he's a Republican. But politics aside, the business side of what big data is implementing is massive. Now, I guess that's a Republican concept, but not really. I mean, business is all parties. So relative to Datacosm, I mean, no one talks about e-business anymore. We were talking to IBM at the IBM conference and they were saying, hey, that was a great marketing campaign, but no one says, "Hey, are you an e-business today?" So we think that big data is going to have the same effect, which is, "Hey, do you have big data?" No, it's just assumed. >> Yeah. >> So that's what you're basically trying to establish, that it's not just about big. >> Yeah. Let me give you one small example from a business value standpoint. Ted Dunning, you mentioned Ted earlier, chief application architect and one of the coauthors of the Mahout book, which deals with machine learning, he dealt with one of our large financial services companies. One of the techniques on Hadoop is clustering, you know, k-nearest neighbors, different algorithms. And they looked at a particular process, and they sped up that process by 30,000 times. So there's a blog post that's on our website, and you can find additional information on that. And I--
And I, >>There's one >>Point on this one point, but I think, you know, to your point about business value and you know, what does data Kozum really mean? That's an incredible speed up, uh, in terms of, of performance and it changes how companies can react in real time. It changes how they can do pattern recognition. And Google did a really interesting paper called the unreasonable effectiveness of data. And in there they say simple algorithms on big data, on massive amounts of data, beat a complex model every time. And so I think what we'll see is a movement away from data sampling and trying to do an 80 20 to looking at all your data and identifying where are the exceptions that we want to increase because there, you know, revenue exceptions or that we want to address because it's a cost or a fraud. >>Well, that's what I, I would give a shout out to, uh, to the guys that digital reasoning Tim asked he's plugged, uh, Ted. It was idolized him in terms of his work. Obviously his work is awesome, but two, he brought up this concept of understanding gap and he showed an interesting chart in his keynote, which was the date explosion, you know, it's up and, you know, straight up, right. It's massive amount of data, 64% unstructured by his calculation. Then he showed out a flat line called attention. So as data's been exploding over time, going up attention mean user attention is flat with some uptick maybe, but so users and humans, they can't expand their mind fast enough. So machine learning technologies have to bridge that gap. That's analytics, that's insight. >>Yeah. There's a big conversation now going on about more data, better models, people trying to squint through some of the comments that Google made and say, all right, does that mean we just throw out >>The models and data trumps algorithms, data >>Trumps algorithms, but the question I have is do you think, and your customer is talking about, okay, well now they have more data. 
Can I actually develop better algorithms that are simpler? And is it a virtuous cycle? >> Yeah, I think, I mean, there's a lot of debate here, a lot of information, but I think one of the interesting things is, given the compute cycles, given that compute efficiency that we have, and given the bandwidth, you can take a model and then iterate very quickly on it and kind of arrive at insight. In the past, it was just that amount of data and that amount of time to process. That could take you 40 days to get to the point that you can get to now in hours. >> Right. So, I mean, the great example is fraud detection, right? We used to sample, and six months later, "Hey, your credit card might have been hacked." And now it's, you know, you get a phone call, or you can't use your credit card, or whatever it is. But there's still a lot of use cases, you know, weather is an example, where modeling and better modeling would be very helpful. Excellent. So, Datacosm: are you planning other marketing initiatives around that? Or is this sort of tongue-in-cheek fun, throw it out there, a little red meat, a little chum in the waters? >> You know, what really motivated us was, theCUBE's here talking for the whole day. What could we possibly do to help give them a topic of conversation? >> Okay, Datacosm. Now, of course, we found that on our proprietary HBase tools. Jack Norris, thanks for coming in. We appreciate your support. You guys have been great. We've been following you and will continue to follow you. You've been a great supporter of theCUBE. I want to thank you personally while we're here. MapR has been a generous underwriter, supportive of our great independent editorial. We want to recognize you guys. Thanks for your support, and we continue to look forward to watching you guys grow and kick ass. So thanks for all your support.
And we'll be right back with our next guest after this short break. >> Thank you. >> 10 years ago, the video news business believed the internet was a fad. The science is settled. We all know the internet is here to stay. Bubbles and busts come and go, but the industry deserves a news team that goes the distance. Coming up on Social Angle are some interesting new metrics for measuring the worth of a customer on the web. Every morning, we're on the air to bring you the most up-to-date information on the tech industry, with scrutiny on releases of the day and news of industry-wide trends. We're here daily with breaking analysis from the best minds in the business. Join me, Kristin Filetti, daily at the news desk on SiliconANGLE TV, your reference point for tech innovation. 18 months.

Published Date : Oct 25 2012

Search Results for Strata Hadoop World: